FriendFeed's Pseudo-ODB

Object database technology has never really taken off in the industry, but it has a lot of valid purposes, and in many scenarios can be very quick and powerful for developers to work with.

Relational databases carry a certain aura of reliability and reusability that makes them appealing for businesses. Buying a pre-packaged product (whether it is a database, a reporting engine, an application server, or whatever) has a certain appeal to most businesses: the warm-blanket effect.

Also, there is no ‘object’ database storage standard, so you can’t just slap a pre-packaged reporting framework on top of any object database without having to do some leg-work. This has been the #1 argument I’ve heard for using SQL storage formats at every company I have worked for. Ironically, the table models become complex and quirky enough that writing scalable queries against them takes hours and hours of developer time anyway.

Technology-oriented businesses (especially smaller ones) often have more latitude to ‘tinker’, especially when they are fighting the battle of scalability vs. cost.

While it’s not the first article of its kind, I was heartened to see this article over at FriendFeed, where they talk about their transition to an object-like storage mechanism (what they call schema-less):

Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the “schema” simply by storing new properties.

We index data in these entities by storing indexes in separate MySQL tables. If we want to index three properties in each entity, we will have three MySQL tables - one for each index. If we want to stop using an index, we stop writing to that table from our code and, optionally, drop the table from MySQL. If we want a new index, we make a new MySQL table for that index and run a process to asynchronously populate the index without disrupting our live service.
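To make the quoted design concrete, here is a minimal sketch of the idea in Python. It is not FriendFeed's code: SQLite stands in for MySQL so the example runs without a server, and every table, column, and function name (`entities`, `index_<property>`, `open_store`, `create_index`, `put`, `find`) is a hypothetical one chosen for illustration. It assumes indexed properties hold string values, handles inserts only (no updates or deletes), and backfills a new index synchronously, whereas the quote describes an asynchronous process.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create the single entity table: a 16-byte id plus an opaque JSON body."""
    # Assumption: an in-memory SQLite database stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (id BLOB PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def new_entity(**props):
    """Build a property bag; 'id' is the only required property."""
    return {"id": uuid.uuid4().hex, **props}


def _index_table(prop):
    # The property name is interpolated into SQL, so accept plain identifiers only.
    if not prop.isidentifier():
        raise ValueError(f"unsafe property name: {prop!r}")
    return f"index_{prop}"


def create_index(conn, prop):
    """Add one index table for one property and backfill it from stored entities."""
    table = _index_table(prop)
    conn.execute(
        f"CREATE TABLE {table} (val TEXT NOT NULL, entity_id BLOB NOT NULL, "
        "PRIMARY KEY (val, entity_id))"
    )
    # Synchronous backfill; a live service would do this in the background.
    rows = conn.execute("SELECT id, body FROM entities").fetchall()
    for entity_id, body in rows:
        entity = json.loads(body)
        if prop in entity:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (entity[prop], entity_id))


def put(conn, entity, indexed_props):
    """Store a new entity and write one row to each index table that applies."""
    entity_id = uuid.UUID(entity["id"]).bytes  # the 16-byte form of the UUID
    conn.execute("INSERT INTO entities VALUES (?, ?)", (entity_id, json.dumps(entity)))
    for prop in indexed_props:
        if prop in entity:
            conn.execute(
                f"INSERT INTO {_index_table(prop)} VALUES (?, ?)",
                (entity[prop], entity_id),
            )


def find(conn, prop, value):
    """Look up entities through the index table, then decode their bodies."""
    rows = conn.execute(
        f"SELECT e.body FROM {_index_table(prop)} AS i "
        "JOIN entities AS e ON e.id = i.entity_id WHERE i.val = ?",
        (value,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

With this shape, storing an entity that carries a brand-new property (say `link`) needs no change to the `entities` table at all. Calling `create_index(conn, "link")` later creates and fills the extra table, and dropping an index amounts to no longer passing that property to `put`.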

They mention that the primary reason for moving to this style of storage was that the cost of making relational database changes in MySQL (most notably adding and removing indexes) was becoming too high and was causing too many production performance issues.

Their format resembles those chosen by Google BigTable and Amazon SimpleDB, both of which use a ‘schema-less’, property-based storage mechanism.