Saturday, March 28, 2009

Distributed Databases and Cassandra at PyCon

I'll be leading an open-spaces discussion about distributed database architecture, implementation, and use today at 5:00 PM in the Lambert room. Specifically, we will cover bigtable, dynamo, and cassandra, and how to port a typical relational schema to cassandra's ColumnFamily model.

I wrote a little background information yesterday about why I think Cassandra in particular is compelling.

Friday, March 27, 2009

Why I like the Cassandra distributed database

I need a distributed database. A real distributed database; replication doesn't count because under a replication-oriented db, each node still needs to be able to handle the full write volume, and you can only throw hardware at that for so long.

So, I'm working on the Cassandra distributed database. I gave a lightning talk on it at PyCon this morning. Cassandra is written in Java and implements a sort of hybrid between Dynamo and Bigtable. (Both papers are worth reading.) It takes its distribution algorithm from Dynamo and its data model from Bigtable -- sort of the best of both worlds. Avinash Lakshman, Cassandra's architect, is one of the authors of the Dynamo paper.

There is a video about Cassandra here. The first 1/4 is about using Cassandra and then the rest is mostly about the internals.

Cassandra is very bleeding edge. Facebook runs several Cassandra clusters in production (the largest is 120 machines and 40TB of data), but there are sharp edges that will cut you. If you want something that Just Works out of the box Cassandra is a poor fit right now and will be for several months.

Cassandra was open-sourced by Facebook last summer. There was some initial buzz but the facebook developers had trouble dealing with the community and the project looked moribund -- FB was doing development against an internal repository and throwing code over the wall every few months which is no way to run an OSS project. Now the code is being developed in the open on Apache and I was voted in as a committer so things are starting to move again.

There are other distributed databases that are worth considering. Here is why those don't fit my needs (and I am not saying that these are bad choices if your requirements are different, just that they don't work for me):


  • Follows the bigtable model, so it's more complicated than it needs to be. (300+kloc vs 50 for Cassandra; many more components). This means it's that much harder for me to troubleshoot. HBase is more bug-free than Cassandra but not so bug-free that troubleshooting would not be required.
  • Does not have any non-java clients. I need CPython support.
  • Sits on top of HDFS, which is optimized for streaming reads, not random accesses. So HBase is fine for batch processing but not so good for online apps.
  • See HBase, except it's written in C++.
Project Voldemort:
  • Voldemort is probably the key / value store in the pattern of Dynamo that is farthest along. If you only need key / value I would definitely recommend Voldemort. Next closest is probably Dynomite. Then there are a whole bunch of "me too" key value stores that made fatal architecture decisions or are writing a "memcached with persistence" without really thinking it through.
Running Cassandra [updated 04-22-09 for hacker news visitors]:
  1. prereqs: jdk6 and ant
  2. Check out the code from (did I mention this was early-adopter only?)
  3. run ant [optional: ant test]. If you get an error like "class file has wrong version 50.0, should be 49.0" then ant is using an old jdk version instead of 6.
  4. For non-java clients, install Thrift. (For java, trunk includes libthrift.jar.) This is a major undertaking in its own right. See this page for a list of dependencies, although most of the rest of that page is now outdated -- for instance, Cassandra no longer depends on the fb303 interface. Python users will have to hand-edit the generated in three obvious places until this bug is fixed -- just replace the broken argument with None.
  5. run bin/cassandra [optionally -f for foreground]
  6. Connect to the server. By default it listens on localhost:9160. Look at config/server-conf.xml for the columnfamily definitions.
  7. Insert and query some data. Here is an introduction.
  8. Ask on the mailing list or #cassandra on freenode if you have questions.
(Cassandra has a new website up to replace the google code one. We're actively working on the docs, so let us know what needs work.) Good luck!