Skip to main content


Showing posts from March, 2010

Cassandra in Google Summer of Code 2010

Cassandra is participating in the Google Summer of Code, which opened for proposal submission today . Cassandra is part of the Apache Software Foundation, which has its own page of guidelines up for students and mentors. We have a good mix of project ideas involving both core and non-core areas, from straightforward code bashing to some pretty tricky stuff, depending on your appetite. Core tickets aren't necessarily harder than non-core, but they will require reading and understanding more existing code. Non-core Create a web ui for cassandra : we have a (fairly minimal) command line interface, but a web gui is more user-friendly. There is the beginnings of such a beast in the Cassandra source tree at contrib/cassandra_browser [pretty ugly Python code] and a gtk-based one at [also Python, less ugly]. First-class commandline interface : if you prefer to kick things old-school, improving the cli itself would also be welcome. Create a Cassandr

Cassandra in action

There's been a lot of new articles about Cassandra deployments in the past month, enough that I thought it would be useful to summarize in a post. Ryan King explained in an interview with Alex Popescu why Twitter is moving to Cassandra for tweet storage, and why they selected Cassandra over the alternatives. My experience is that the more someone understands large systems and the problems you can run into with them from an operational standpoint, the more likely they are to choose Cassandra when doing this kind of evaluation. Ryan's list of criteria is worth checking out. Digg followed up their earlier announcement that they had taken part of their site live on Cassandra with another saying that they've now "reimplemented most of Digg's functionality using Cassandra as our primary datastore." Digg engineer Ian Eure also gave some more details on Digg's cassandra data model in a Hacker News thread. Om Malik quoted extensively from the Digg

Why your data may not belong in the cloud

Several of the reports of the recently-concluded NoSQL Live event mentioned that I took a contrarian position on the "NoSQL in the Cloud" panel, arguing that traditional, bare metal servers usually make more sense. Here's why. There are two reasons to use cloud infrastructure (and by cloud I mean here "commodity VMs such as those provided by Rackspace Cloud Servers or Amazon EC2): You only need a fraction of the capacity of a single machine Your demand is highly elastic; you want to be able to quickly spin up many new instances, then drop them when you are done Most people looking at NoSQL solutions are doing it because their data is larger than a traditional solution can handle, or will be, so (1) is not a very strong motivation. But what about (2)? At first glance, cloud is a great fit for adding capacity to a database cluster painlessly. But there's an important difference between load like web traffic that bounces up and down frequently,