
Cassandra 0.3 release candidate and progress

We have a release candidate out for Cassandra 0.3. Grab the download and check out how to get started. The Facebook presentation from almost a year ago is still a good introduction to some of Cassandra's features and its data model.

Cassandra in a nutshell:

  • Scales writes very, very well: just add more nodes!
  • Has a much richer data model than vanilla key/value stores -- closer to what you'd be used to in a relational db.
  • Is pretty bleeding edge -- to my knowledge, Facebook is the only group running Cassandra in production. (Their largest cluster is 120 machines and 40TB of data.) At Rackspace we are working on a Cassandra-based app now that 0.3 has the extra features we need.
  • Moved to the Apache Incubator about 40 days ago, at which point development greatly accelerated.
Changes in 0.3 include:
  • Range queries on keys, including user-defined key collation.
  • Support for removes (deletes), which is nontrivial in an eventually consistent world.
  • Workaround for a weird bug in JDK select/register that seems particularly common in VM environments. Cassandra should now deploy fine on EC2. (Oddly, it never had problems on Slicehost / Cloud Servers, which are also Xen-based.)
  • Much improved infrastructure: the beginnings of a decent test suite ("ant test" for unit tests; "nosetests" for system tests), code coverage reporting, etc.
  • Expanded node status reporting via JMX
  • Improved error reporting/logging on both server and client
  • Reduced memory footprint in default configuration
  • and plenty of bug fixes.
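Why is remove nontrivial? In an eventually consistent system a replica can miss a delete, and if the delete simply erased the data, reconciling replicas later would bring the stale value back. The usual fix, and the general approach Cassandra takes, is to record the delete as a timestamped deletion marker (a "tombstone") that wins over older writes. The sketch below is a toy model of that idea, not Cassandra's code: the `Cell` type and the `reconcile` and `visible` functions are hypothetical, and the merge rule is a plain last-write-wins on timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One replica's version of a single column (hypothetical model)."""
    value: Optional[str]  # None means this cell is a tombstone (a recorded delete)
    timestamp: int


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins merge of two replicas' versions of the same column.

    A missing cell (None) loses to anything; otherwise the higher timestamp
    wins, with ties going to `a` (an arbitrary choice for this sketch).
    """
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b


def visible(cell: Optional[Cell]) -> Optional[str]:
    """What a client sees: nothing for a missing cell or a tombstone."""
    if cell is None or cell.value is None:
        return None
    return cell.value


# Both replicas got the insert at t=1; only replica 1 got the delete at t=2.
stale = Cell("hello", timestamp=1)  # replica 2, which missed the delete

# Delete recorded as a tombstone on replica 1: the merge keeps the delete.
print(visible(reconcile(Cell(None, timestamp=2), stale)))  # None

# Delete implemented by simply dropping the data on replica 1: there is
# nothing left to compare against, so the stale value comes back.
print(visible(reconcile(None, stale)))  # hello
```

The trade-off is that a tombstone has to be kept around long enough for every replica to learn about it; dropping it too early reopens the same resurrection problem.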
For those of you just joining us, Cassandra already had:
  • An advanced on-disk storage engine that never does random writes
  • Transaction log-based data integrity
  • P2P gossip failure detection
  • Read repair
  • Hinted handoff
  • Bootstrap (adding new nodes to a running cluster)
(Read repair and hinted handoff are discussed in more detail in the Dynamo paper.)
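To make "never does random writes" and "transaction log-based data integrity" concrete, here is a minimal log-structured write path. It is a sketch assuming a simplified design in the spirit of Cassandra's (commit log, in-memory table, sorted immutable files on disk), not its actual implementation: the `TinyLSM` class, its file layout, and the `memtable_limit` parameter are hypothetical, keys and values are assumed to contain no tabs or newlines, and it does not replay the commit log on startup. Every disk write is either an append to the commit log or a brand-new file, so nothing on disk is ever updated in place.

```python
import bisect
import os
import tempfile
from typing import Dict, List, Optional


class TinyLSM:
    """Toy log-structured store (hypothetical): disk writes are appends or new files only."""

    def __init__(self, directory: str, memtable_limit: int = 4):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        self.sstables: List[str] = []  # flushed file paths, oldest first
        self.log_path = os.path.join(directory, "commit.log")
        self.log = open(self.log_path, "a")

    def put(self, key: str, value: str) -> None:
        # 1. Append to the commit log and fsync, so the write survives a crash.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Apply the write to the in-memory table.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Write the memtable out sequentially as a new, sorted, immutable file.
        path = os.path.join(self.directory, f"sstable-{len(self.sstables)}.txt")
        with open(path, "w") as f:
            for key in sorted(self.memtable):
                f.write(f"{key}\t{self.memtable[key]}\n")
            f.flush()
            os.fsync(f.fileno())
        self.sstables.append(path)
        self.memtable.clear()
        # Those writes are now durable in the new file, so start a fresh log.
        self.log.close()
        self.log = open(self.log_path, "w")

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Check files newest first, because later writes shadow earlier ones.
        for path in reversed(self.sstables):
            with open(path) as f:
                rows = [line.rstrip("\n").split("\t", 1) for line in f]
            keys = [k for k, _ in rows]
            i = bisect.bisect_left(keys, key)  # files are sorted, so binary search
            if i < len(keys) and keys[i] == key:
                return rows[i][1]
        return None

    def close(self) -> None:
        self.log.close()


with tempfile.TemporaryDirectory() as d:
    db = TinyLSM(d, memtable_limit=2)
    db.put("a", "1")
    db.put("b", "2")  # memtable is full: flushed to sstable-0.txt
    db.put("a", "3")  # newer value stays in the memtable
    print(db.get("a"), db.get("b"))  # 3 2
    db.close()
```

Reads pay for this design: a lookup may have to check the memtable and then several files, newest first. Real engines add indexes and compaction to keep that cheap; this sketch omits both.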

The Cassandra development and user community is also growing at an exciting pace. Besides the original two developers from Facebook, we now have five developers regularly contributing improvements and fixes, and many others contributing on a more ad-hoc basis.

How fast is it?

In a nutshell, Cassandra is much faster than relational databases, and much slower than memory-only systems or systems that don't sync each update to disk. Actual benchmarks are in the works. We plan to start performance tuning with the next release, but if you want to benchmark it now, here are some suggestions for getting numbers closer to what you'll see in the wild (about 10x more throughput than without them):

  • Run your benchmark enough times first that each operation your suite tests has executed 20k times before you start timing for real. This gives the JVM's JIT compiler a chance to compile the hot code paths down to machine code; otherwise you'll just be measuring the interpreted version.
  • Change the root logger level in conf/log4j.properties from DEBUG to INFO. We do a LOT of logging for debuggability, and for small column values the logging has more overhead than the actual workload. (It would be even faster if we removed the logging calls entirely, but that didn't make this release.)
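A minimal harness following the warm-up advice above might look like this. It is a sketch, assuming your benchmark client can wrap each operation as a callable that takes an iteration index; `run_suite` is a hypothetical helper, and `fake_insert` and `fake_get` are in-memory placeholders standing in for real calls through your Cassandra client. The JIT being warmed is the server's, so what matters is that each operation reaches the server JVM 20k times before timing starts.

```python
import time
from typing import Callable, Dict

Op = Callable[[int], None]  # one benchmark operation, given an iteration index


def run_suite(ops: Dict[str, Op], warmup: int = 20_000, iterations: int = 20_000) -> Dict[str, float]:
    """Return mean seconds per call for each op, timed only after a warm-up pass."""
    # Warm-up: every operation runs `warmup` times before anything is timed,
    # so the server JVM's JIT has a chance to compile the hot paths.
    for op in ops.values():
        for i in range(warmup):
            op(i)
    results: Dict[str, float] = {}
    for name, op in ops.items():
        start = time.perf_counter()
        for i in range(iterations):
            op(warmup + i)  # fresh indices, e.g. for distinct keys
        results[name] = (time.perf_counter() - start) / iterations
    return results


store: Dict[str, int] = {}


def fake_insert(i: int) -> None:
    # Placeholder: replace with an insert through your Cassandra client.
    store[f"key{i}"] = i


def fake_get(i: int) -> None:
    # Placeholder: replace with a read through your Cassandra client.
    store.get(f"key{i}")


for name, secs in run_suite({"insert": fake_insert, "get": fake_get}).items():
    print(f"{name}: {secs * 1e6:.2f} us/op")
```

Warming every operation before timing any of them keeps the first timed operation from benefiting from a warm-up that later ones never got.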

Comments

Alex Popescu said…
Hi Jonathan,

I have just checked the Cassandra page and I still couldn't find a released version. While checking the SVN repository, I've noticed 3 tags (3 RCs for 0.3) and also a 0.3 branch.
Should I understand that there isn't yet a 0.3 final? Are there any plans to pack a distribution?

tia,
./alex
Jonathan Ellis said…
the saga of the 0.3 release is in the cassandra-dev archives.

the short version is, releasing with the ASF is like pulling teeth.

one at a time.

with no anaesthetic.
Alex Popescu said…
Jonathan, I've been active on ASF for quite a while (not anymore lately) and while I cannot say it was easy, I don't think I've heard of anyone getting killed by the process. Just check how many releases Struts2 or Jackrabbit had.
So, is there anything else? Or is it just the lack of somebody stepping up to run the process?
Jonathan Ellis said…
No major blockers, no. (Which is part of why it's frustrating. :)
Jonathan Ellis said…
Added some links to what's been happening over here: http://spyced.blogspot.com/2009/07/cassandra-03-update.html
