
Cassandra 0.3 release candidate and progress

We have a release candidate out for Cassandra 0.3. Grab the download and check out how to get started. The Facebook presentation from almost a year ago is still a good introduction to some of the features and to the data model.

Cassandra in a nutshell:

  • Scales writes very, very well: just add more nodes!
  • Has a much richer data model than vanilla key/value stores -- closer to what you'd be used to in a relational db.
  • Is pretty bleeding edge -- to my knowledge, Facebook is the only group running Cassandra in production. (Their largest cluster is 120 machines and 40TB of data.) At Rackspace we are working on a Cassandra-based app now that 0.3 has the extra features we need.
  • Moved to the Apache Incubator about 40 days ago, at which point development greatly accelerated.
Changes in 0.3 include:
  • Range queries on keys, including user-defined key collation.
  • Support for removes (deletes), which is nontrivial in an eventually consistent world.
  • Workaround for a weird bug in JDK select/register that seems particularly common in VM environments. Cassandra should deploy fine on EC2 now. (Oddly, it never had problems on Slicehost / Cloud Servers, which are also Xen-based.)
  • Much improved infrastructure: the beginnings of a decent test suite ("ant test" for unit tests; "nosetests" for system tests), code coverage reporting, etc.
  • Expanded node status reporting via JMX
  • Improved error reporting/logging on both server and client
  • Reduced memory footprint in default configuration
  • and plenty of bug fixes.
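To make "range queries on keys, including user-defined key collation" concrete, here is a minimal sketch, assuming all keys live in one sorted in-memory list and the collation is supplied as a Python key function (for example `str.casefold`). The class `OrderedKeys` and its methods are hypothetical names for illustration only; this is not Cassandra's API, and it ignores how keys are actually partitioned across nodes.

```python
import bisect
from typing import Any, Callable, List


class OrderedKeys:
    """Hypothetical sketch: row keys kept sorted under a user-supplied collation.

    `collate` maps a raw key to the value it sorts by (e.g. str.casefold).
    Illustration only; not Cassandra's API or storage layout.
    """

    def __init__(self, collate: Callable[[str], Any]) -> None:
        self._collate = collate
        self._sort_keys: List[Any] = []  # collated keys, kept in sorted order
        self._keys: List[str] = []       # raw keys, parallel to _sort_keys

    def insert(self, key: str) -> None:
        sk = self._collate(key)
        i = bisect.bisect_left(self._sort_keys, sk)
        if i < len(self._sort_keys) and self._sort_keys[i] == sk:
            return  # already present under this collation
        self._sort_keys.insert(i, sk)
        self._keys.insert(i, key)

    def range(self, start: str, finish: str, limit: int = 100) -> List[str]:
        """Return up to `limit` keys k with start <= k <= finish under the collation."""
        lo = bisect.bisect_left(self._sort_keys, self._collate(start))
        hi = bisect.bisect_right(self._sort_keys, self._collate(finish))
        return self._keys[lo:min(hi, lo + limit)]


if __name__ == "__main__":
    fruit = ["banana", "Apple", "cherry", "apricot", "Blueberry"]
    idx = OrderedKeys(str.casefold)  # case-insensitive collation
    for k in fruit:
        idx.insert(k)
    print(idx.range("a", "b"))  # ['Apple', 'apricot']
```

The collation decides which keys fall inside a range: with plain code-point ordering (`lambda k: k`), uppercase keys sort before all lowercase ones, so `range("A", "Z")` would return only `Apple` and `Blueberry`.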
For those of you just joining us, Cassandra already had:
  • An advanced on-disk storage engine that never does random writes
  • Transaction log-based data integrity
  • P2P gossip failure detection
  • Read repair
  • Hinted handoff
  • Bootstrap (adding new nodes to a running cluster)
(Read repair and hinted handoff are discussed in more detail in the Dynamo paper.)
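Read repair, and the reason remove support is tricky in an eventually consistent system, can be illustrated with a toy model in which each replica is a dict and every cell carries a timestamp. This is a hypothetical sketch (`Cell` and `read_with_repair` are made-up names), assuming last-write-wins by timestamp and a read that contacts every replica; it is not Cassandra's code. It shows why a delete has to be recorded as a timestamped tombstone rather than simply erasing the data: otherwise a replica that missed the delete hands the old value back during read repair.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[bytes]  # None marks a tombstone (a recorded delete)
    timestamp: int          # client-supplied; the larger timestamp wins


Replica = Dict[str, Cell]  # hypothetical: one replica's view, key -> cell


def read_with_repair(replicas: List[Replica], key: str) -> Optional[bytes]:
    """Read `key` from every replica, return the newest value, and
    write that newest cell back to any replica holding an older one."""
    cells = [r[key] for r in replicas if key in r]
    if not cells:
        return None
    # Ties go to the first replica seen (a simplification).
    newest = max(cells, key=lambda c: c.timestamp)
    for r in replicas:
        current = r.get(key)
        if current is None or current.timestamp < newest.timestamp:
            r[key] = newest  # read repair: bring the stale replica up to date
    return newest.value  # None if the newest cell is a tombstone


if __name__ == "__main__":
    # Tombstones: the remove at ts=2 reaches replicas 0 and 1; replica 2 is down.
    replicas: List[Replica] = [{"k": Cell(b"v1", 1)} for _ in range(3)]
    for r in replicas[:2]:
        r["k"] = Cell(None, 2)
    print(read_with_repair(replicas, "k"))  # None; replica 2 now holds the tombstone

    # Naive delete: erasing the key lets the stale copy resurrect it everywhere.
    naive: List[Replica] = [{"k": Cell(b"v1", 1)} for _ in range(3)]
    for r in naive[:2]:
        del r["k"]
    print(read_with_repair(naive, "k"))  # b'v1'
```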

The Cassandra development and user community is also growing at an exciting pace. Besides the original two developers from Facebook, we now have five developers regularly contributing improvements and fixes, and many others on a more ad-hoc basis.

How fast is it?

In a nutshell, Cassandra is much faster than relational databases, and much slower than memory-only systems or systems that don't sync each update to disk. Actual benchmarks are in the works. We plan to start performance tuning with the next release, but if you want to benchmark it now, here are some suggestions for getting numbers closer to what you'll see in the wild (and about 10x more throughput than if you skip them):

  • Do enough runs of your benchmark first that each operation tested by your suite runs 20k times before you time it for real. This gives the JVM's JIT compiler a chance to compile the hot code paths down to machine code; otherwise you'll just be measuring the interpreted version.
  • Change the root logger level in conf/log4j.properties from DEBUG to INFO. We do a LOT of logging for debuggability, and for small column values the logging costs more than the actual workload. (It would be faster still if we removed the debug logging calls entirely, but that didn't make this release.)
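The warm-up advice above can be sketched as a small client-side harness: run every operation a fixed number of times untimed, then time a second batch. This is a minimal sketch, assuming each operation is wrapped in a zero-argument callable; in a real benchmark those callables would issue requests to Cassandra (for example through a Thrift client), which is left out here. The function name `benchmark` and the 20,000 default are illustrative, with the default taken from the 20k figure above.

```python
import time
from typing import Callable, Dict


def benchmark(ops: Dict[str, Callable[[], None]],
              warmup: int = 20_000,
              timed: int = 20_000) -> Dict[str, float]:
    """Run each op `warmup` times untimed (so the server JVM can JIT-compile
    the code paths it exercises), then `timed` times; return ops/second."""
    for op in ops.values():
        for _ in range(warmup):
            op()
    results: Dict[str, float] = {}
    for name, op in ops.items():
        start = time.perf_counter()
        for _ in range(timed):
            op()
        elapsed = time.perf_counter() - start
        results[name] = timed / elapsed if elapsed > 0 else float("inf")
    return results


if __name__ == "__main__":
    # Stand-in operation; replace with calls that actually hit your cluster.
    store = {}
    print(benchmark({"local_put": lambda: store.__setitem__("k", b"v")},
                    warmup=1_000, timed=1_000))
```

Warming up every operation before timing any of them, rather than interleaving warm-up and measurement per operation, keeps a later operation's timing from including JIT work that shared code paths had not yet finished.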

Comments

Alex Popescu said…
Hi Jonathan,

I have just checked the Cassandra page and I still couldn't find a released version. While checking the SVN repository, I've noticed 3 tags (3 RCs for 0.3) and also a 0.3 branch.
Should I understand that there isn't yet a 0.3 final? Are there any plans to pack a distribution?

tia,
./alex
Jonathan Ellis said…
the saga of the 0.3 release is in the cassandra-dev archives.

the short version is, releasing with the ASF is like pulling teeth.

one at a time.

with no anaesthetic.
Alex Popescu said…
Jonathan, I've been active on ASF for quite a while (not anymore lately) and while I cannot say it was easy, I don't think I've heard of anyone getting killed by the process. Just check how many releases Struts2 or Jackrabbit had.
So, is there anything else? Or is it just the lack of somebody stepping up to run the process?
Jonathan Ellis said…
No major blockers, no. (Which is part of why it's frustrating. :)
Jonathan Ellis said…
Added some links to what's been happening over here: http://spyced.blogspot.com/2009/07/cassandra-03-update.html
