Skip to main content

Cassandra 0.5.0 released

Apache Cassandra 0.5.0 was released over the weekend, four months after 0.4. (Upgrade notes; full changelog.) We're excited about releasing 0.5 because it makes life even better for people using Cassandra as their primary data source -- as opposed to a replica, possibly denormalized, of data that exists somewhere else.

The Cassandra distributed database has always had a commitlog to provide durable writes, and in 0.4 we added an option to waiting for commitlog sync before acknowledging writes, for cases where even a few seconds of potential data loss was not an option. But what if a node goes down temporarily? 0.5 adds proactive repair, what Dynamo calls "anti-entropy," to synchronize any updates Hinted Handoff or read repair didn't catch across all replicas for a given piece of data.

0.5 also adds load balancing and significantly improves bootstrap (adding nodes to a running cluster). We've also been busy adding documentation on operations in production and system internals.

Finally, in 0.5 we've improved concurrency across the board, improving insert speed by over 50% on the stress.py benchmark (from contrib/) on a relatively modest 4-core system with 2GB of ram. We've also added a [row] key cache, enabling similar relative improvements in reads:

(You will note that unlike most systems, Cassandra reads are usually slower than writes. 0.6 will narrow this gap with full row caching and mmap'd I/O, but fundamentally we think optimizing for writes is the right thing to do since writes have always been harder to scale.)

Log replay, flush, compaction, and range queries are also faster.

0.5 also brings new tools, including JSON-based data export and import, an improved command-line interface, and new JMX metrics.

One final note: like all distributed systems, Cassandra is designed to maximize throughput when under load from many clients. Benchmarking with a single thread or a small handful will not give you numbers representative of production (unless you only ever have four or five users at a time in production, I suppose). Please don't ask "why is Cassandra so slow" and offer up a single-threaded benchmark as evidence; that makes me sad inside. Here's 1000 words:

(Thanks to Brandon Williams for the graphs.)

Comments

Flavio said…
Hey Jonathan.
I would be really interested to know what is the cluster size of the cluster you have been using for the benchmarks. It would give a better understanding of the throughput/sec.

And how many threads have been used in the first graph and was the experiment really only executed on one single node?
Jonathan Ellis said…
The first graph is a single node, benchmarked with 50 threads.

The second graph is a 4 node cluster running with a replication factor of 3 (that is, each write is replicated to 3 nodes), so you'd expect throughput to be about 30% more than the single node case, minus some overhead for dealing with the network, and that's what happens.
Flavio said…
Okay great! Thanks. So you are using replication factor of 3 and read/write quorum of 2?
Jonathan Ellis said…
Both graphs are with ConsistencyLevel.ONE but that's not going to affect throughput much (just latency) since ConsistencyLevel doesn't reduce the number of writes you do (in a non-failure scenario), just how many you wait for before returning success to the client.
Anonymous said…
I would be interested to see your thread writing code. I'd like to emulate multiple threads writing to cassandra.
Jonathan Ellis said…
it's in contrib/py_stress (it actually uses multiple processes if multiprocessing is available, since the GIL sucks, but it will fall back to threads if it has to).
Unknown said…
Hey Jonathan, what was your physical disk setup (mounts, drives) for this bench?
Jonathan Ellis said…
One commitlog disk, one data disk, both xfs.
Michael Spiegel said…
I apologize if this is trivial, I want to be sure I'm interpreting the output from stress.py correctly. Default arguments to the script are OK (as long as I'm running python 2.6 to get multiprocessing). Run the script first with "-o insert" and then "-o read". The total throughput is the bottom entry in the leftmost column divided by the bottom entry in the bottom entry in the rightmost column. Is that correct? I happen to be using the 0.6.0-beta2 release.
Jonathan Ellis said…
Yes, although the time in the lower right is not a high-resolution clock :)
Michael Spiegel said…
Right. I should be using "time python stress.py". Duh, of course. Sorry for being so dense.
Yousef Ourabi said…
The links to changelog / notes are 404s
Jonathan Ellis said…
Yousef: thanks for the heads up, I've updated the post to the correct ones. (they changed when we graduated from the incubator.)

Popular posts from this blog

A week of Windows Subsystem for Linux

I first experimented with WSL2 as a daily development environment two years ago. Things were still pretty rough around the edges, especially with JetBrains' IDEs, and I ended up buying a dedicated Linux workstation so I wouldn't have to deal with the pain.  Unfortunately, the Linux box developed a heat management problem, and simultaneously I found myself needing a beefier GPU than it had for working on multi-vector encoding , so I decided to give WSL2 another try. Here's some of the highlights and lowlights. TLDR, it's working well enough that I'm probably going to continue using it as my primary development machine going forward. The Good NVIDIA CUDA drivers just work. I was blown away that I ran conda install cuda -c nvidia and it worked the first try. No farting around with Linux kernel header versions or arcane errors from nvidia-smi. It just worked, including with PyTorch. JetBrains products work a lot better now in remote development mod...

Python at Mozy.com

At my day job, I write code for a company called Berkeley Data Systems. (They found me through this blog, actually. It's been a good place to work.) Our first product is free online backup at mozy.com . Our second beta release was yesterday; the obvious problems have been fixed, so I feel reasonably good about blogging about it. Our back end, which is the most algorithmically complex part -- as opposed to fighting-Microsoft-APIs complex, as we have to in our desktop client -- is 90% in python with one C extension for speed. We (well, they, since I wasn't at the company at that point) initially chose Python for speed of development, and it's definitely fulfilled that expectation. (It's also lived up to its reputation for readability, in that the Python code has had 3 different developers -- in serial -- with very quick ramp-ups in each case. Python's succinctness and and one-obvious-way-to-do-it philosophy played a big part in this.) If you try it out, pleas...

Why PHP sucks

(July 8 2005) Apparently I got linked by some PHP sites, and while there were a few well-reasoned comments here I mostly just got people who only knew PHP reacting like I told them their firstborn was ugly. These people tended to give variants on one or more themes: All environments have warts, so PHP is no worse than anything else in this respect I can work around PHP's problems, ergo they are not really problems You aren't experienced enough in PHP to judge it yet As to the first, it is true that PHP is not alone in having warts. However, the lack of qualitative difference does not mean that the quantitative difference is insignificant. Similarly, problems can be worked around, but languages/environments designed by people with more foresight and, to put it bluntly, clue, simply don't make the kind of really boneheaded architecture mistakes that you can't help but run into on a daily baisis in PHP. Finally, as I noted in my original introduction, with PHP, ...