Cassandra 0.5.0 released

Apache Cassandra 0.5.0 was released over the weekend, four months after 0.4. (Upgrade notes; full changelog.) We're excited about releasing 0.5 because it makes life even better for people using Cassandra as their primary data source -- as opposed to a replica, possibly denormalized, of data that exists somewhere else.

The Cassandra distributed database has always had a commitlog to provide durable writes, and in 0.4 we added an option to wait for commitlog sync before acknowledging writes, for cases where even a few seconds of potential data loss was not an option. But what if a node goes down temporarily? 0.5 adds proactive repair, what Dynamo calls "anti-entropy," to synchronize any updates Hinted Handoff or read repair didn't catch across all replicas for a given piece of data.
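As a rough illustration of what "waiting for commitlog sync before acknowledging writes" means, here is a toy append-only log in Python. It is a sketch of the general write-ahead idea only, not Cassandra's commitlog: the `ToyCommitLog` class, the `sync_before_ack` flag, and the `replay` helper are all hypothetical names invented for this example.

```python
import os


class ToyCommitLog:
    """Append-only log. A toy stand-in, NOT Cassandra's actual commitlog."""

    def __init__(self, path, sync_before_ack=True):
        self._file = open(path, "ab")
        # Hypothetical flag: when True, every append is fsync'd before we ack.
        self.sync_before_ack = sync_before_ack

    def append(self, record: bytes) -> bool:
        # Length-prefix each record so replay can find record boundaries.
        self._file.write(len(record).to_bytes(4, "big") + record)
        if self.sync_before_ack:
            # Push the bytes to disk before acknowledging the write.
            self._file.flush()
            os.fsync(self._file.fileno())
        return True  # the acknowledgement returned to the caller

    def close(self):
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()


def replay(path):
    """Read back every complete record, as a restart after a crash would."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            size = int.from_bytes(header, "big")
            body = f.read(size)
            if len(body) < size:
                break  # torn final record: ignore it
            records.append(body)
    return records
```

With `sync_before_ack=False` the toy log acknowledges as soon as the data is buffered, so a crash can lose recent writes; with it set to `True` each acknowledged write has already been through `fsync`, at the cost of extra latency per write.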

0.5 also adds load balancing and significantly improves bootstrap (adding nodes to a running cluster). We've also been busy adding documentation on operations in production and system internals.

Finally, in 0.5 we've improved concurrency across the board, raising insert speed by over 50% on the stress.py benchmark (from contrib/) on a relatively modest 4-core system with 2GB of RAM. We've also added a [row] key cache, enabling similar relative improvements in reads:

(You will note that unlike most systems, Cassandra reads are usually slower than writes. 0.6 will narrow this gap with full row caching and mmap'd I/O, but fundamentally we think optimizing for writes is the right thing to do since writes have always been harder to scale.)

Log replay, flush, compaction, and range queries are also faster.

0.5 also brings new tools, including JSON-based data export and import, an improved command-line interface, and new JMX metrics.

One final note: like all distributed systems, Cassandra is designed to maximize throughput when under load from many clients. Benchmarking with a single thread or a small handful will not give you numbers representative of production (unless you only ever have four or five users at a time in production, I suppose). Please don't ask "why is Cassandra so slow" and offer up a single-threaded benchmark as evidence; that makes me sad inside. Here's 1000 words:

(Thanks to Brandon Williams for the graphs.)
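The point about client concurrency can be sketched in a few lines of Python. This is a minimal sketch, not the stress.py benchmark: `fake_insert` is a hypothetical stand-in that just sleeps for a millisecond to mimic a network round trip, and `run_benchmark` is an illustrative helper. Threads are enough here because the fake operation spends its time sleeping; stress.py itself prefers multiple processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_insert(key):
    """Hypothetical stand-in for a client call; sleeps to mimic a round trip."""
    time.sleep(0.001)
    return key


def run_benchmark(operation, total_ops, workers):
    """Run total_ops calls of operation across `workers` threads; return ops/sec."""
    start = time.perf_counter()  # high-resolution clock
    with ThreadPoolExecutor(max_workers=workers) as pool:
        completed = sum(1 for _ in pool.map(operation, range(total_ops)))
    elapsed = time.perf_counter() - start
    return completed / elapsed


if __name__ == "__main__":
    for workers in (1, 5, 50):
        rate = run_benchmark(fake_insert, total_ops=2000, workers=workers)
        print(f"{workers:>3} workers: {rate:,.0f} ops/sec")
```

Throughput is simply completed operations divided by elapsed time. The single-worker run is bounded by per-request latency, which is why it says little about what a server can sustain with many clients.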

Comments

Flavio said…
Hey Jonathan.
I would be really interested to know the size of the cluster you used for the benchmarks. It would give a better understanding of the throughput.

Also, how many threads were used for the first graph, and was that experiment really run on just a single node?
Jonathan Ellis said…
The first graph is a single node, benchmarked with 50 threads.

The second graph is a 4-node cluster running with a replication factor of 3 (that is, each write is replicated to 3 nodes), so you'd expect throughput to be about 30% more than the single-node case (4 nodes / 3 replicas is roughly 1.3x), minus some overhead for dealing with the network, and that's what happens.
Flavio said…
Okay great! Thanks. So you are using replication factor of 3 and read/write quorum of 2?
Jonathan Ellis said…
Both graphs are with ConsistencyLevel.ONE but that's not going to affect throughput much (just latency) since ConsistencyLevel doesn't reduce the number of writes you do (in a non-failure scenario), just how many you wait for before returning success to the client.
Anonymous said…
I would be interested to see your thread writing code. I'd like to emulate multiple threads writing to cassandra.
Jonathan Ellis said…
It's in contrib/py_stress (it actually uses multiple processes if multiprocessing is available, since the GIL sucks, but it will fall back to threads if it has to).
Unknown said…
Hey Jonathan, what was your physical disk setup (mounts, drives) for this bench?
Jonathan Ellis said…
One commitlog disk, one data disk, both xfs.
Michael Spiegel said…
I apologize if this is trivial; I want to be sure I'm interpreting the output from stress.py correctly. Default arguments to the script are OK (as long as I'm running Python 2.6 to get multiprocessing). Run the script first with "-o insert" and then "-o read". The total throughput is the bottom entry in the leftmost column divided by the bottom entry in the rightmost column. Is that correct? I happen to be using the 0.6.0-beta2 release.
Jonathan Ellis said…
Yes, although the time in the lower right is not a high-resolution clock :)
Michael Spiegel said…
Right. I should be using "time python stress.py". Duh, of course. Sorry for being so dense.
Yousef Ourabi said…
The links to changelog / notes are 404s
Jonathan Ellis said…
Yousef: thanks for the heads up, I've updated the post to the correct ones. (they changed when we graduated from the incubator.)
