Skip to main content

Posts

Showing posts from January, 2010

Cassandra 0.5.0 released

Apache Cassandra 0.5.0 was released over the weekend, four months after 0.4. ( Upgrade notes ; full changelog .) We're excited about releasing 0.5 because it makes life even better for people using Cassandra as their primary data source -- as opposed to a replica, possibly denormalized, of data that exists somewhere else. The Cassandra distributed database has always had a commitlog to provide durable writes, and in 0.4 we added an option to waiting for commitlog sync before acknowledging writes, for cases where even a few seconds of potential data loss was not an option. But what if a node goes down temporarily? 0.5 adds proactive repair, what Dynamo calls "anti-entropy," to synchronize any updates Hinted Handoff or read repair didn't catch across all replicas for a given piece of data. 0.5 also adds load balancing and significantly improves bootstrap (adding nodes to a running cluster). We've also been busy adding documentation on operations in product...

Linux performance basics

I want to write about Cassandra performance tuning, but first I need to cover some basics: how to use vmstat, iostat, and top to understand what part of your system is the bottleneck -- not just for Cassandra but for any system. vmstat You will typically run vmstat with "vmstat sampling-period", e.g., "vmstat 5." The output looks like this: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 20 0 195540 32772 6952 576752 0 0 11 12 38 43 1 0 99 0 22 2 195536 35988 6680 575132 6 0 2952 14 959 16375 72 21 4 3 The first line is your total system average since boot; typically this will not be very useful, since you are interested in what is causing problems NOW. Then you will get one line per sample period; most of the output is self explanatory. The reason to start with vmstat is the "swap" section: si and s...