Skip to main content


Showing posts from February, 2010

Distributed deletes in the Cassandra database

Handling deletes in a distributed, eventually consistent system is a little tricky, as demonstrated by the fairly frequent recurrence of the question, " Why doesn't disk usage immediately decrease when I remove data in Cassandra ?" As background, recall that a Cassandra cluster defines a ReplicationFactor that determines how many nodes each key and associated columns are written to. In Cassandra (as in Dynamo ), the client controls how many replicas to block for on writes, which includes deletions. In particular, the client may (and typically will) specify a ConsistencyLevel of less than the cluster's ReplicationFactor, that is, the coordinating server node should report the write successful even if some replicas are down or otherwise not responsive to the write. (Thus, the "eventual" in eventual consistency: if a client reads from a replica that did not get the update with a low enough ConsistencyLevel, it will potentially see old data. Cassandra u