
Cassandra in action

There's been a lot of new articles about Cassandra deployments in the past month, enough that I thought it would be useful to summarize in a post.

Ryan King explained in an interview with Alex Popescu why Twitter is moving to Cassandra for tweet storage, and why they selected Cassandra over the alternatives. In my experience, the better someone understands large systems and the operational problems you can run into with them, the more likely they are to choose Cassandra when doing this kind of evaluation. Ryan's list of criteria is worth checking out.

Digg followed up their earlier announcement that they had taken part of their site live on Cassandra with another saying that they've now "reimplemented most of Digg's functionality using Cassandra as our primary datastore." Digg engineer Ian Eure also gave some more details on Digg's Cassandra data model in a Hacker News thread.

Om Malik quoted extensively from the Digg announcement and from Rackspace engineer Stu Hood, who explained Cassandra's appeal: "Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure. When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single row to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values."
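
To make the data model Stu describes a little more concrete, here is a toy sketch in Python. It is not Cassandra's API: the ColumnFamily class and its insert, get_slice and get_range methods are hypothetical, and plain sorted in-memory lists stand in for Cassandra's sorted on-disk storage (so it says nothing about performance). What it illustrates is the shape of the model: each row holds many columns kept in name order, so a contiguous range of columns within one row can be located with a binary search, and row keys can be scanned in order too. In real Cassandra, ordered scans over row keys also depend on which partitioner the cluster uses.

```python
from bisect import bisect_left, bisect_right, insort


class ColumnFamily:
    """Hypothetical toy model: sorted row keys, each row with sorted column names."""

    def __init__(self):
        self._row_keys = []  # row keys, kept in sorted order
        self._rows = {}      # row key -> {"names": sorted column names, "values": dict}

    def insert(self, key, column, value):
        row = self._rows.get(key)
        if row is None:
            insort(self._row_keys, key)
            row = self._rows[key] = {"names": [], "values": {}}
        if column not in row["values"]:
            insort(row["names"], column)  # new column: keep names sorted
        row["values"][column] = value     # existing column: just overwrite

    def get_slice(self, key, start, finish, count=100):
        """Up to `count` (name, value) pairs of one row with start <= name <= finish."""
        row = self._rows.get(key, {"names": [], "values": {}})
        names = row["names"]
        lo, hi = bisect_left(names, start), bisect_right(names, finish)
        return [(n, row["values"][n]) for n in names[lo:hi][:count]]

    def get_range(self, start_key, end_key):
        """Row keys with start_key <= key <= end_key, in sorted order."""
        lo = bisect_left(self._row_keys, start_key)
        hi = bisect_right(self._row_keys, end_key)
        return self._row_keys[lo:hi]


# Usage: hypothetical time series rows, one row per metric, one column per timestamp.
cf = ColumnFamily()
for ts, v in [("2010-03-05", 7), ("2010-03-01", 10), ("2010-03-02", 12)]:
    cf.insert("server1:cpu", ts, v)
cf.insert("server2:cpu", "2010-03-01", 3)

print(cf.get_slice("server1:cpu", "2010-03-01", "2010-03-03"))
print(cf.get_range("server1:cpu", "server2:cpu"))
```

Because columns come back in name order regardless of insertion order, using timestamps as column names turns "all readings for this metric between two dates" into a single slice of a single row.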

The Twitter and Digg news kicked off a lot of publicity, including plenty of "me too" articles but also some interesting ones, such as a highscalability post wondering whether this was the end of the MySQL + memcached era. If not quite yet the end, then the beginning of it. As Ian Eure from Digg said, "If you're deploying memcache on top of your database, you're inventing your own ad-hoc, difficult to maintain NoSQL system." Possibly the best commentary on this idea comes from Dare Obasanjo, who explained "Digg's usage of Cassandra actually serves as a rebuttal to [an article claiming SQL scales just fine] since they couldn't feasibly get what they want with either horizontal or vertical scaling of their relational database-based solution."

Reddit also migrated from memcachedb to Cassandra, in only 10 days, the fastest migration to Cassandra I've seen. The engineer who did the migration, ketralnis, posted more comments in the reddit discussion thread.

CloudKick blogged about how they use Cassandra for time series data, including a sketch of their data model. CloudKick migrated from PostgreSQL, skewering the theory you will sometimes see proffered that "only MySQL users are migrating to NoSQL, not people who use [my favorite vendor's relational database]."

Jake Luciani wrote about how Lucandra, the Cassandra back-end for Lucene, works, and how he's using it to power the Twitter search app sparse.ly. IMO, Lucandra is one of Cassandra's killer apps.

The FightMyMonster team switched from HBase to Cassandra after concluding that "HBase is more suitable for data warehousing, and large scale data processing and analysis... and Cassandra is more suitable for real time transaction processing and the serving of interactive data." Dominic covers CAP, architecture considerations, benchmarks, map/reduce, and durability in explaining his conclusion.

Eric Peters gave a talk on Cassandra use at his company, Frugal Mechanic, at the Seattle Tech Startups Meetup. This was interesting not because Frugal Mechanic is a big name but because it's not. I haven't seen Eric's name on the Cassandra mailing lists at all, but there he was deploying it and giving a talk on it, showing that Cassandra is starting to move beyond early adopters. (And, just maybe, that our documentation is improving. :)

Finally, Eric Florenzano has a live demo up now of Cassandra running a Twitter clone at twissandra.com, with source on GitHub, as an example of how to use Cassandra's data model. If you're interested in the nuts and bolts of how to build an app on Cassandra, you should check it out.

Comments

Alex Popescu said…
Pretty soon there will be some more exciting Cassandra coverage on myNoSQL. Thanks for the round-up.
Note that Jake Luciani will be giving a talk on Lucandra next month at the Search & Discovery Meetup in NY:

http://www.meetup.com/NYC-Search-and-Discovery/

The Lucandra meetup is about to be scheduled there.
Jonathan Ellis said…
Glad to hear that Alex, I look forward to reading it.

Thanks for the link, Otis!
Birdman said…
Great summary of the recent news, these are all good resources.
Anonymous said…
There's = There is. Effectively, "There is been". D'oh!
Jonathan Ellis said…
"There's" is also the contracted form of "there has." Kids these days... :P
Jawaad Mahmood said…
For what it is worth, we are doing a talk about Cassandra at the Tokyo hackerspace next week. It definitely isn't "just another NoSQL" anymore.
Jonathan Ellis said…
@Jawaad: Great! Hope you can post your slides afterwards!
mabstyle said…
[an article claiming SQL scales just fine]

I like how you censored the link. For those who want to read some good original articles on the subject, instead of mischaracterizations of them:

one

two

three
Chmouel said…
By the way, the reddit move was from a key/value model (memcache) to another key/value model (Cassandra), not the full thing like Digg...
