Wednesday, May 27, 2009

Why you won't be building your killer app on a distributed hash table

I ran across "A case study in building layered DHT applications" while doing some research on implementing load-balancing in Cassandra. The question addressed is, "Are DHTs a general-purpose tool that you can build more sophisticated services on?"

Short version: no. A few specialized applications can be, and have been, built on a plain DHT, but most applications built on DHTs have ended up having to customize the DHT's internals to achieve their functional or performance goals.

This paper describes the results of attempting to build a relatively complex data structure (prefix hash trees, for range queries) on top of OpenDHT. The result was mostly failure:

A simple put-get interface was not quite enough. In particular, OpenDHT relies on timeouts to invalidate entries and has no support for atomicity primitives... In return for ease of implementation and deployment, we sacrificed performance. With the OpenDHT implementation, a PHT query operation took a median of 2–4 seconds. This is due to the fact that layering entirely on top of a DHT service inherently implies that applications must perform a sequence of put-get operations to implement higher level semantics with limited opportunity for optimization within the DHT.
In other words, there are two primary problems with the DHT approach:
  • Most DHTs will require a second locking layer to achieve correctness when implementing a more complex data structure on top of the DHT semantics (see the sketch just after this list). In particular, this will certainly apply to eventually-consistent systems in the Dynamo mold.
  • Advanced functionality like range queries needs to be supported natively to be at all efficient.
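To make the first point concrete, here is a minimal sketch in Python of a prefix-hash-tree insert when all the DHT offers is put and get. The dht client, its get/put signatures, and the leaf layout are hypothetical -- made up for illustration, not taken from the paper or from OpenDHT:

    LEAF_CAPACITY = 4  # hypothetical bucket size

    def pht_insert(dht, prefix, key):
        """Insert a binary-string key into the PHT leaf at `prefix`,
        splitting the leaf when it overflows."""
        leaf = dht.get(prefix) or {"is_leaf": True, "keys": []}
        leaf["keys"].append(key)

        if len(leaf["keys"]) <= LEAF_CAPACITY:
            dht.put(prefix, leaf)  # a single put: this case is fine
            return

        # Overflow: the split takes three separate puts, and a bare
        # put/get interface gives us no way to make them atomic or to
        # lock out concurrent writers.
        left = {"is_leaf": True,
                "keys": [k for k in leaf["keys"] if k[len(prefix)] == "0"]}
        right = {"is_leaf": True,
                 "keys": [k for k in leaf["keys"] if k[len(prefix)] == "1"]}
        dht.put(prefix + "0", left)
        dht.put(prefix + "1", right)
        # A crash here, or another client splitting the same leaf at the
        # same time, leaves readers seeing both a stale leaf and new
        # children -- corruption that "resolve conflicts later" can't fix.
        dht.put(prefix, {"is_leaf": False})

The only way to make that split safe over plain put/get is to wrap your own locking layer around the multi-put -- exactly the atomicity support the paper notes OpenDHT lacks.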
While they spin this in a positive manner -- "hey, at least it didn't take much code" -- the reality is that for most of us, query latency of two to four seconds is several orders of magnitude away from acceptable.

This is one reason why I think Cassandra is the most promising of the open-source distributed databases -- you get a relatively rich data model and a distribution model that supports efficient range queries. These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.
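To see why the partitioner matters for range queries, here is a toy model in Python -- not Cassandra's actual code or API; the three-node ring, the 0..255 token space, and the sample keys are invented for illustration:

    import hashlib
    from bisect import bisect_left

    NODE_TOKENS = [85, 170, 255]  # three nodes on a 0..255 token ring

    def node_for(token):
        """Each node owns the token range (previous_token, its_token]."""
        return NODE_TOKENS[bisect_left(NODE_TOKENS, token) % len(NODE_TOKENS)]

    def random_token(key):
        return hashlib.md5(key.encode()).digest()[0]  # hash first: key order lost

    def ordered_token(key):
        return ord(key[0])  # place by the key itself: key order preserved

    keys = ["apple", "apricot", "banana", "berry"]
    print([node_for(random_token(k)) for k in keys])   # typically scattered around the ring
    print([node_for(ordered_token(k)) for k in keys])  # one contiguous span of the ring

With random placement, a range query has to fan out to every node and filter; with order-preserving placement, the keys in a range live on one contiguous span of the ring, so the query only touches the nodes that own it.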

13 comments:

Christian said...

It sounds like some of the performance issues could have been due to their DHT of choice. I don't think that the paper alone is enough to discount DHTs as datastores for applications.

Also, I'm not sure what you mean about requiring additional locking. When working with eventually consistent DHTs, isn't this where the application is supposed to step in and resolve conflicts in the state of the resource caused by concurrent operations?

Jonathan Ellis said...

Even granting that a better DHT would be an order of magnitude faster than OpenDHT, that's still two orders of magnitude away from where you want to be. :) So I'm skeptical of this argument.

The additional locking refers to keeping the prefix hash structure consistent, since updating it requires multiple commands sent to the underlying DHT. Not locking here will result in irresolvable corruption, so it's not a place to employ the "eventually consistent" approach to which you refer. See the actual paper for details.
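Roughly, that second layer has to look something like the following sketch (lock_service and dht are hypothetical clients, not any real API): every structural update gets serialized outside the DHT, because the DHT itself can't do it.

    from contextlib import contextmanager

    @contextmanager
    def structural_lock(lock_service, prefix):
        """Serialize multi-put updates to the subtree rooted at `prefix`."""
        lock_service.acquire("pht-lock:" + prefix)
        try:
            yield
        finally:
            lock_service.release("pht-lock:" + prefix)

    def split_leaf(dht, lock_service, prefix):
        with structural_lock(lock_service, prefix):
            # perform the multiple puts that implement the split here;
            # without the lock, two concurrent splits can interleave
            pass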

dwight_10gen said...

I believe we are going to see at least three types of data stores for apps going forward:
(1) DHTs / key-value stores,
(2) nonrelational db's that scale but have deeper functionality such as range queries and secondary indexes, and
(3) traditional RDBMSes when one needs the most functionality from the DB (e.g. very complex transactions).

Really depends on the problem at hand -- the right tool for the right job.

(I work on MongoDB, which is a database that fits into bucket (2) above.)

Tony said...

I agree with Dwight. As much as we love the relational database, it is not without many faults of its own, and some of these are beginning to become serious issues as our demands on the RDB increase. I am in the middle of writing a series on this now: http://weny.ws/1Xx.

Currently we have gone too far in our mentality of an RDB for everything. The RDB is a great generic tool, but it needs to be one of a set of options. I think we are starting to see the emergence and uptake of DBMSs that sit somewhere between the RDBMS and the DHT, with a strong focus on scale, predictability, development model, manageability, and transaction-processing performance. This is much like the breakaway path we have seen emerging for high-end analytics over the past few years.

Jonas Oreland said...

Have you looked at MySQL Cluster (ndb)?

Eric said...

That article is a bit old...
I found a really promising project that has also been shown in a Google talk: http://www.zib.de/CSR/Projects/scalaris
It supports range queries and heaps of other stuff.

Jonathan Ellis said...

First, the point is not that it's impossible to support range queries in a DHT or DHT-like system; the point is that the key/value paradigm alone doesn't give you enough power to build more advanced functionality like this -- it needs to be built into the system.

Second, Scalaris doesn't actually support range queries yet according to the most recent information I know of (http://groups.google.com/group/scalaris/msg/1d365e6605f1f030).

Pete Warden said...

I'm still working through the paper, but as someone who's wrestling with key/value primitives to perform complex queries, I'm skeptical that there's no way to perform the queries they want, as long as you're willing to burn up lots of storage space for indices and other denormalized information. Maybe it would have an unacceptable impact on insertions, but in the past I've cobbled together fast geographic range queries using application logic on top of kd-trees.

I'm with you on the idea that opening up the minimal interface a bit might be a big win, though; I'm particularly interested in MongoDB.

Tony said...

" Pete Warden said... I'm skeptical that there's no way to perform the queries they want, as long as you're willing to burn up lots of storage space for indices..."

I agree, but I think it then stops being a simple DHT and becomes a distributed key/value database, which is kind of a DHT with knobs on.

TB

Eric said...

"By removing consistent hashing from Chord, we derived a protocol that has the same favorable logarithmic routing performance but needs less network hops for updating its routing table. Additionally, our Chord# protocol supports range queries, which are not possible with Chord. Our empirical results indicate that Chord# outperforms Chord even under high churn, that is, when nodes frequently join and leave the system." - Scalaris uses Chord#.

jsk said...

If you have a range-based storage structure, you can design your own physical locality and get performance and reliability benefits. By convention, you can use one row as the coordinating row for that sequence. With hashing it's a bit too random ;-)

how.gauche said...

I don't know how you make the leap from "some problems can't be efficiently solved with this particular DHT implementation" to "distributed hash tables are a fail for all general-purpose applications."

Google seems to be doing quite well for itself on Bigtable, for instance.

vicaya said...

Bigtable and its open-source counterparts, Hypertable and HBase, are not DHT-based either. They split the keyspace into contiguous ranges (tablets) and distribute those instead.

OrderPreservingPartitioner and range queries are relatively new features (only 2 months old) in Cassandra; I'd expect plenty of bugs and race conditions.