Skip to main content

Why you won't be building your killer app on a distributed hash table

I ran across A case study in building layered DHT applications while doing some research on implementing load-balancing in Cassandra. The question addressed is, "Are DHTs a general-purpose tool that you can build more sophisticated services on?"

Short version: no. A few specialized applications can and have been built on a plain DHT, but most applications built on DHTs have ended up having to customize the DHT's internals to achieve their functional or performance goals.

This paper describes the results of attempting to build a relatively complex datastructure (prefix hash trees, for range queries) on top of OpenDHT. The result was mostly failure:

A simple put-get interface was not quite enough. In particular, OpenDHT relies on timeouts to invalidate entries and has no support for atomicity primitives... In return for ease of implementation and deployment, we sacrificed performance. With the OpenDHT implementation, a PHT query operation took a median of 2–4 seconds. This is due to the fact that layering entirely on top of a DHT service inherently implies that applications must perform a sequence of put-get operations to implement higher level semantics with limited opportunity for optimization within the DHT.
In other words, there are two primary problems with the DHT approach:
  • Most DHTs will require a second locking layer to achieve correctness when implementing a more complex data structure on top of the DHT semantics. In particular, this will certainly apply to eventually-consistent systems in the Dynamo mold.
  • Advanced functionality like range queries needs to be supported natively to be at all efficient.
While they spin this in a positive manner -- "hey, at least it didn't take much code" -- the reality is that for most of us, query latency of two to four seconds is several orders of magnitude away from acceptable.

This is one reason why I think Cassandra is the most promising of the open-source distributed databases -- you get a relatively rich data model and a distribution model that supports efficient range queries. These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.

Comments

Christian said…
It sounds like some of the performance issues could have been due to their DHT of choice. I don't think that the paper alone is enough to discount DHTs as datastores for applications.

Also, I'm not sure what you mean about requiring additional locking. When working with eventually consistent DHTs, isn't this where the application is supposed to step in and resolve conflicts in the state of the resource caused by concurrent operations?
Jonathan Ellis said…
Even granting that a better DHT would be an order of magnitude faster than OpenDHT, that's still two orders of magnitude away from where you want to be. :) So I'm skeptical of this argument.

The additional locking refers to keeping the prefix hash structure consistent since updating that requires multiple commands sent to the underlying DHT. Not locking here will result in irresolvable corruption, so it's not a place to employ the "eventually consistent" approach to which you refer. See the actual paper for details.
dwight_10gen said…
I believe we are going to see at least three types of data stores for apps going forward:
(1) DHTs / key-value stores,
(2) nonrelational db's that scale but have deeper functionality such as range queries and secondary indexes, and
(3) traditional RDBMSes when one needs the most functionality from the DB (e.g. very complex transactions).

Really depends on the problem at hand -- the right tool for the right job.

(I work on MongoDB which is a database that fits into bucket (2) above.)
Tony Bain said…
I agree with Dwight. As much as we love the relational database it is not without many faults of its own and some of these are beginning to become serious issues as our demands of the RDB increase. I am in the middle of writing a series on this now here http://weny.ws/1Xx.

Currently we have gone too far in our mentality of a RDB for everything. The RDB is a great generic tool but it needs to be one of a set of options. I think we are starting to see the emergence and uptake of DBMS’s that are somewhere in the middle of the RDBMS and DHT which have a strong focus on scale, predictability, development model, manageability and transaction processing performance. This is much like the breakaway path we have seen emerging for high end analytics over the few years.
Jonas Oreland said…
Have you looked at MySQL Cluster (ndb)?
Eric said…
That article is a bit old...
Found a really promising project that also has been shown in a google talk. http://www.zib.de/CSR/Projects/scalaris
It supports range queries and heaps of other stuff.
Jonathan Ellis said…
First, the point is not that it's impossible to support range queries in a DHT or DHT-like system; the point is that the key/value paradigm alone doesn't give you enough power to build more advanced functionality like this -- it needs to be built-in to the system.

Second, Scalaris doesn't actually support range queries yet according to the most recent information I know of (http://groups.google.com/group/scalaris/msg/1d365e6605f1f030).
Pete Warden said…
I'm still working through the paper, but as someone who's wrestling with key/value primitives to perform complex queries, I'm skeptical that there's no way to perform the queries they want, as long as you're willing to burn up lots of storage space for indices and other denormalized information. Maybe it would have an unacceptable impact on insertions, but in the past I've cobbled together fast geographic range queries using application logic on top of kd-trees.

I'm with you on the idea that opening up the minimal interface a bit might be a big win though, I am particularly interested in MongoDB.
Tony Bain said…
" Pete Warden said... I'm skeptical that there's no way to perform the queries they want, as long as you're willing to burn up lots of storage space for indices..."

I agree but I think then it stops being a simple DHT and becomes a distributed key/value database which is kind of a DHT with knobs on.

TB
Eric said…
"By removing consistent hashing from Chord, we derived a protocol that has the same favorable logarithmic routing performance but needs less network hops for updating its routing table. Additionally, our Chord# protocol supports range queries, which are not possible with Chord. Our empirical results indicate that Chord# outperforms Chord even under high churn, that is, when nodes frequently join and leave the system." - Scalaris uses Chord#.
jsk said…
If you have range based storage structure, you can design own physical locality and get performance and reliability benefits. By convention you can use one row as your coordinating row of that sequence. With hashing it's a bit too random ;-)
Unknown said…
I don't know how you make the leap from "some problems can't be efficiently solved with this particular DHT implementation" to "distributed hash tables are a fail for all general-purpose applications."

Google seems to be doing quite well for itself on Bigtable, for instance.
vicaya said…
Bigtable and its open source counter parts: Hypertable and HBase are not DHT based as well. They use range split to distribute ranges (tablets).

OrderPreservingPartitioner and range query are relatively new features (only 2 months old) in Cassandra, I'd expect plenty of bugs and race conditions.
This text is priceless. How can I find out more?

Popular posts from this blog

Python at Mozy.com

At my day job, I write code for a company called Berkeley Data Systems. (They found me through this blog, actually. It's been a good place to work.) Our first product is free online backup at mozy.com . Our second beta release was yesterday; the obvious problems have been fixed, so I feel reasonably good about blogging about it. Our back end, which is the most algorithmically complex part -- as opposed to fighting-Microsoft-APIs complex, as we have to in our desktop client -- is 90% in python with one C extension for speed. We (well, they, since I wasn't at the company at that point) initially chose Python for speed of development, and it's definitely fulfilled that expectation. (It's also lived up to its reputation for readability, in that the Python code has had 3 different developers -- in serial -- with very quick ramp-ups in each case. Python's succinctness and and one-obvious-way-to-do-it philosophy played a big part in this.) If you try it out, pleas...

A week of Windows Subsystem for Linux

I first experimented with WSL2 as a daily development environment two years ago. Things were still pretty rough around the edges, especially with JetBrains' IDEs, and I ended up buying a dedicated Linux workstation so I wouldn't have to deal with the pain.  Unfortunately, the Linux box developed a heat management problem, and simultaneously I found myself needing a beefier GPU than it had for working on multi-vector encoding , so I decided to give WSL2 another try. Here's some of the highlights and lowlights. TLDR, it's working well enough that I'm probably going to continue using it as my primary development machine going forward. The Good NVIDIA CUDA drivers just work. I was blown away that I ran conda install cuda -c nvidia and it worked the first try. No farting around with Linux kernel header versions or arcane errors from nvidia-smi. It just worked, including with PyTorch. JetBrains products work a lot better now in remote development mod...

A review of 6 Python IDEs

(March 2006: you may also be interested the updated review I did for PyCon -- http://spyced.blogspot.com/2006/02/pycon-python-ide-review.html .) For September's meeting, the Utah Python User Group hosted an IDE shootout. 5 presenters reviewed 6 IDEs: PyDev 0.9.8.1 Eric3 3.7.1 Boa Constructor 0.4.4 BlackAdder 1.1 Komodo 3.1 Wing IDE 2.0.3 (The windows version was tested for all but Eric3, which was tested on Linux. Eric3 is based on Qt, which basically means you can't run it on Windows unless you've shelled out $$$ for a commerical Qt license, since there is no GPL version of Qt for Windows. Yes, there's Qt Free , but that's not exactly production-ready software.) Perhaps the most notable IDEs not included are SPE and DrPython. Alas, nobody had time to review these, but if you're looking for a free IDE perhaps you should include these in your search, because PyDev was the only one of the 3 free ones that we'd consider using. And if you aren...