
Reed-Solomon libraries

If you want to run a multi-petabyte storage system, you don't want to do it with RAID 5 or RAID 6. With modern disks' ~3% per year failure rate, 10,000 disks means about 300 failures a year, and the odds start to get pretty good (relatively speaking) that at some point you'll lose a third disk from an array while two others are rebuilding, which means permanent data loss. And of course monitoring and replacing disks in lots of small arrays is manpower-intensive, which to investors translates as "expensive."

You probably don't want to go with triplication, either; disks are cheap, but not so cheap that you want to triple your hardware costs unnecessarily. While storing multiple copies of frequently used data is good, all your data probably isn't "frequently used."

What is the solution? As it turns out, RAID's parity schemes are actually a special case of Reed-Solomon encoding, which lets you specify any degree of redundancy you want. You can be safer than triplication while using only a fraction of the extra space.
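To make the "special case" concrete, here is a minimal sketch of the simplest one: a single XOR parity block, RAID 5 style, which survives the loss of any one block. Reed-Solomon generalizes this to an arbitrary number of redundancy blocks using finite-field arithmetic, which this sketch does not attempt. The function name and sample data are made up for illustration and are not taken from librs, zfec, or Jerasure.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block (hypothetical sample data).
data = [b"reed", b"solo", b"mon!"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] died: XOR the survivors with the parity
# block to rebuild the missing block.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)

print(recovered == data[1])
```

With one parity block you can lose exactly one block. Tolerating more losses is where a real Reed-Solomon encoder comes in.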

I was prompted to write this because Mozy open-sourced librs, the Reed-Solomon library I used while I was there, complete with Python bindings. The original librs we used at Mozy was written by Byron Clark, which was a formidable task. Later we switched to the version you see on SourceForge, based on Plank's original encoder. I wasn't involved with librs at all except to fix a couple of reference leaks in the Python wrapper.

But if you're actually looking for an RS library to use, Alen Peacock, who is much more knowledgeable than I am about the gory details involved here, tells me that if you are starting from scratch, the two libraries you should evaluate are zfec, which also comes with Python bindings, and Jerasure, which is an updated -- i.e., probably faster than his first -- encoder by Plank. (Jerasure has nothing to do with Java.)

Comments

Shane Hathaway said…
It's great to see these new libraries pop up. Reed-Solomon accomplishes an amazing trick that more software developers should be aware of. I don't think the possibilities have been explored nearly enough.

The unintuitive truth is that a tunable error correction algorithm easily achieves much higher reliability than replication, with less hardware. For example, if I organize my data into blocks spanning 20 disks, then put 10 error correction blocks on 10 other disks, that data is statistically safer than it would be on a system that maintains 4 replicas. Even better, I only need 50% extra space instead of 300%! (In fact, 20+10 is overkill; I'd rather use around 20+7.)
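The comparison above can be checked with a back-of-the-envelope calculation. This is a rough sketch under loudly simplified assumptions: every disk fails independently with the same probability during some window, and nothing is repaired within that window. The 3% figure is borrowed from the post's annual failure rate purely as an illustrative input, and the function names are hypothetical.

```python
from math import comb


def loss_probability(total_disks, data_disks, p_fail):
    """Probability that more disks fail than the code can tolerate.

    Assumes independent failures with probability p_fail per disk and no
    repair during the window (a simplification, not a real-world model).
    """
    tolerated = total_disks - data_disks
    return sum(
        comb(total_disks, failed)
        * p_fail ** failed
        * (1 - p_fail) ** (total_disks - failed)
        for failed in range(tolerated + 1, total_disks + 1)
    )


def extra_space(data_disks, redundancy_disks):
    """Redundancy as a fraction of the original data size."""
    return redundancy_disks / data_disks


P_FAIL = 0.03  # assumed per-disk failure probability for the window

rs_20_10 = loss_probability(30, 20, P_FAIL)     # 20 data + 10 correction blocks
four_replicas = loss_probability(4, 1, P_FAIL)  # lost only if all 4 copies fail

print(f"20+10:      loss {rs_20_10:.3e}, extra space {extra_space(20, 10):.0%}")
print(f"4 replicas: loss {four_replicas:.3e}, extra space {extra_space(1, 3):.0%}")
```

Under these assumptions the 20+10 layout comes out ahead of four replicas on loss probability while needing 50% extra space rather than 300%. A real system would also need to model rebuild times and correlated failures.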

I've created a petabyte-scale storage system based on RS called Bit Mountain, but my employer is not mature in the ways of open source, so I can't release it. Fortunately, several other groups are now seeing the light, including allmydata.org. I need to check out their Tahoe project.

Another way I'd like to see RS applied is in packet-level transmission, especially for VOIP. Bandwidth isn't a problem anymore and delays up to 500 ms are unimportant. What's bad is regular packet loss. RS could solve the packet loss by generating a stream of forward error correcting packets that accompany the normal packets. I wish I could just turn on some iptables filter to add RS coding to a connection.

Unknown said…
Note that jerasure now lives at http://jerasure.org
