Spyced

All you ever wanted to know about writing bloom filters

It seems like Bloom filters are all the rage these days. Three years ago I had barely heard of them and now it seems like I see articles and code using them all the time. That's mostly a good thing, since bloom filters are a very useful tool to avoid performing expensive computations without the full memory overhead of a standard map/dictionary. Bloom filters are surprisingly simple: divide a memory area into buckets (one bit per bucket for a standard bloom filter; more -- typically four -- for a counting bloom filter). To insert a key, generate several hashes per key, and mark the buckets for each hash. To check if a key is present, check each bucket; if any bucket is empty, the key was never inserted in the filter. If all buckets are non-empty, though, the key is only probably inserted -- other keys' hashes could have covered the same buckets. Determining exactly how big to make the filter and how many hashes to use to achieve a given false positive rate is a solved p...

Spyced

Search This Blog

Posts

All you ever wanted to know about writing bloom filters