- Writes are serialized. Not "serializable" as in the isolation level; serialized as in only one write can be active at a time. Want to spread writes across multiple disks? Sorry.
- CouchDB uses an MVCC model, which means that updates and deletes need to be compacted before their space is made available to new writes. Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less.
- CouchDB is simple. Gloriously simple. Why is that a negative? It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years. PostgreSQL et al. have those features because people want them. And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing. The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces.
- A special case of simplicity deserves mention: nontrivial queries must be created as a view with MapReduce. MapReduce is a great approach to trivially parallelizing certain classes of problem. The problem is, it's tedious and error-prone to write raw MapReduce code. This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively). Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It's a little verbose, and you might be bored with it, but it's much better than writing low-level MapReduce code.
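To make the MapReduce-versus-SQL point concrete, here is a rough sketch that computes the same per-author total twice: once with hand-written map and reduce functions, and once with a single SQL statement. Everything in it is an assumption made for illustration: the documents, the `author` and `words` fields, and the `posts` table are invented, and the map/reduce plumbing is plain Python, not CouchDB's actual view API (CouchDB views are normally written in JavaScript).

```python
import sqlite3
from collections import defaultdict

# Hypothetical documents; the field names are invented for this sketch.
docs = [
    {"author": "alice", "words": 120},
    {"author": "bob", "words": 300},
    {"author": "alice", "words": 80},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for a single document."""
    yield doc["author"], doc["words"]


def reduce_values(values):
    """Reduce step: combine every value emitted for one key."""
    return sum(values)


def run_view(documents):
    """Group the mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_values(values) for key, values in groups.items()}


def run_sql(documents):
    """The same aggregate, expressed as one SQL query against SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, words INTEGER)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(d["author"], d["words"]) for d in documents],
    )
    rows = conn.execute(
        "SELECT author, SUM(words) FROM posts GROUP BY author"
    ).fetchall()
    conn.close()
    return dict(rows)


mapreduce_totals = run_view(docs)
sql_totals = run_sql(docs)
```

Both versions produce the same mapping of author to total word count. The difference is that the SQL version states what is wanted in one line, while the MapReduce version spells out how to get it.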
Wednesday, December 31, 2008
Wednesday, December 24, 2008
Today marks one month that I've been working for Rackspace's RackLabs with the Mosso group in San Antonio, Texas. (Anyone want to start a Python group? The closest one is in Austin.)
It's kind of a gentle introduction to big company culture for me; at around 2,000 employees, Rackspace is easily ten times as large as any other company I've worked for, and 100 times as large as most. Mosso is a lot smaller and RackLabs itself is smaller still, but I still had to go to five days (!) of corporate orientation. Other than that, though, we're pretty much left alone by our corporate parent.
To start with, I'm working on Mosso's Cloud Files, which is basically an S3 competitor. Cloud Files is similar to the work I did at Mozy, but there are a lot of technical differences. Some are driven by Cloud Files being more of a general purpose storage engine than the one I wrote for Mozy; others stem from the Cloud Files authors being Twisted fans.
Strange coincidence: as with Mozy, I share an office here with a Debian developer, probably the only one in San Antonio. My experience is that Debian developers are pretty sharp guys, probably in no small part due to the rigorous screening process you have to go through. They set a high bar.
Of course this continues to be my personal blog, and all opinions are mine alone. RackLabs has its own blog for when they want to say something official.
Friday, December 19, 2008
I'm a little over a week into a git immersion program. Let me just say that git's reputation of being a little arcane (okay, more than a little) and having a steep learning curve is 100% deserved.
One thing that would mitigate things is if git would give you feedback when you tell it to do nonsense. But it doesn't. Here's me trying to get machine B to always merge the debug branch from machine A when I pull:
git config branch.debug.remote origin
git config branch.master.remote origin
git config branch.master.remote origin/debug
All of these commands completed silently. None accomplished what I wanted. In the end I renamed master to old and debug to master to avoid having to fight it. Then I blew away my working copy and re-cloned because those config statements had created a new problem that I didn't know how to undo.
I'm sure the git virtuosos out there will know what was wrong. That's not the point. The point is that the tool gave me no feedback. It was like git was telling me, "Figure it out yourself. Or don't. I don't care." Which is par for the course with my git experience so far.
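For reference, a minimal sketch of the configuration that would probably have produced the intended behavior. It assumes that `origin` on machine B points at machine A and that the local branch being pulled into is `master`; neither is confirmed by the commands above. Git's documented settings for this are `branch.<name>.remote` (which remote to fetch from) and `branch.<name>.merge` (which branch on that remote to merge), so the relevant section of `.git/config` would look roughly like this:

```
[branch "master"]
	remote = origin
	merge = refs/heads/debug
```

The value `origin/debug` is not the name of a remote, which is likely why the third command caused trouble later. `git config` stores any string it is given without checking it, which is exactly the lack of feedback complained about above. A stray setting of this kind can usually be removed with `git config --unset branch.master.remote`.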