Database Replication

I spent some time yesterday researching (free) database replication options. Judging from the newsgroup posts I saw, there's a lot of confusion out there. The most common use case appears to be failover, i.e., you want to minimize downtime in the face of software or hardware failure by replicating your data across multiple machines. But the most commonly used options are completely inappropriate for this purpose.

As Josh Berkus explained, there are two "dimensions" to replication: synchronous vs async, and master/slave vs multimaster.

For a failover solution, if you want database B to take over from database A in case of failure, with no data loss, only synchronous solutions make sense. By definition, asynchronous replication means that database A can commit a transaction before those changes are also committed to database B. If A happens to fail between commit and replication, you've lost data. If that's not acceptable for you, then neither is async replication.
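To make that failure window concrete, here is a toy Python simulation. It is only a sketch under loud assumptions: `Node`, `AsyncPrimary`, `SyncPrimary`, and `lost_on_failover` are hypothetical in-memory stand-ins, not the API of any real database, and the synchronous case is reduced to "the standby must accept the change before the commit is acknowledged" rather than a real two-phase commit.

```python
class Node:
    """In-memory stand-in for a database server (hypothetical, for illustration)."""

    def __init__(self):
        self.committed = []  # transactions this node has committed
        self.alive = True

    def apply(self, txn):
        if not self.alive:
            raise ConnectionError("node is down")
        self.committed.append(txn)


class AsyncPrimary(Node):
    """Database A with async replication: commit locally, ship to B later."""

    def __init__(self, standby):
        super().__init__()
        self.standby = standby
        self.pending = []  # committed on A, not yet sent to B

    def commit(self, txn):
        self.apply(txn)            # the client sees "committed" here...
        self.pending.append(txn)   # ...but B has not received the change yet

    def ship(self):
        """Replication catching up: send queued changes to the standby."""
        while self.pending:
            self.standby.apply(self.pending.pop(0))


class SyncPrimary(Node):
    """Database A with sync replication: B must accept before A acknowledges."""

    def __init__(self, standby):
        super().__init__()
        self.standby = standby

    def commit(self, txn):
        self.standby.apply(txn)  # simplified; real systems use a commit protocol
        self.apply(txn)

    def ship(self):
        pass  # nothing is ever queued in the synchronous case


def lost_on_failover(primary_cls):
    """Return transactions A acknowledged that B does not have when A dies."""
    standby = Node()
    primary = primary_cls(standby)
    primary.commit("t1")
    primary.ship()          # replication catches up after t1
    primary.commit("t2")
    primary.alive = False   # A fails between commit and replication of t2
    return [t for t in primary.committed if t not in standby.committed]


if __name__ == "__main__":
    print("async lost:", lost_on_failover(AsyncPrimary))
    print("sync lost: ", lost_on_failover(SyncPrimary))
```

In this toy run the async primary loses `t2`, which it had already acknowledged, while the sync primary loses nothing. The price, which the sketch does not model, is that every synchronous commit has to wait on the standby.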

Be aware that the most popular replication solutions for both PostgreSQL and MySQL are asynchronous.

  • In part because of the contributions by the likes of Fujitsu and Afilias (.org and .info registry), Slony-I is the most high-profile replication solution for PostgreSQL. Slony-I provides only asynchronous replication.
  • MySQL replication is also asynchronous.

So what are the options for synchronous replication?

  • MySQL "clustering" appears to allow for synchronous replication, but requires use of the separate NDB storage engine, which has a long list of limitations vs MyISAM or InnoDB. (No foreign key support, no triggers, basically none of the features MySQL has been adding for the past few years. Oh, and you need enough RAM to hold your entire database twice over.)
  • PgCluster for PostgreSQL seems fairly mature, but the 1.3 (8.0-based) and 1.5 (8.1-based) versions still aren't out of beta. PgCluster also patches the PostgreSQL source directly, which makes me a little nervous.
  • Another option is something like pgpool, which multiplexes updates across multiple databases. The biggest limitation of this approach is that you're on your own for recovery, i.e., after A goes down and you switch to B alone, how do you get A back in sync? A fairly common approach is to combine pgpool with Slony-I async replication for recovery.
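The recovery problem with the multiplexing approach can be sketched in a few lines of Python. This is an illustration only, not how pgpool is actually implemented: `Backend`, `WriteMultiplexer`, and `demo_recovery_gap` are hypothetical names, and the "databases" are plain in-memory lists.

```python
class Backend:
    """In-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.statements = []  # writes this backend has applied
        self.up = True

    def execute(self, stmt):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.statements.append(stmt)


class WriteMultiplexer:
    """Sends every write to all live backends and drops any that fail."""

    def __init__(self, backends):
        self.live = list(backends)

    def execute(self, stmt):
        for backend in list(self.live):  # copy: we may remove while iterating
            try:
                backend.execute(stmt)
            except ConnectionError:
                self.live.remove(backend)  # carry on with the survivors
        if not self.live:
            raise RuntimeError("no live backends left")


def demo_recovery_gap():
    """Show that a backend that was down comes back missing writes."""
    a, b = Backend("A"), Backend("B")
    pool = WriteMultiplexer([a, b])
    pool.execute("w1")   # applied to both A and B
    a.up = False         # A goes down
    pool.execute("w2")   # applied to B alone
    a.up = True          # A is repaired, but nothing replays w2 for it
    missing_on_a = [s for s in b.statements if s not in a.statements]
    return [x.name for x in pool.live], missing_on_a


if __name__ == "__main__":
    live, missing = demo_recovery_gap()
    print("still in pool:", live)
    print("A is missing: ", missing)
```

After the simulated outage only B is still in the pool, and A is missing `w2`. Nothing in the multiplexer itself can bring A back in sync, which is why a separate mechanism is needed for recovery.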

The bottom line is, high availability isn't as simple as adding whatever "replication" solution you first run across. You need to understand what the different kinds of replication are, and which are appropriate to your specific situation.

Comments

Anonymous said…
If you have not already, check out Linux Virtual Server (LVS). I've used LVS with MySQL before for high availability and load balancing. By using keepalived and some scripts any slave could take over as master if the master heartbeat died.
Jonathan Ellis said…
So, basically the same approach as pgpool, but lower-level? Interesting.
Anonymous said…
How is pgpool+Slony accomplished, exactly? Are you referring to placing pgpool in master/slave mode as described at the bottom of the pgpool Web site?
Jonathan Ellis said…
Yes, in which case (unlike what I implied) you're relying entirely on slony for the replication so you're back to async. Sorry for the mistake.
Anonymous said…
Which do you prefer, asynchronous or synchronous replication, i.e., Slony or PgCluster? And why?
Jonathan Ellis said…
We use Slony; it has a much bigger user base and is the "safe" choice. And if you're doing replication that probably matters to you. :)
Anonymous said…
I want to try using Slony to replicate my database. Could you give me a simple procedure I can follow? I've tried following the documentation, but I still can't get it working. I'm a novice user who wants to know how Slony performs. Could you point me to a site or URL that explains the difference between Slony and PgCluster? Also, how can I measure Slony's performance? I mean, how can I measure how long replication takes in Slony?
Thanks
Regards,
Bayu
by_pacitan@yahoo.com
