
Database Replication

I spent some time yesterday researching (free) database replication options. Judging from the newsgroup posts I saw, there's a lot of confusion out there. The most common use case appears to be failover, i.e., you want to minimize downtime in the face of software or hardware failure by replicating your data across multiple machines. But the most commonly used options are completely inappropriate for this purpose.

As Josh Berkus explained, there are two "dimensions" to replication: synchronous vs. asynchronous, and master/slave vs. multimaster.

For a failover solution, if you want database B to take over from database A in case of failure, with no data loss, only synchronous solutions make sense. By definition, asynchronous replication means that database A can commit a transaction before those changes are also committed to database B. If A happens to fail between commit and replication, you've lost data. If that's not acceptable for you, then neither is async replication.
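To make the failure window concrete, here is a toy Python sketch of the ordering involved. This is not any real database's protocol, just the sequence of events: with async replication the client is acknowledged before the standby has the change, while with sync replication the acknowledgement waits for both copies.

    # Toy illustration of commit/replication ordering; not a real protocol.
    class Node:
        def __init__(self, name):
            self.name = name
            self.committed = []

        def apply(self, txn):
            self.committed.append(txn)

    def commit_async(primary, standby, txn):
        primary.apply(txn)
        # client is acknowledged here; if the primary dies now,
        # txn exists only on the lost machine
        standby.apply(txn)   # happens later, and maybe never

    def commit_sync(primary, standby, txn):
        standby.apply(txn)   # standby must confirm before we ack
        primary.apply(txn)
        # client is acknowledged only once both copies exist

    a, b = Node("A"), Node("B")
    commit_async(a, b, "txn-1")
    commit_sync(a, b, "txn-2")
    print(b.committed)       # ['txn-1', 'txn-2']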

Be aware that the most popular replication solutions for both PostgreSQL and MySQL are asynchronous.

  • In part because of contributions from the likes of Fujitsu and Afilias (the .org and .info registrar), Slony-I is the most high-profile replication solution for PostgreSQL. Slony-I provides only asynchronous replication.
  • MySQL replication is also asynchronous.

So what are the options for synchronous replication?

  • MySQL "clustering" appears to allow for synchronous replication, but requires use of the separate NDB storage engine, which has a long list of limitations vs MyISAM or InnoDB. (No foreign key support, no triggers, basically none of the features MySQL has been adding for the past few years. Oh, and you need enough RAM to hold your entire database twice over.)
  • PgCluster for PostgreSQL seems fairly mature, but the 1.3 (8.0-based) and 1.5 (8.1-based) versions still aren't out of beta. PgCluster also patches the PostgreSQL source directly, which makes me a little nervous.
  • Another option is something like pgpool, which multiplexes updates across multiple databases (sketched after this list). The biggest limitation of this approach is that you're on your own for recovery, i.e., after A goes down and you switch to B alone, how do you get A back in sync? A fairly common approach is to combine pgpool with Slony-I async replication for recovery.
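To make that last item concrete, here is a rough Python sketch of the write fan-out idea in the spirit of pgpool. This is not pgpool's actual implementation; the host names, database, and table are hypothetical placeholders. It also shows where the recovery problem comes from: once a backend misses a write, it stays out of sync until you resynchronize it by some other means.

    # Rough sketch of pgpool-style write fan-out; NOT pgpool itself.
    # Hostnames, database, and table below are hypothetical placeholders.
    import psycopg2

    backends = [
        psycopg2.connect("host=db-a dbname=app user=app"),
        psycopg2.connect("host=db-b dbname=app user=app"),
    ]

    def execute_everywhere(sql, params=()):
        """Run the same write on every live backend; detach any that fail."""
        failed = []
        for conn in backends:
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                conn.commit()
                cur.close()
            except psycopg2.Error:
                failed.append(conn)
        for conn in failed:
            # This backend has now missed a write and is out of sync.
            # Getting it back requires a separate recovery mechanism
            # (e.g. re-subscribing it via Slony, or restoring from a dump).
            backends.remove(conn)

    execute_everywhere("INSERT INTO events (msg) VALUES (%s)", ("hello",))

Note that the commits above are not coordinated (there is no two-phase commit), so a crash partway through the loop can still leave the backends diverged; that is exactly the kind of edge case a mature middleware layer has to handle for you.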

The bottom line is, high availability isn't as simple as adding whatever "replication" solution you first run across. You need to understand what the different kinds of replication are, and which are appropriate to your specific situation.

Comments

Anonymous said…
If you have not already, check out Linux Virtual Server (LVS). I've used LVS with MySQL before for high availability and load balancing. By using keepalived and some scripts, any slave could take over as master if the master heartbeat died.
Jonathan Ellis said…
So, basically the same approach as pgpool, but lower-level? Interesting.
Anonymous said…
How is pgpool+Slony accomplished, exactly? Are you referring to placing pgpool in master/slave mode as described at the bottom of the pgpool Web site?
Jonathan Ellis said…
Yes, in which case (unlike what I implied) you're relying entirely on Slony for the replication, so you're back to async. Sorry for the mistake.
Anonymous said…
Which do you prefer between asynchronous and synchronous, i.e., PgCluster vs. Slony? And why?
Jonathan Ellis said…
We use Slony; it has a much bigger user base and is the "safe" choice. And if you're doing replication, that probably matters to you. :)
Anonymous said…
I want to try Slony to replicate my database. Could you give me a simple procedure I can follow? I've tried the steps in the documentation but still can't get it working. I'm a novice user who wants to understand Slony's performance. Could you point me to a site or URL that explains the difference between Slony and PgCluster? Also, how can I measure Slony's performance, i.e., how can I measure replication time in Slony?
Thanks
Regards,
Bayu
by_pacitan@yahoo.com
