Friday, December 01, 2006

Benchmark: PostgreSQL beats the stuffing out of MySQL

This is interesting, because the conventional wisdom of idiots on slashdot continues to be "use postgresql if you need advanced features, but use mysql if all you care about is speed," despite all the head-to-head benchmarks I've seen by third parties showing PostgreSQL to be faster under load. (MySQL's own benchmarks, of course, tend to favor their own product. Go figure, huh.)

Here's the latest, showing postgresql about 50% faster than mysql across a variety of hardware. But where MySQL really takes a pounding is when you add multiple cores / CPUs: MySQL adds 37% performance going from 1 to 4 cores; postgresql adds 226%. Ouch! (This would also explain why MySQL sucks so hard on the Niagra chip on the requests-per-second graph -- Sun sacrificed GHz to get more cores in.)

As even low-end servers start to go multicore this is going to be increasingly important.

Update: PostgreSQL core member Josh Berkus says :

[This] is a validation of the last four years of PostgreSQL performance engineering. It's not done yet ... if the Tweakers.net test had included 16+ core machines you'd have seen PostgreSQL topping out ... but our hackers have done quite well.

31 comments:

Anonymous said...

Apparently there's always MySQL Cluster if you want to use lots of hardware, but ignoring that, PostgreSQL 8.x is frequently faster than MySQL 5.x except for some kinds of queries, in my experience. And PostgreSQL seems to be a lot faster at index updates, too.

Anonymous said...

That's great. I would love to use postgres except that it lacks some features I need.

Clustering/multi master replication, failover, and case insensitive where clauses.

David said...

Clustering and failover - check out Slony-I: http://slony.info/

Multi-master replication: Slony-2 is in development, or there is PGCluster: http://pgfoundry.org/projects/pgcluster

Case insensitive where clause: Use "ILIKE" instead of "LIKE" (although, indexing lower(column) and comparing against lower('where_clause') can use an index, whereas ILIKE can not.

Honestly, whatever MySQL can do, PostgreSQL can probably do better - there are very few exceptions.

Anonymous said...

What about row level locking. I know that postgres is using tablocks... :-(

Desmond said...

PostgreSQL has a mechanism called MVCC (MultiVersion Concurrency Control), comparable or superior to best commercial databases. It can do row-level locking, can lock rows for writing in one session but give these rows unaffected in another session. MVCC is considered better than row-level locking because a reader is never blocked by writer. Instead, Postgres keeps track of all transactions and is able to manage the records without waiting to become available.

me said...

I was using mysql.. then switched recently to postgresql...
The MySQL databased folder was 8times the size of postgresql folder..

The site is perceptibly faster. Love MVCC, pg_autovacuum, no locking required...

http://www.teenwag.com

Teenwag

thanks for an awesome decisive post...

knut said...


PostgreSQL has a mechanism called MVCC (MultiVersion Concurrency Control), comparable or superior to best commercial databases. It can do row-level locking, can lock rows for writing in one session but give these rows unaffected in another session.

This has an disadvantage also, it leads to defragmented data because an update to a row is actually done on a copy elsevere in the file/filesystem. This leads again to slower sequential reads of tables due to the architecture of an harddrive.

JOE said...

"Use "ILIKE" instead of "LIKE" (although, indexing lower(column) and comparing against lower('where_clause') can use an index, whereas ILIKE can not."

That's fine but it means I can't use any of the ORM layers written in any language. It means I have to litter my code with SQL statements that are postgres specific (ILIKE or *~).

This is a simple collation problem and people have been asking for it at least five or six years (just query the mailing list) and the postgres team refuses to come up with a case insensitive collation or to include any of the case insensitive data types people have developed. The only excuse they have is that it's not "standard" as if ILIKE or *~ are standards.

Anyway postgres has always been late to the party. Late on replication, late on clustering, late on full text searching, late on a windows port, late on ODBC drivers, late on ADO drivers, and still no case insensitive collations or data types.

This is why people use mysql. They deliver what people want. Yes you can argue that MYSQL implementations of some that stuff is crap, yes you can argue that postgres has other better features but you can't deny that mysql delivers what most people want before postgres does.

If you are wondering why people use mysql that's the reason. For the last decade I have tried many times to use postgres in one my projects and every time I run into one obstacle or another. Just last week I abandoned my attempt to trying it one more time (with rails) because I can't demand that my users remember the precise capitalization of every piece of text and I don't want to rewrite activerecord.

James Bennett said...

Joe said:

"That's fine but it means I can't use any of the ORM layers written in any language. It means I have to litter my code with SQL statements that are postgres specific (ILIKE or *~). "

Perhaps you need to find some better ORMs. The one I work with (built in to Django) supports your choice of "contains" (which translates to LIKE in Postgres) or "icontains" (which translates to ILIKE) in its querying syntax.

JOE said...

"Perhaps you need to find some better ORMs. The one I work with (built in to Django) supports your choice of "contains" (which translates to LIKE in Postgres) or "icontains" (which translates to ILIKE) in its querying syntax."

That's wonderful but what if you are not using django? I am sure django is wonderful and all but some people may prefer ruby, or php, or java, or .net or something.

Also what does using ILIKE do to your speed? Does it use indexes? Would an application where every instance of X='abc' is run as x ILIKE 'abc' run at acceptable speeds?

Anyway you missed my point. The point is that for at least the last five years this topic has come up again and again on the postgres mailing lists. People wanting to migrate from mysql, ms-sql, sybase, informix etc have asked for it. The solution is simple. Make a case insensitive collation for postgres. Failing that make a case insensitive text type (somebody has already done it!!!) and include it in the default distribution.

Why is the postgresql team so obstinate about this much requested feature? Does anybody buy the "it's not a standard" line especially when the next paragraph says "use ILIKE"?

I know I am ranting a bit but it's only because I WANT to use postgres but am unable to do so because it's lacking a feature I really need. Every time I try to use it I run into some limitation or another. I remember in the days of yore waiting for outer joins and then giving up when I deadlines caused me to use another database instead.

Anonymous said...

For data replication/failover/etc for Postgresql, check out pgpool. I've been using it for awhile, and works really well!

Anonymous said...

This is bug http://bugs.mysql.com/bug.php?id=15815, fixed in 5.0.30.

Anonymous said...

JOE: "This is a simple collation problem and people have been asking for it at least five or six years (just query the mailing list) and the postgres team refuses to come up with a case insensitive collation or to include any of the case insensitive data types people have developed."

I'm not surprised, but then that's actually the right thing to do. I was fairly surprised to see that MySQL had a case insensitive collation enabled by default, causing a slight inflation of results for my application, and I imagine that for many applications it's totally inappropriate.

JOE: "The only excuse they have is that it's not "standard" as if ILIKE or *~ are standards."

Sounds like a good excuse to me.

And on the subject of full-text searching, I read in various places (including the MySQL manual's comments) that MySQL is relatively hopeless for any real full-text search solution. If ticking boxes on the feature sheet is important then I guess MySQL is just fine; otherwise, I suppose people may find the various extensions to PostgreSQL convenient enough to compile.

JOE: "Just last week I abandoned my attempt to trying it one more time (with rails) because I can't demand that my users remember the precise capitalization of every piece of text and I don't want to rewrite activerecord."

This sounds like a combination of user interface shortcomings and inadequate tools to me.

Anonymous said...

Joe: Hmm really weird that you can't handle searching in your rails app. I have just built an file indexing/ searching/version control system with rails and postgres... Works great! (Even though tsearch2 have some issues with partial word matching...)

Anonymous said...

Joe: "Make a case insensitive collation for postgres. Failing that make a case insensitive text type (somebody has already done it!!!) and include it in the default distribution."

PostgreSQL is extensible and using some of the rich set of extensions available is expected. Try telling PERL that a capability isn't good enough because it's in CPAN rather than "the default distribution".

Download and install the extension that does what you need, and install. This has all kinds of benefits over the MySQL "everything goes in core" mentality. For one, you can get an updated module without having to wait for the next release of PostgreSQL. Slony-I is a great example of an add-on. If it had to wait each time for a full PostgreSQL release, the project would not be where it is today.

geoave said...

Working in oracle most of the time, the practice I'm most used to is using upper() and lower() statements.

These have the advantage of being nearly universal.

Anonymous said...

I find it remarkable that people here have defended the lack or a case insensitive collation feature here. It's a feature that every other database has open source or otherwise. It's a feature that has been asked for for years by people who want to use postgres. It's a feature that's easy to add. Despite all that it does not exist in postgres.

Instead we get semi insulting suggestions at workarounds like use *~ or ILIKE neither of which are an SQL standard or use the full text searching.

Here is another suggestion for people who want case insensitive where clauses. Use mysql, mssql, oracle, firebird, db/2, openbase, frontbase, or any one hundreds of databases that have that feature.

Geoff Gowan said...

I noticed the description of the benchmark mentioned this:

"[PostgreSQL] has had a couple of its indexes replaced to allow better performance. Additionally, a few data structures were altered."

Based on this vague statement and the overall text it seems to me that this benchmark was meant to compare processors, not databases. The point is to find out which processor is fastest for each database, not which database is fastest. The results may still be relevant, but this interpretation sounds like an over-generalization at this point.

Also it's worth noting that the benchmark looks mainly at read speed when all data is in the buffer cache. For many applications this is far from the only interesting benchmark for comparing databases. However, it's a good benchmark for comparing processors to each other.

Anonymous said...

Apparently there's always MySQL Cluster if you want to use lots of hardware

Hehe, check this post: http://forums.mysql.com/read.php?25,93181,93181

Dan said...

Im not here to pick sides, but reading about the HOW the benchmark was done leaves many questions.

Read/write
As far as i can tell, its mostly simple select statements? So how is the write operations in comparison?

Bechmarking sql? or php?
Also, it seems the benchmark was made by requesting a php4.4 page, through apache 2.2, using "apache-bencmark", which begs the question: Are the SQL client implementations in PHP equally good?

The statements
And where are the actual SQL statements? It is importent to see these to understand the benchmark.
Eg. Did the test contain LIKE statements? And did the test use ILIKE on postgres, so the actual search was the same?
Ie. were the sql-statements "designed" for postgres?

Conclusions
There are so many extra- and unkown factors in this benchmark that one cant conclude anything from it.

Like Jonathan Ellis stated in the original text. MySQL benchmarks tend to favor their own product. So does mssql, oracle and so forth.
Id love to see and absolutely independed benchmark of all these. Also of postgres, so youre sure that the guys benchmarking are out to prove postgres faster, instead of an unbiased bench.

Ppl tend to get very, ehrrm... Religious about their favorite product. If the benchmarker is just half as biased as this original blog-post, the entire test-result is useless.

Jay Pipes said...

Would be nice to compare apples to apples. The tweakers article, while thorough, compares the most current PostgreSQL *development* version to the >3 year old MySQL 4.1 series and the >1 year old 5.0 series.

Would be nice to see the 2 dev versions against each other...

As a MySQL employee, I think it's fair to say that PostgreSQL has done a fantastic job in focusing on performance-tuning for the 8.2 release, and kudos to Josh and the team for their efforts!

Both RDBMS systems have something to offer for various crowds in need of dissimilar features, add-ons, ease of use, etc. No one at MySQL knocks PostgreSQL. We thrive on the healthy competition that comes out of the open source community.

Anyway, on a general note regarding benchmarks, it is almost always possible to create benchmark results that prove a specific point in favor of one side of an argument. But, in the case of the tweakers benchmarks, I think that the results indeed show a performance degradation for multiple cores/processors that MySQL should look at closely, and I'm sure our engineers will do so.

On a related note, I should note that the MySQL 5.0.30 release included a patch for InnoDB performance which showed a substantial increase in performance on multi-core/processor machines (>20% AFAIK).

Anyway, cheers,

Jay Pipes
Community Relations Manager, North America, MySQL, Inc.

Anonymous said...

I found the tweekers benchmark very useful, and it is reasonably neutral since they are a mysql site and seems slightly shocked at how fast postgres was after all the historic press about mysql being faster.

Internal Benchmarks
We wrote some internal benchmark software in dotNet, and found postgres 8.0 about 50% faster than mysql (InnoDB)both running on windows with a dotNet client. The test was a stored proceedure that similated creating invoices for multiple companies. There was a table with each companies current invoice no which was incremented as new invoices were inserted into the general invoice table. We then ran X companies with y threads per company.

SQL server screamed at around 1,500 transactions per second, Postgres peeked at around 300, and my sql at 180. The biggest problem with Postgres under windows was that windows allocated too much memory to each postmaster process. It is much much more efficent with a unix platform.

The main difference betwen Postgres and both mysql and sql server is that it does not colapse under load. When we really stressed things mysql just gave up, and we have crashed the server. SQL server performance dropped from 5 times faster to around 20% faster.

I recently build a collection of virtual machines Open BSD, Free BSD, Net BSD, Ubuntu, deb etch, gentoo and have been meaning to try postgres under these.

Postgres was around 70 transactions per second on open bsd, using 100% cpu on the virtual machine (50% cpu of the actual machine)

MVCC works
It is really noticable that postgres keeps going under heavy load. Where as the row level locking databases slow down a lot from their peaks.

I have not done a bench mark yet with postgres on a unix box vs sql server with different clients, but it is noticable that a postgres system on unix only uses about 60 meg for around 100 connections, which is reasonable vs the gig on windows.

My SQL Cluster
We had intended to go with MySQL Cluster since it is by far the best cluster architecture and the backend is rock solid and incredibly efficent.

Unfortunately the front end servers let it down. We tuned it up from around 5 transactions per second (10meg interconnects)to 180 per second (100 meg switched) on our internal benchmark.

We did manage to get about 2,000 inserts per second into the cluster using 3 front end machines, and the backend was still under minimal load.

There have been a lot of features added in mysql 5 and it is maturing very quickly.

Our current strategy is to stay with SQL server, try and move to postgres and keep watching the mysql clusters.


Case insensitive collations
Comming from SQL server case sensitive collations are a real pain. However we have written our own search engine which converts everything to lower case, which means for that part it will be fine, and our primary keys are all int so again no issue.

The bits that bug me are that unless you use lower case in all your table/column names you have to quote them. I prefer WordWord to word_word names. The crazy thing is that the engine converts WordWord to wordword unless it is quoted.

utf8 support
The biggest advantage for us in moving is the utf8 support in both mysql and postgres.

Would prefer case insensitive indexed where clauses, but it will be a very big job porting, and we can probably live with it, since we are looking to translate our table generation, and stored proceedures into the new database, and use one set of code for both databases.

Anonymous said...

ScimoreDB does scale-out. However, they are missing features: no views, not real procedures, no replication...
For distributed query processing, they use functional shippment rather than data, that benefits performance a lot to compare with MySQL cluster.

Anonymous said...

I am currently comparing a mySQL5 install with a PostgreSQL 8.2 install on the same machine. The app I'm planning has to take huge insert loads. And guess waht? mySQL performs MUCH faster (3 to 4 times) when inserting BIG bunchs of data. And no. I am not using delayed inserts.

What a pity. Postgresql looks so much more mature to me.

Anonymous said...

Humm.. So we really can state this:

Performance depends on your benchmark ;-)

TravisO said...

It seems people have overlooked one issue...

Generally speaking, speed and quality (aka data integrity) are inversely related in databases. That's why traditionally Oracle was the slowest and MySQL was the fastest. This is why your database license often forbids you from publishing benchmarks, because fast speed makes people think it makes for a better database.

But as time has progressed and each DB has matured, after they've achieved quality, they've been optimizing things to improve speed without losing features or data integrity.

I've been biased towards msSQL because it offers great features (was especially important years back, but now everybody has Stored Procs and transactions) and amazing speed. But in today's world, and licensing, I could see myself preferring PostgreSQL and MySQL (if I really had to) as they have all the essential features and started to go beyond needing key features and are entering the state of maturing those features.

Charles Duffy said...

Joe,

Anyone who says "Postgres has always been late to the party" got to the party pretty damned late themselves.

I remember when MySQL had no support for relational integrity, no support for views, no support for transactions... and while PostgreSQL had all those things, the MySQL team was promoting the idea that all those features were useless performance-impacting cruft that nobody needed unless they were running a bank.

Anonymous said...

Postgres is good, MySql is good, Oracle is good. Mostly the quality of the code you produce is dependent on you knowing the db you're coding against. I like Postgres & Oracle, because I have experience with them. both do a fine job of storing data.

Arguments pro/con about postgres work better if they are rational, factual and emotionless. I think your opinion here is valid, but the presentation is not flattering to you. Don't worry be happy. :-)

floyd_n_milan said...

Cool.

Anonymous said...

Fart

Sajin tm said...

I realized the power of PostgreSQL a little late. I am not an usual blogger but to show this difference, i wrote a post here.http://tmsajin.blogspot.com/2010/08/performance-comparison-test-between.html