Wednesday, May 11, 2005

Materialized views in PostgreSQL

One thing I've wanted to write about for a while is materialized views in PostgreSQL. Materialized views are basically precomputed views; they're very, very useful if you have an expensive query against data that doesn't change much. This is one of the tricks I use to keep the database overhead on Carnage Blender reasonable.

Oracle and DB2 are the only RDBMS products that can auto-materialize views for you, but you can rig them yourself in PostgreSQL or anything that gives you a powerful enough trigger + procedural language. (And if performance is crucial, you'll end up hand-optimizing them in Oracle/DB2 as well. At least, that was the case back when I used Oracle in the 8.1 days.)
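
To make "rig them yourself" concrete, here is a minimal sketch of a trigger-maintained summary table. Everything in it is an assumption for illustration: the `scores` and `score_totals` tables are hypothetical, and it uses Python's built-in sqlite3 module with SQLite triggers only so that it runs standalone. In PostgreSQL the trigger bodies would be PL/pgSQL functions instead, and a real implementation would need more care than this.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a base table and a hand-rolled mview."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical base table.
        CREATE TABLE scores (player TEXT NOT NULL, points INTEGER NOT NULL);

        -- The "materialized view": one precomputed row per player.
        CREATE TABLE score_totals (
            player TEXT PRIMARY KEY,
            total  INTEGER NOT NULL,
            n      INTEGER NOT NULL
        );

        -- Every change to scores is folded into score_totals by a trigger.
        CREATE TRIGGER scores_ins AFTER INSERT ON scores
        BEGIN
            INSERT OR IGNORE INTO score_totals (player, total, n)
                VALUES (NEW.player, 0, 0);
            UPDATE score_totals
               SET total = total + NEW.points, n = n + 1
             WHERE player = NEW.player;
        END;

        CREATE TRIGGER scores_del AFTER DELETE ON scores
        BEGIN
            UPDATE score_totals
               SET total = total - OLD.points, n = n - 1
             WHERE player = OLD.player;
            DELETE FROM score_totals WHERE n = 0;
        END;

        -- An update is treated as "remove the old row, add the new one".
        CREATE TRIGGER scores_upd AFTER UPDATE ON scores
        BEGIN
            UPDATE score_totals
               SET total = total - OLD.points, n = n - 1
             WHERE player = OLD.player;
            INSERT OR IGNORE INTO score_totals (player, total, n)
                VALUES (NEW.player, 0, 0);
            UPDATE score_totals
               SET total = total + NEW.points, n = n + 1
             WHERE player = NEW.player;
            DELETE FROM score_totals WHERE n = 0;
        END;
    """)
    return conn


def read_view(conn):
    """Cheap read: just select from the precomputed table."""
    rows = conn.execute("SELECT player, total, n FROM score_totals")
    return {player: (total, n) for player, total, n in rows}


def recompute(conn):
    """The 'expensive' query the summary table stands in for."""
    rows = conn.execute(
        "SELECT player, SUM(points), COUNT(*) FROM scores GROUP BY player"
    )
    return {player: (total, n) for player, total, n in rows}
```

The point of the design is that readers only ever touch `score_totals`, while the triggers keep it in step with `scores` inside the same transaction as the write, so no application code has to remember to do it.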

But while looking for examples of a certain PL/pgSQL feature this morning, I found Jonathan Gardner's materialized views page (link may be down; see the archive.org copy), which covers the subject so well that now I don't have anything left to add. Nicely done!

Incidentally, this is just one more reason real procedural languages in the database are critical for most substantial projects. Implementing materialized views in your application logic would be worse than a joke.

11 comments:

Ian Bicking said...

Would it be that hard to do in application logic? Depends somewhat on the database, I guess... but if you can intercept (via application logic) all the updates to that table (or otherwise efficiently detect that they have occurred) then you could manually create the "view" from the application side.

Jonathan Ellis said...

First, it's on a par with the people who say "I maintain referential integrity in my application." Sooner or later you or your successor forgets and screws something up. It really isn't if, but when. I've seen this happen at three different companies, unfortunately.

Second, it's significantly less efficient to slurp data out of the db, perform computations, and stick it back in, than to do all that inside the database. Often you can sweep efficiency under the carpet, but presumably you wouldn't be doing this in the first place if you weren't optimizing a bottleneck.

Third, it's going to be very tricky to maintain transactional integrity while you perform the updates on the application side. At best you can get by with row locking but often you'll have to use the Big Hammer table lock. With PostgreSQL or Oracle-style MVCC everyone sees a consistent view of the data without locking.

Fourth, if you have multiple codebases accessing the data, see #1 and good luck.

Finally, maybe you are God's gift to programming and you really can do everything in app logic. And maybe your boss doesn't care about what happens when you leave and he has to hire a mere mortal to maintain it. But if you're happy using lousy tools, why are you using Python instead of C? :)

Farce Pest said...

Materialized views sound like a useful workaround if your database doesn't have a query cache. If you do have a query cache, then you already effectively have eager or lazy materialized views (depending on your isolation level): just use your normal view (or query) and let the database figure it out.

Snapshots are trivial to do in the application, and by definition are refreshed according to application logic (i.e. "only updated when refreshed", according to original article).
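
For illustration, a snapshot refreshed from the application could be sketched roughly as below. This is an assumption-laden example, not anything from the original article: the `scores` and `score_totals_snapshot` tables are hypothetical, and Python's built-in sqlite3 module stands in for the real database so the sketch runs on its own.

```python
import sqlite3


def make_snapshot_db():
    """Create an in-memory database with a base table and a snapshot table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical base table.
        CREATE TABLE scores (player TEXT NOT NULL, points INTEGER NOT NULL);

        -- The snapshot: only as fresh as the last refresh.
        CREATE TABLE score_totals_snapshot (
            player TEXT PRIMARY KEY,
            total  INTEGER NOT NULL
        );
    """)
    return conn


def refresh_snapshot(conn):
    """Rebuild the snapshot from scratch in a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so readers never see a half-rebuilt snapshot after a failed refresh.
    with conn:
        conn.execute("DELETE FROM score_totals_snapshot")
        conn.execute("""
            INSERT INTO score_totals_snapshot (player, total)
            SELECT player, SUM(points) FROM scores GROUP BY player
        """)


def read_snapshot(conn):
    """Cheap read against the precomputed table."""
    return dict(conn.execute("SELECT player, total FROM score_totals_snapshot"))
```

The application decides when to call `refresh_snapshot` (on a timer, after a batch load, and so on); between refreshes the snapshot is deliberately stale.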

Jonathan Ellis said...

MySQL's query cache works well, but it is a simple solution for simple problems.

For Carnage Blender, for instance, the biggest materialized view is stats derived from the main parties table and others. Dozens of queries hit this, joining to different other relations. Using the query cache approach would take far more memory (and memory is an issue for me: I already have a 5 GB database on a 4 GB motherboard) than letting PG cache the relations, with the expensive part precomputed in the mview, and derive what the queries need as necessary.

SunWuKung said...

I am merely an interested observer rather than an expert on this subject, but according to a guru of pgsql-general@postgresql.org (Tom Lane) there is no query cache in PG. I would be very interested if somebody could clarify this.
SWK

Bob Ippolito said...

FWIW, MySQL's query cache is only useful if your table doesn't change very often.

Let's say your table is an access log that gets updated several times a second, and you want a materialized view that presents daily statistics.

MySQL's query cache won't do anything for you in this situation, because it gets invalidated a few times a second.

This is why PostgreSQL doesn't have a query cache; it's not a general solution to any non-trivial problem.

Anonymous said...

One of the things missing in Postgres from Oracle is the ability to have the select rewritten on the fly to point to a MV table instead of the original table (half of the power of the MV).

I hope Postgres SELECT rule restrictions will be loosened in the future to allow for this type of redirection based upon noticing a SELECT is the same as the one that created the MV.

Benjamin said...

The original post by Jonathan Gardner is gone. I have re-posted it on my site: Materialized Views in PostgreSQL

Jonathan Ellis said...

Thanks for the pointer, although I don't think labeling someone else's work as Copyright 2008 yourself is kosher, even if his site is gone.

Anonymous said...

The original link has moved:

http://jonathangardner.net/tech/w/PostgreSQL/Materialized_Views

David said...

SQL Server supports materialized views, too. They're called "indexed views."