
Materialized views in PostgreSQL

One thing I've wanted to write about for a while is materialized views in PostgreSQL. Materialized views are basically precomputed views: the result of the query is stored as a table, so reads don't have to recompute it. They're very useful if you have an expensive query against data that doesn't change much. This is one of the tricks I use to keep the database overhead on Carnage Blender reasonable.
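To make "precomputed view" concrete, here is a minimal sketch of the simplest flavor, a snapshot that is rebuilt on demand. It uses Python's built-in sqlite3 module purely so that it runs self-contained; SQLite stands in for PostgreSQL here. The `parties` and `party_stats` tables, their columns, and both function names are hypothetical and are not Carnage Blender's actual schema.

```python
import sqlite3


def make_snapshot_db():
    """Create an in-memory database with a hypothetical base table and snapshot table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parties (
            id INTEGER PRIMARY KEY,
            owner TEXT NOT NULL,
            power INTEGER NOT NULL
        );
        -- The "materialized view": an ordinary table holding the query result.
        CREATE TABLE party_stats (
            owner TEXT PRIMARY KEY,
            party_count INTEGER NOT NULL,
            total_power INTEGER NOT NULL
        );
        INSERT INTO parties (owner, power)
        VALUES ('alice', 10), ('alice', 5), ('bob', 7);
    """)
    return conn


def refresh_party_stats(conn):
    """Rebuild the snapshot from the base table inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM party_stats")
        conn.execute(
            "INSERT INTO party_stats (owner, party_count, total_power) "
            "SELECT owner, COUNT(*), SUM(power) FROM parties GROUP BY owner"
        )


conn = make_snapshot_db()
refresh_party_stats(conn)
# Readers now query the small precomputed table instead of re-aggregating.
for row in conn.execute("SELECT * FROM party_stats ORDER BY owner"):
    print(row)
```

The snapshot goes stale as soon as `parties` changes and stays stale until the next refresh, which is the trade-off for cheap reads. PostgreSQL 9.3 and later provide this snapshot flavor natively through `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`.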

Oracle and DB2 are the only RDBMS products that can auto-materialize views for you, but you can rig them yourself in PostgreSQL or anything that gives you a powerful enough trigger system and procedural language. (And if performance is crucial, you'll end up hand-optimizing them in Oracle/DB2 as well. At least, that was true back when I used Oracle in the 8.1 days.)
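As a rough illustration of rigging one yourself with triggers, the sketch below keeps a summary table in step with its base table on every insert, update, and delete. It is written against SQLite through Python's sqlite3 module so that it runs as-is; in PostgreSQL the same logic would live in PL/pgSQL trigger functions. The table, trigger, and function names are hypothetical, and this is one possible approach, not the method from any particular write-up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parties (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    power INTEGER NOT NULL
);
-- Hypothetical summary table maintained entirely by the triggers below.
CREATE TABLE party_stats (
    owner TEXT PRIMARY KEY,
    party_count INTEGER NOT NULL,
    total_power INTEGER NOT NULL
);

CREATE TRIGGER parties_ins AFTER INSERT ON parties
BEGIN
    -- Create the owner's summary row if it does not exist yet.
    INSERT INTO party_stats (owner, party_count, total_power)
    SELECT NEW.owner, 0, 0
    WHERE NOT EXISTS (SELECT 1 FROM party_stats WHERE owner = NEW.owner);
    UPDATE party_stats
    SET party_count = party_count + 1, total_power = total_power + NEW.power
    WHERE owner = NEW.owner;
END;

CREATE TRIGGER parties_del AFTER DELETE ON parties
BEGIN
    UPDATE party_stats
    SET party_count = party_count - 1, total_power = total_power - OLD.power
    WHERE owner = OLD.owner;
    -- Drop summary rows for owners with no parties left.
    DELETE FROM party_stats WHERE owner = OLD.owner AND party_count = 0;
END;

CREATE TRIGGER parties_upd AFTER UPDATE ON parties
BEGIN
    -- Treat an update as "remove the old row, then add the new row".
    UPDATE party_stats
    SET party_count = party_count - 1, total_power = total_power - OLD.power
    WHERE owner = OLD.owner;
    DELETE FROM party_stats WHERE owner = OLD.owner AND party_count = 0;
    INSERT INTO party_stats (owner, party_count, total_power)
    SELECT NEW.owner, 0, 0
    WHERE NOT EXISTS (SELECT 1 FROM party_stats WHERE owner = NEW.owner);
    UPDATE party_stats
    SET party_count = party_count + 1, total_power = total_power + NEW.power
    WHERE owner = NEW.owner;
END;
"""


def make_trigger_db():
    """Create an in-memory database whose summary table is trigger-maintained."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def stats(conn):
    """Read the precomputed summary."""
    return conn.execute(
        "SELECT owner, party_count, total_power FROM party_stats ORDER BY owner"
    ).fetchall()


def recomputed(conn):
    """Run the expensive aggregate directly, for comparison with stats()."""
    return conn.execute(
        "SELECT owner, COUNT(*), SUM(power) FROM parties "
        "GROUP BY owner ORDER BY owner"
    ).fetchall()


conn = make_trigger_db()
conn.executemany(
    "INSERT INTO parties (owner, power) VALUES (?, ?)",
    [("alice", 10), ("alice", 5), ("bob", 7)],
)
conn.execute("UPDATE parties SET power = 9 WHERE owner = 'bob'")
print(stats(conn) == recomputed(conn))
```

Because the triggers run inside the same transaction as the write that fired them, the summary can never be observed out of step with the base table. The cost is extra work on every write, so this version pays off when reads heavily outnumber writes.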

But while looking for examples of a certain PL/pgSQL feature this morning, I found Jonathan Gardner's materialized views page (link may be down; see the archive.org copy), which covers the subject so well that I don't have anything left to add. Nicely done!

Incidentally, this is just one more reason real procedural languages in the database are critical for most substantial projects. Implementing materialized views in your application logic would be worse than a joke.

Comments

Ian Bicking said…
Would it be that hard to do in application logic? Depends somewhat on the database, I guess... but if you can intercept (via application logic) all the updates to that table (or otherwise efficiently detect that they have occurred) then you could manually create the "view" from the application side.
Jonathan Ellis said…
First, it's on a par with the people who say "I maintain referential integrity in my application." Sooner or later you or your successor forgets and screws something up. It really isn't if, but when. I've seen this happen at three different companies, unfortunately.

Second, it's significantly less efficient to slurp data out of the db, perform computations, and stick it back in, than to do all that inside the database. Often you can sweep efficiency under the carpet, but presumably you wouldn't be doing this in the first place if you weren't optimizing a bottleneck.

Third, it's going to be very tricky to maintain transactional integrity while you perform the updates on the application side. At best you can get by with row locking but often you'll have to use the Big Hammer table lock. With PostgreSQL or Oracle-style MVCC everyone sees a consistent view of the data without locking.

Fourth, if you have multiple codebases accessing the data, see #1 and good luck.

Finally, maybe you are God's gift to programming and you really can do everything in app logic. And maybe your boss doesn't care about what happens when you leave and he has to hire a mere mortal to maintain it. But if you're happy using lousy tools, why are you using Python instead of C? :)
Unknown said…
Materialized views sound like a useful workaround if your database doesn't have a query cache. If you do have a query cache, then you already effectively have eager or lazy materialized views (depending on your isolation level): just use your normal view (or query) and let the database figure it out.

Snapshots are trivial to do in the application, and by definition are refreshed according to application logic (i.e. "only updated when refreshed", according to original article).
Jonathan Ellis said…
MySQL's query cache works well, but it is a simple solution for simple problems.

For Carnage Blender, for instance, the biggest materialized view is stats derived from the main parties table and others. Dozens of queries hit this, joining to different other relations. Using the query cache approach would take far more memory (and memory is an issue for me: I already have a 5 GB database on a 4 GB motherboard) than letting PG cache the relations, with the expensive part precomputed in the mview, and derive what the queries need as necessary.
Anonymous said…
I am merely an interested observer rather than an expert on this subject, but according to a guru of pgsql-general@postgresql.org (Tom Lane) there is no query cache in PG. I would be very interested if somebody could clarify this.
SWK
Anonymous said…
One of the things missing in Postgres from Oracle is the ability to have the select rewritten on the fly to point to a MV table instead of the original table (half of the power of the MV).

I hope Postgres SELECT rule restrictions will be loosened in the future to allow for this type of redirection based upon noticing a SELECT is the same as the one that created the MV.
Anonymous said…
The original post by Jonathan Gardner is gone. I have re-posted it on my site: Materialized Views in PostgreSQL
Jonathan Ellis said…
Thanks for the pointer, although I don't think labeling someone else's work as Copyright 2008 yourself is kosher, even if his site is gone.
Unknown said…
SQL Server supports materialized views, too. They're called "indexed views."
