
Materialized views in PostgreSQL

One thing I've wanted to write about for a while is materialized views in PostgreSQL. Materialized views are basically precomputed views: the query's results are stored in a real table, so reads don't have to redo the work. They're very useful if you have an expensive query against data that doesn't change much. This is one of the tricks I use to keep the db overhead on Carnage Blender reasonable.
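To make "precomputed view" concrete, here is a minimal sketch of the simplest, snapshot-style variety: copy the view's result into a real table and recompute it on demand. It uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL. The games/player_stats schema and the refresh_player_stats helper are hypothetical. (PostgreSQL 9.3 and later build this snapshot style in as CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.)

```python
import sqlite3

# SQLite (via Python's built-in sqlite3 module) stands in for PostgreSQL here.
# Hypothetical schema: one row per game played.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (player TEXT NOT NULL, score INTEGER NOT NULL);
    INSERT INTO games VALUES ('alice', 10), ('alice', 30), ('bob', 5);

    -- The "expensive" query we want to precompute, as an ordinary view.
    CREATE VIEW player_stats_v AS
        SELECT player, COUNT(*) AS games_played, SUM(score) AS total_score
          FROM games
         GROUP BY player;

    -- The materialized copy: a real table filled from the view once.
    CREATE TABLE player_stats AS SELECT * FROM player_stats_v;
""")


def refresh_player_stats(conn):
    """Recompute the whole snapshot inside a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM player_stats")
        conn.execute("INSERT INTO player_stats SELECT * FROM player_stats_v")


def read_stats(conn):
    return conn.execute(
        "SELECT player, games_played, total_score FROM player_stats ORDER BY player"
    ).fetchall()


with conn:
    conn.execute("INSERT INTO games VALUES ('bob', 20)")

before = read_stats(conn)  # still the old numbers: the snapshot is stale
refresh_player_stats(conn)
after = read_stats(conn)   # now reflects bob's second game
print(before)  # [('alice', 2, 40), ('bob', 1, 5)]
print(after)   # [('alice', 2, 40), ('bob', 2, 25)]
```

Reads from player_stats are cheap. However, every refresh redoes the full aggregate, and the table is stale in between. That is acceptable only when the underlying data changes slowly.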

Oracle and DB2 are the only RDBMS products I know of that can auto-materialize views for you, but you can rig them yourself in PostgreSQL or anything that gives you a powerful enough trigger + procedural language. (And if performance is crucial, you'll end up hand-optimizing them in Oracle/DB2 as well. At least, that was the case back when I used Oracle in the 8.1 days.)
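The trigger-based approach works differently. Instead of recomputing the summary, each insert, update, or delete on the base table adjusts the affected summary row. This is again a minimal sketch using SQLite via Python's sqlite3 as a stand-in, with a hypothetical schema. In PostgreSQL each trigger body would be a PL/pgSQL function attached with CREATE TRIGGER ... FOR EACH ROW, and the bookkeeping would be the same.

```python
import sqlite3

# SQLite via Python's sqlite3 stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (player TEXT NOT NULL, score INTEGER NOT NULL);

    -- Summary table kept current by triggers rather than by periodic refreshes.
    CREATE TABLE player_stats (
        player       TEXT PRIMARY KEY,
        games_played INTEGER NOT NULL,
        total_score  INTEGER NOT NULL
    );

    CREATE TRIGGER games_ins AFTER INSERT ON games BEGIN
        -- Ensure a summary row exists, then fold the new game into it.
        INSERT OR IGNORE INTO player_stats VALUES (NEW.player, 0, 0);
        UPDATE player_stats
           SET games_played = games_played + 1,
               total_score  = total_score + NEW.score
         WHERE player = NEW.player;
    END;

    CREATE TRIGGER games_del AFTER DELETE ON games BEGIN
        UPDATE player_stats
           SET games_played = games_played - 1,
               total_score  = total_score - OLD.score
         WHERE player = OLD.player;
        -- Drop summary rows that no longer describe any games.
        DELETE FROM player_stats WHERE player = OLD.player AND games_played = 0;
    END;

    -- An UPDATE is handled as "subtract the old row, add the new row".
    CREATE TRIGGER games_upd AFTER UPDATE ON games BEGIN
        UPDATE player_stats
           SET games_played = games_played - 1,
               total_score  = total_score - OLD.score
         WHERE player = OLD.player;
        INSERT OR IGNORE INTO player_stats VALUES (NEW.player, 0, 0);
        UPDATE player_stats
           SET games_played = games_played + 1,
               total_score  = total_score + NEW.score
         WHERE player = NEW.player;
        DELETE FROM player_stats WHERE player = OLD.player AND games_played = 0;
    END;
""")

with conn:
    conn.executemany("INSERT INTO games VALUES (?, ?)",
                     [("alice", 10), ("alice", 30), ("bob", 5)])
    conn.execute("UPDATE games SET score = 20 WHERE player = 'bob'")
    conn.execute("DELETE FROM games WHERE player = 'alice' AND score = 10")

# Reads hit the small summary table; no aggregation happens at query time.
stats = conn.execute(
    "SELECT player, games_played, total_score FROM player_stats ORDER BY player"
).fetchall()

# Cross-check against the expensive query the summary table replaces.
fresh = conn.execute(
    "SELECT player, COUNT(*), SUM(score) FROM games GROUP BY player ORDER BY player"
).fetchall()

print(stats)           # [('alice', 1, 30), ('bob', 1, 20)]
print(stats == fresh)  # True
```

The cost moves from read time to write time. That is the right trade when reads far outnumber writes. This sketch does not address concurrent writers; it only shows the incremental bookkeeping.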

But while looking for examples of a certain PL/pgSQL feature this morning, I found Jonathan Gardner's materialized views page (link may be down; see the archive.org copy), which covers the subject so well that now I don't have anything left to add. Nicely done!

Incidentally, this is just one more reason real procedural languages in the database are critical for most substantial projects. Implementing materialized views in your application logic would be worse than a joke.

Comments

Ian Bicking said…
Would it be that hard to do in application logic? Depends somewhat on the database, I guess... but if you can intercept (via application logic) all the updates to that table (or otherwise efficiently detect that they have occurred) then you could manually create the "view" from the application side.
Jonathan Ellis said…
First, it's on a par with the people who say "I maintain referential integrity in my application." Sooner or later you or your successor forgets and screws something up. It really isn't a question of if, but when. I've seen this happen at three different companies, unfortunately.

Second, it's significantly less efficient to slurp data out of the db, perform computations, and stick it back in than to do all that inside the database. Often you can sweep efficiency under the carpet, but presumably you wouldn't be doing this in the first place if you weren't optimizing a bottleneck.

Third, it's going to be very tricky to maintain transactional integrity while you perform the updates on the application side. At best you can get by with row locking but often you'll have to use the Big Hammer table lock. With PostgreSQL or Oracle-style MVCC everyone sees a consistent view of the data without locking.

Fourth, if you have multiple codebases accessing the data, see #1 and good luck.

Finally, maybe you are God's gift to programming and you really can do everything in app logic. And maybe your boss doesn't care about what happens when you leave and he has to hire a mere mortal to maintain it. But if you're happy using lousy tools, why are you using Python instead of C? :)
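Here is a small illustration of the third point. The trigger runs inside the same transaction as the statement that fired it, so rolling back that transaction also undoes the summary update. This is a minimal sketch using SQLite via Python's sqlite3 as a stand-in for PostgreSQL, with a hypothetical schema. It demonstrates atomicity only, not MVCC visibility between concurrent sessions.

```python
import sqlite3

# SQLite via Python's sqlite3 stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (player TEXT NOT NULL, score INTEGER NOT NULL);
    CREATE TABLE player_stats (
        player       TEXT PRIMARY KEY,
        games_played INTEGER NOT NULL,
        total_score  INTEGER NOT NULL
    );
    CREATE TRIGGER games_ins AFTER INSERT ON games BEGIN
        INSERT OR IGNORE INTO player_stats VALUES (NEW.player, 0, 0);
        UPDATE player_stats
           SET games_played = games_played + 1,
               total_score  = total_score + NEW.score
         WHERE player = NEW.player;
    END;
""")

with conn:
    conn.execute("INSERT INTO games VALUES ('alice', 10)")


def current_state():
    """Row count of the base table plus the contents of the summary table."""
    return (conn.execute("SELECT COUNT(*) FROM games").fetchone()[0],
            conn.execute("SELECT * FROM player_stats").fetchall())


before = current_state()
try:
    with conn:  # one transaction: the base-table write and the trigger's write
        conn.execute("INSERT INTO games VALUES ('alice', 50)")
        raise RuntimeError("simulated failure after the insert")
except RuntimeError:
    pass  # the context manager has already rolled the transaction back

after = current_state()
print(before)           # (1, [('alice', 1, 10)])
print(after == before)  # True: both writes were undone together
```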
Unknown said…
Materialized views sound like a useful workaround if your database doesn't have a query cache. If you do have a query cache, then you already effectively have eager or lazy materialized views (depending on your isolation level): just use your normal view (or query) and let the database figure it out.

Snapshots are trivial to do in the application, and by definition are refreshed according to application logic (i.e. "only updated when refreshed", according to original article).
Jonathan Ellis said…
MySQL's query cache works well, but it is a simple solution for simple problems.

For Carnage Blender, for instance, the biggest materialized view is stats derived from the main parties table and others. Dozens of queries hit this, each joining it to different relations. The query cache approach would take far more memory (and memory is an issue for me: I already have a 5 GB database on a machine with 4 GB of RAM) than letting PG cache the relations, with the expensive part precomputed in the mview, and deriving what each query needs from there.
Anonymous said…
I am merely an interested observer rather than an expert on this subject, but according to a guru of pgsql-general@postgresql.org (Tom Lane) there is no query cache in PG. I would be very interested if somebody could clarify this.
SWK
Anonymous said…
One thing Postgres is missing compared to Oracle is the ability to have a SELECT rewritten on the fly to point to an MV table instead of the original table (half of the power of the MV).

I hope Postgres SELECT rule restrictions will be loosened in the future to allow for this type of redirection based upon noticing a SELECT is the same as the one that created the MV.
Anonymous said…
The original post by Jonathan Gardner is gone. I have re-posted it on my site: Materialized Views in PostgreSQL
Jonathan Ellis said…
Thanks for the pointer, although I don't think labeling someone else's work as Copyright 2008 yourself is kosher, even if his site is gone.
Unknown said…
SQL Server supports materialized views, too. They're called "indexed views."
Jake K said…
Good reading
