Skip to main content

plpython intro

My current employer keeps a lot of application metadata as xml in the database, and I needed to make a simple update to some of that. Writing a client program to do this would be immense overkill, as well as not playing nicely with my database auto-upgrade script. The update could be easily handled with a regular expression substitution, but although postgresql has a decent regular expression implementation built-in, it has no function to do replacement with them. [Update: regex_replace was added in version 8.1, six months after this post.)

An underused feature of PostgreSQL is its ability to define functions in just about any language you like. Java, Perl, Python, TCL, among others, as well as PL/PGSQL. Most developers, unfortunately, are a bit leery of defining custom functions in the database. Whether this is because of the learning curve (which turns out to be quite shallow), or because they are too used to inferior databases that don't allow such things, I couldn't say.

Anyway, creating a function that did what I needed was trivial.

-- pattern, original, replacement, flags
CREATE OR REPLACE FUNCTION regsub (VARCHAR, VARCHAR, VARCHAR, VARCHAR) 
RETURNS VARCHAR AS '
    import re
    flags = 0
    for flagchar in args[3]:
        flags |= getattr(re, flagchar)
    p = re.compile(args[0], flags)
    return re.sub(p, args[2], args[1])
' LANGUAGE 'plpythonu'
WITH (ISCACHABLE);

-- pattern, original, replacement
CREATE OR REPLACE FUNCTION regsub (VARCHAR, VARCHAR, VARCHAR) RETURNS VARCHAR AS '
    select regsub($1, $2, $3, '''');
' LANGUAGE 'sql'
WITH (ISCACHABLE);
Some things to note:
  • Postgresql 8.0 supports giving function variables names in the function definition statement, but the plpython handler doesn't know about this. (Anyone want to fix this? I just don't have the time.) So I'm stuck with the old-style args list.
  • Callers can't access the re namespace to grab the contants out as ints, so flags are passed as a concatenated string ('IS' = IGNORECASE + DOTALL, etc.) and turned into an appropriate int with getattr.
  • Speaking of that, this is the only time I have ever used the string iterator in python. I figured that after getting burned by that so many times, I might as well get some use out of it for once!
  • Postgresql 8.0 added the "dollar-quoting" feature to mitigate quoting headaches when defining functions. Unfortunately our production server will be running 7.4 for the forseeable future, so I stuck with escaping my quotes in the second function the old-fashioned way.
  • The second function isn't plpython: only SQL and PL/PGSQL functions can call other database functions directly; everything else is restricted to their handler's namespace. Although plpython gives you a way to dip back into the database, the overhead of doing this isn't negligible, and for something simple like this it's easier to just declare an SQL function anyway.
  • If you don't have plpython installed in your master template, chances are you'll need to run createlang from the commandline (or CREATE LANGUAGE from psql) before trying this at home.

Comments

Unknown said…
Inferior, eh?

MySQL Reference Manual :: 27.2 Adding New Functions to MySQL

This has been around since the 4.0 series, so it's not exactly new.
Jonathan Ellis said…
If you find writing MySQL UDF in C/C++ user-friendly and convenient, then have at it. :)
Jonathan Ellis said…
... please feel free to rewrite the regsub I posted as a MySQL UDF as a counterpoint. I think that would be most instructive. (It's only fair to use Boost regex or whatever your weapon of choice is, since python has that built in.)
Tim Lesher said…
On the other hand, were you both a MySQL fanatic and a Python fanatic, I suppose you could attempt a SWIG or boost::python hack. But why?
Hamish said…
Might I suggest using meaningful names in the code rather than indexing args?:

pattern, original, replacement, flags = args
Jonathan Ellis said…
You're right, of course -- but how much urgency would fixing the plpython handler have then? :)
Anonymous said…
Some developers are leery because of a philosophy of separating code from data. There should be one place for procedures (python code, usually in a directory full of .py files) and one place for SQL commands (usually some other directory of .sql files, or *well-documented* places inside the .py files)

Mixing is Bad.
Jonathan Ellis said…
When you get more experienced, you prefer to have a separation along the lines of: data manipulations belongs in the db; app logic belongs in your external process.
Anonymous said…
That is so great!
It's also possible to run python in parallel on SMP: Parallel Python
Unknown said…
This is exactly what I was looking for! Thanks so much.

-Echo
http://www.echo.net84.net/
Anonymous said…
Rock on, Ellis. I've been evaluating web application frameworks for ages and am not happy with the way any of them separate the model, view, and controller.

IMHO, if the model involves a DBMS then all model related work should be handled by the DBMS.
Unknown said…
Great example.

While separating code from data is a good philosophy I have to say stuff like this is very relevant. You simply do not get to choose some of the projects you inherit and its never as straight forward as you want it to be.

If we all used SQL alchemy or the likes we certainly could separate our code and data very easily and perform all kinds of magic when we want.

Well done Mr Ellis.
Anonymous said…
This was EXACTLY what I needed for figuring out how to call a plpython function from another function.

Pretty great this is still a relevant post 7 years later!

Popular posts from this blog

Why schema definition belongs in the database

Earlier, I wrote about how ORM developers shouldn't try to re-invent SQL . It doesn't need to be done, and you're not likely to end up with an actual improvement. SQL may be designed by committee, but it's also been refined from thousands if not millions of man-years of database experience. The same applies to DDL. (Data Definition Langage -- the part of the SQL standard that deals with CREATE and ALTER.) Unfortunately, a number of Python ORMs are trying to replace DDL with a homegrown Python API. This is a Bad Thing. There are at least four reasons why: Standards compliance Completeness Maintainability Beauty Standards compliance SQL DDL is a standard. That means if you want something more sophisticated than Emacs, you can choose any of half a dozen modeling tools like ERwin or ER/Studio to generate and edit your DDL. The Python data definition APIs, by contrast, aren't even compatibile with other Python tools. You can't take a table definition

Python at Mozy.com

At my day job, I write code for a company called Berkeley Data Systems. (They found me through this blog, actually. It's been a good place to work.) Our first product is free online backup at mozy.com . Our second beta release was yesterday; the obvious problems have been fixed, so I feel reasonably good about blogging about it. Our back end, which is the most algorithmically complex part -- as opposed to fighting-Microsoft-APIs complex, as we have to in our desktop client -- is 90% in python with one C extension for speed. We (well, they, since I wasn't at the company at that point) initially chose Python for speed of development, and it's definitely fulfilled that expectation. (It's also lived up to its reputation for readability, in that the Python code has had 3 different developers -- in serial -- with very quick ramp-ups in each case. Python's succinctness and and one-obvious-way-to-do-it philosophy played a big part in this.) If you try it out, pleas

A review of 6 Python IDEs

(March 2006: you may also be interested the updated review I did for PyCon -- http://spyced.blogspot.com/2006/02/pycon-python-ide-review.html .) For September's meeting, the Utah Python User Group hosted an IDE shootout. 5 presenters reviewed 6 IDEs: PyDev 0.9.8.1 Eric3 3.7.1 Boa Constructor 0.4.4 BlackAdder 1.1 Komodo 3.1 Wing IDE 2.0.3 (The windows version was tested for all but Eric3, which was tested on Linux. Eric3 is based on Qt, which basically means you can't run it on Windows unless you've shelled out $$$ for a commerical Qt license, since there is no GPL version of Qt for Windows. Yes, there's Qt Free , but that's not exactly production-ready software.) Perhaps the most notable IDEs not included are SPE and DrPython. Alas, nobody had time to review these, but if you're looking for a free IDE perhaps you should include these in your search, because PyDev was the only one of the 3 free ones that we'd consider using. And if you aren