My current employer keeps a lot of application metadata as xml in the database, and I needed to make a simple update to some of that. Writing a client program to do this would be immense overkill, as well as not playing nicely with my database auto-upgrade script. The update could be easily handled with a regular expression substitution, but although postgresql has a decent regular expression implementation built-in, it has no function to do replacement with them. [Update: regex_replace was added in version 8.1, six months after this post.)
An underused feature of PostgreSQL is its ability to define functions in just about any language you like. Java, Perl, Python, TCL, among others, as well as PL/PGSQL. Most developers, unfortunately, are a bit leery of defining custom functions in the database. Whether this is because of the learning curve (which turns out to be quite shallow), or because they are too used to inferior databases that don't allow such things, I couldn't say.
Anyway, creating a function that did what I needed was trivial.
-- pattern, original, replacement, flags CREATE OR REPLACE FUNCTION regsub (VARCHAR, VARCHAR, VARCHAR, VARCHAR) RETURNS VARCHAR AS ' import re flags = 0 for flagchar in args[3]: flags |= getattr(re, flagchar) p = re.compile(args[0], flags) return re.sub(p, args[2], args[1]) ' LANGUAGE 'plpythonu' WITH (ISCACHABLE); -- pattern, original, replacement CREATE OR REPLACE FUNCTION regsub (VARCHAR, VARCHAR, VARCHAR) RETURNS VARCHAR AS ' select regsub($1, $2, $3, ''''); ' LANGUAGE 'sql' WITH (ISCACHABLE);Some things to note:
- Postgresql 8.0 supports giving function variables names in the function definition statement, but the plpython handler doesn't know about this. (Anyone want to fix this? I just don't have the time.) So I'm stuck with the old-style args list.
- Callers can't access the re namespace to grab the contants out as ints, so flags are passed as a concatenated string ('IS' = IGNORECASE + DOTALL, etc.) and turned into an appropriate int with getattr.
- Speaking of that, this is the only time I have ever used the string iterator in python. I figured that after getting burned by that so many times, I might as well get some use out of it for once!
- Postgresql 8.0 added the "dollar-quoting" feature to mitigate quoting headaches when defining functions. Unfortunately our production server will be running 7.4 for the forseeable future, so I stuck with escaping my quotes in the second function the old-fashioned way.
- The second function isn't plpython: only SQL and PL/PGSQL functions can call other database functions directly; everything else is restricted to their handler's namespace. Although plpython gives you a way to dip back into the database, the overhead of doing this isn't negligible, and for something simple like this it's easier to just declare an SQL function anyway.
- If you don't have plpython installed in your master template, chances are you'll need to run createlang from the commandline (or CREATE LANGUAGE from psql) before trying this at home.
Comments
MySQL Reference Manual :: 27.2 Adding New Functions to MySQL
This has been around since the 4.0 series, so it's not exactly new.
pattern, original, replacement, flags = args
Mixing is Bad.
It's also possible to run python in parallel on SMP: Parallel Python
-Echo
http://www.echo.net84.net/
IMHO, if the model involves a DBMS then all model related work should be handled by the DBMS.
While separating code from data is a good philosophy I have to say stuff like this is very relevant. You simply do not get to choose some of the projects you inherit and its never as straight forward as you want it to be.
If we all used SQL alchemy or the likes we certainly could separate our code and data very easily and perform all kinds of magic when we want.
Well done Mr Ellis.
Pretty great this is still a relevant post 7 years later!