Thursday, December 14, 2006

Walt Mossberg: "I prefer Mozy"

I don't blog about my day job at Mozy much, but I can't pass this one up: Walt Mossberg reviewed Mozy vs Carbonite in today's Wall Street Journal, and he concluded "of the two products, I prefer Mozy."

"The Walt effect" makes the Digg effect look like small potatoes.

Komodo 4 will not include gui builder out-of-the-box

As of Komodo 4.0 beta 2, Komodo no longer includes ActiveState's Tk-based GUI builder (with support for non-Tcl Tk bindings like Python's Tkinter). They've released it as open source to the SpecTcl project, from which it was apparently forked long ago. Re-integration with Komodo 4 as a plugin is planned eventually, but it doesn't look likely before the 4.0 release.

I wonder how much of the impetus behind this is the increased amount of web-based development done today, and how much is due to Tk's increasingly dated look and the increased popularity of more modern toolkits such as wxWidgets.

Wednesday, December 06, 2006

Wow, the gzip module kinda sucks

I needed to scan some pretty massive gzipped text files, so my first try was the obvious "for line in gzip.open(...)." This worked but seemed way slower than expected. So I wrote "pyzcat" as a test and ran it against a file with 100k lines:

#!/usr/bin/python

import sys, gzip

for fname in sys.argv[1:]:
    for line in gzip.open(fname):
        print line,

Results:

$ time zcat testzcat.gz > /dev/null
real    0m0.329s

$ time ./pyzcat testzcat.gz > /dev/null
real    0m3.792s

10x slower -- ouch! Well, if zcat is so much better, let's try using zcat to do the reads for us:

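#!/usr/bin/python

import sys
from subprocess import Popen, PIPE

def gziplines(fname):
    # let an external zcat do the decompression; we just read its stdout
    f = Popen(['zcat', fname], stdout=PIPE)
    for line in f.stdout:
        yield line
    f.wait()  # reap the child process once its output is exhausted

for fname in sys.argv[1:]:
    for line in gziplines(fname):
        print line,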

Results:

$ time ./pyzcat2 testzcat.gz |wc
real    0m0.750s

So, reading from a zcat subprocess is 5x faster than using the gzip module. cGzipFile anyone?
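If you'd rather not depend on an external zcat, one thing worth trying (I haven't benchmarked it against the numbers above) is to skip the gzip module's line iteration and feed big chunks to zlib directly, splitting lines yourself. This is a Python 3 sketch under assumptions: `gzip_lines` and its 1 MB default chunk size are my own choices, and passing `16 + zlib.MAX_WBITS` as wbits is what tells zlib to expect gzip framing instead of a raw zlib stream.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # expect a gzip header and trailer

def gzip_lines(path, chunk_size=1 << 20):
    """Yield the lines of a gzip file as bytes, decompressing big chunks at a time."""
    d = zlib.decompressobj(GZIP_WBITS)
    pending = b""  # incomplete last line carried over between chunks
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            buf = chunk
            while buf:
                pending += d.decompress(buf)
                if d.eof:
                    # another concatenated gzip member may follow (zcat handles these too)
                    buf = d.unused_data
                    d = zlib.decompressobj(GZIP_WBITS)
                else:
                    buf = b""
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line + b"\n"
    pending += d.flush()
    *lines, pending = pending.split(b"\n")
    for line in lines:
        yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline
```

Usage mirrors pyzcat: `for line in gzip_lines(fname): sys.stdout.buffer.write(line)`.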

Monday, December 04, 2006

pysqlite design decisions

There's an interesting thread over on the pysqlite mailing list about pysqlite, shortcomings of the DBAPI 2, APSW (an sqlite interface that does NOT attempt to conform to the DBAPI), and a working sqlite interface in 200 lines of python using ctypes. Worth checking out if you're interested in this sort of thing. (Alas, pipermail does a suckariffic job of threading the conversation, so browsing by date is probably your best bet if you haven't already subscribed to the list from a gmail account.)

Friday, December 01, 2006

SQLAlchemy at Pycon 2007

Mark Ramm will be giving a talk on SQLAlchemy. I'll be giving a talk on SqlSoup, the SQLAlchemy extension I wrote, as well as a tutorial on Advanced Databases with SQLAlchemy.

For my tutorial, I'll be targeting people who understand database fundamentals but want to learn about more advanced features like triggers and how an ORM like SQLAlchemy lets you take advantage of those. (Many ORM tools force you to give up the more powerful database features and pretend instead that your database is a dumb object store, which IMO defeats one of the main purposes of using a modern database.)

If you need to brush up on fundamentals first, Steve Holden is running a more basic tutorial on databases with Python earlier in the day. Here's his outline; his slides from PyCon 06 on the same subject are also online.

If there's something you'd like to see covered in my talk or tutorial, comments are welcome by email (jonathan at utahpython dot org) or right here.

Benchmark: PostgreSQL beats the stuffing out of MySQL

This is interesting, because the conventional wisdom of idiots on slashdot continues to be "use postgresql if you need advanced features, but use mysql if all you care about is speed," despite all the head-to-head benchmarks I've seen by third parties showing PostgreSQL to be faster under load. (MySQL's own benchmarks, of course, tend to favor their own product. Go figure, huh.)

Here's the latest, showing postgresql about 50% faster than mysql across a variety of hardware. But where MySQL really takes a pounding is when you add multiple cores / CPUs: MySQL adds 37% performance going from 1 to 4 cores; postgresql adds 226%. Ouch! (This would also explain why MySQL sucks so hard on the Niagara chip on the requests-per-second graph -- Sun sacrificed GHz to get more cores in.)
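Those percentages are easier to compare as scaling efficiency. A back-of-the-envelope sketch using only the two numbers above: +37% going from 1 to 4 cores is a 1.37x speedup, about a third of perfect linear scaling, while +226% is 3.26x, roughly four-fifths of linear. `scaling_efficiency` is just a name I made up for the arithmetic.

```python
def scaling_efficiency(percent_gain, core_ratio):
    """Fraction of ideal linear speedup achieved, given a % throughput gain."""
    speedup = 1 + percent_gain / 100.0
    return speedup / core_ratio

# 1 -> 4 cores, using the benchmark's reported gains
for name, gain in [("MySQL", 37), ("PostgreSQL", 226)]:
    print("%s: %.2fx speedup, %.1f%% of linear"
          % (name, 1 + gain / 100.0, 100 * scaling_efficiency(gain, 4)))
```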

As even low-end servers start to go multicore this is going to be increasingly important.

Update: PostgreSQL core member Josh Berkus says:

[This] is a validation of the last four years of PostgreSQL performance engineering. It's not done yet ... if the Tweakers.net test had included 16+ core machines you'd have seen PostgreSQL topping out ... but our hackers have done quite well.

Friday, November 17, 2006

Spyce 2.1.3

Another maintenance release.

Changelog:

   db module requires PK definition up-front (instead of erroring out later)
   default login render tags give better feedback on unsuccessful login
   raise NameError instead of string exception for invalid eval'd tag attributes
   fixes for spyceProject
   fix kwargs render-through on spy:list and table tags
   fix auto-selecting of multiple defaults in compound controls
   fix for class-based exceptions in old-style tag exception handlers

Wednesday, November 15, 2006

Translating Spyce form tags

Jeff Shell doesn't find Spyce tags easy to translate into "what HTML does this output?" That's my fault for writing crappy documentation, I guess, although I did think the examples helped a bit. :)

So, briefly, here's how you translate the Spyce form tags:

<f:text name=quest label="Question:" />
  • All the tags correspond closely with raw HTML. (This is part of Not Wasting Your Time.) If the HTML is "input type=foo," the corresponding Spyce tag is f:foo. So this will generate "input type=text."
  • Name attribute is the same as in HTML; by default Spyce also sets the ID attribute to the same as name, for convenience working with javascript. ID may of course be overridden separately.
  • The "label" attribute generates a "label for=..." HTML tag pair. For most form elements, the label will be placed before the input; the exceptions are radio buttons and checkboxes.
  • Spyce will add a "value" attribute for you if you don't specify one; this will be the value of the GET/POST variable corresponding to the input name if there is one, or "" otherwise.

So, the HTML generated is

<label for="quest">Question:</label> <input type="text" name="quest" id="quest" value="">
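To make those rules concrete, here is a rough Python model of the translation for a text input. It's a hypothetical helper for illustration, not Spyce's actual rendering code: `render_text_input` and its parameters are invented, it covers only the rules listed above (id defaults to name, label goes before the input, value comes from the request or defaults to ""), and escaping every attribute is my own assumption.

```python
from html import escape

def render_text_input(name, label=None, element_id=None, value=None, request=None):
    """Hypothetical model of how <f:text name=... label=... /> maps onto HTML."""
    element_id = element_id or name          # id defaults to the name attribute
    if value is None:                        # else the submitted value, else ""
        value = (request or {}).get(name, "")
    html = '<input type="text" name="%s" id="%s" value="%s">' % (
        escape(name), escape(element_id), escape(value))
    if label is not None:                    # text inputs put the label first
        html = '<label for="%s">%s</label> %s' % (
            escape(element_id), escape(label), html)
    return html
```

Calling `render_text_input("quest", label="Question:")` produces the same HTML shown above.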

I find the Spyce notation to be concise and intuitive, once you learn a few simple rules, but to each his own.

(I covered the more "advanced" stuff over here already. Guess I should have started with the basics!)

Thursday, November 02, 2006

The postgresql irc search bot

I saw something cool in #postgresql on freenode tonight. There is a bot in the channel named "rtfm_please;" he'll reply to messages with pages of the postgresql documentation on that subject. For instance,

(9:03:51 PM) jbellis: constraint
(9:03:51 PM) rtfm_please: For information about constraint
(9:03:51 PM) rtfm_please: see http://www.postgresql.org/docs/current/static/ddl-partitioning.html
(9:03:51 PM) rtfm_please: or http://www.postgresql.org/docs/current/static/infoschema-table-constraints.html
(9:03:51 PM) rtfm_please: or http://www.postgresql.org/docs/current/static/sql-set-constraints.html

Unfortunately it's not too bright:

(9:03:48 PM) jbellis: add constraint
(9:03:49 PM) rtfm_please: nothing found :-(

Apparently it's not really a search, but a manually maintained collection of links (which explains why it's virtually instantaneous). I was amused to see this entry:

(9:23:36 PM) jbellis: mysql gotchas
(9:23:37 PM) rtfm_please: For information about mysql gotchas
(9:23:37 PM) rtfm_please: see http://sql-info.de/mysql/gotchas.html

(Yeah, I'm an irc newbie.)
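Here's a toy model of why "constraint" works but "add constraint" doesn't, assuming (as it appears) the bot does an exact lookup of the whole message in a hand-maintained table rather than a real search. The `LINKS` table and the `reply` function are made up for illustration; only the URLs come from the transcript above.

```python
# Hypothetical hand-maintained keyword table: exact keys only, no searching.
LINKS = {
    "constraint": [
        "http://www.postgresql.org/docs/current/static/ddl-partitioning.html",
        "http://www.postgresql.org/docs/current/static/infoschema-table-constraints.html",
        "http://www.postgresql.org/docs/current/static/sql-set-constraints.html",
    ],
    "mysql gotchas": ["http://sql-info.de/mysql/gotchas.html"],
}

def reply(message):
    """Return the bot's reply lines; a single dict lookup, hence instant."""
    topic = message.strip()
    urls = LINKS.get(topic.lower())
    if not urls:
        return ["nothing found :-("]
    lines = ["For information about %s" % topic, "see " + urls[0]]
    lines.extend("or " + url for url in urls[1:])
    return lines
```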

Tuesday, October 31, 2006

Another serving of SqlSoup

Earlier this year I wrote an introduction to SqlSoup, the SQLAlchemy extension that leverages SQLAlchemy's excellent introspection, mapping, and sql construction to provide a database interface that is both simple and powerful.

Here's what SqlSoup has added since then (continuing with the books/loans/users example tables from pyweboff). Full SqlSoup documentation is on the SQLAlchemy wiki.

Set operations

The introduction covered updating and deleting rows that had been mapped to Python objects. You can also perform updates and deletes directly against the database.

>>> db.loans.insert(book_id=book_id, user_name=user.name)
MappedLoans(book_id=2,user_name='Bhargan Basepair',loan_date=None)
>>> db.flush()
>>> db.loans.delete(db.loans.c.book_id==2)

>>> db.loans.update(db.loans.c.book_id==2, book_id=1)
>>> db.loans.select_by(db.loans.c.book_id==1)
[MappedLoans(book_id=1,user_name='Joe Student',loan_date=datetime.datetime(2006, 7, 12, 0, 0))]
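In raw SQL terms, those delete and update calls amount to a single DELETE and a single UPDATE against the table, with no objects loaded first. Here's a minimal sqlite3 sketch of roughly equivalent statements; the cut-down loans schema and the two sample rows are my own stand-ins, not the real pyweboff tables, and this approximates what SqlSoup sends rather than reproducing its literal SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (book_id INTEGER, user_name TEXT, loan_date TEXT)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)",
                 [(1, "Joe Student", "2006-07-12"), (2, "Bhargan Basepair", None)])

# db.loans.delete(db.loans.c.book_id==2), roughly:
conn.execute("DELETE FROM loans WHERE book_id = ?", (2,))
# db.loans.update(db.loans.c.book_id==2, book_id=1), roughly:
conn.execute("UPDATE loans SET book_id = ? WHERE book_id = ?", (1, 2))

rows = conn.execute("SELECT * FROM loans WHERE book_id = 1").fetchall()
```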

Joins

Occasionally, you will want to pull out a lot of data from related tables all at once. In this situation, it is far more efficient to have the database perform the necessary join. (Here we do not have "a lot of data," but hopefully the concept is still clear.) SQLAlchemy is smart enough to recognize that loans has a foreign key to users, and uses that as the join condition automatically.

>>> join1 = db.join(db.users, db.loans, isouter=True)
>>> join1.select_by(name='Joe Student')
[MappedJoin(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0,book_id=1,user_name='Joe Student',loan_date=datetime.datetime(2006, 7, 12, 0, 0))]

If you're unfortunate enough to be using MySQL with the default MyISAM storage engine, you'll have to specify the join condition manually, since MyISAM does not store foreign keys. Here's the same join again, with the join condition explicitly specified:

>>> db.join(db.users, db.loans, db.users.c.name==db.loans.c.user_name, isouter=True)
<class 'sqlalchemy.ext.sqlsoup.MappedJoin'>
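For reference, the explicit-condition join above corresponds to an ordinary LEFT OUTER JOIN, which is what `isouter=True` asks for. This sqlite3 sketch uses stripped-down users and loans tables with made-up sample rows (not the real pyweboff schema) to show that users with no loans still come back, with NULLs in the loan columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE loans (book_id INTEGER, user_name TEXT REFERENCES users(name));
    INSERT INTO users VALUES ('Joe Student', 'student@example.edu');
    INSERT INTO users VALUES ('Bhargan Basepair', 'basepair@example.edu');
    INSERT INTO loans VALUES (1, 'Joe Student');
""")

# users LEFT OUTER JOIN loans ON users.name = loans.user_name
rows = conn.execute("""
    SELECT users.name, users.email, loans.book_id
    FROM users LEFT OUTER JOIN loans ON users.name = loans.user_name
    ORDER BY users.name
""").fetchall()
```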

You can compose arbitrarily complex joins by combining Join objects with tables or other joins. Here we combine our first join with the books table:

>>> join2 = db.join(join1, db.books)
>>> join2.select()
[MappedJoin(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0,book_id=1,user_name='Joe Student',loan_date=datetime.datetime(2006, 7, 12, 0, 0),id=1,title='Mustards I Have Known',published_year='1989',authors='Jones')]

If you join tables that share a column name, wrap your join with "with_labels" to disambiguate the columns by prefixing each with its table name:

>>> db.with_labels(join1).c.keys()
['users_name', 'users_email', 'users_password', 'users_classname', 'users_admin', 'loans_book_id', 'loans_user_name', 'loans_loan_date']

Advanced mapping

SqlSoup can map any SQLAlchemy Selectable with the map method. Let's map a Select object that uses an aggregate function; we'll use the SQLAlchemy Table that SqlSoup introspected as the basis. (Since we're not mapping to a simple table or join, we need to tell SQLAlchemy how to find the "primary key," which just needs to be unique within the select, and not necessarily correspond to a "real" PK in the database.)

>>> from sqlalchemy import select, func
>>> b = db.books._table
>>> s = select([b.c.published_year, func.count('*').label('n')], from_obj=[b], group_by=[b.c.published_year])
>>> s = s.alias('years_with_count')
>>> years_with_count = db.map(s, primary_key=[s.c.published_year])
>>> years_with_count.select_by(published_year='1989')
[MappedBooks(published_year='1989',n=1)]

Obviously if we just wanted to get a list of counts associated with book years once, raw SQL is going to be less work. The advantage of mapping a Select is reusability, both standalone and in Joins. (And if you go to full SQLAlchemy, you can perform mappings like this directly to your object models.)
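For comparison, the SQL behind the mapped `years_with_count` select looks roughly like the query below. This sqlite3 sketch uses a made-up books table with a few sample rows, so the counts are illustrative only; the aliased subquery plays the role of the Select that SqlSoup mapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, published_year TEXT)")
conn.executemany("INSERT INTO books (title, published_year) VALUES (?, ?)",
                 [("Mustards I Have Known", "1989"),
                  ("Sample Book A", "1995"),
                  ("Sample Book B", "1995")])

# select([published_year, count('*').label('n')], group_by=[published_year])
#   .alias('years_with_count'), then filtered like select_by(published_year=...)
query = """
    SELECT published_year, n FROM (
        SELECT published_year, count(*) AS n FROM books GROUP BY published_year
    ) AS years_with_count
    WHERE published_year = ?
"""
rows = conn.execute(query, ("1989",)).fetchall()
```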

Thursday, October 26, 2006

Ruby isn't going to fracture, and "enterprise" is not synonymous with "static"

I don't follow Ruby development too closely (most of the info on it is still in Japanese, after all), but the US RubyConf was held recently so there's been an unusual number of English posts on Ruby, among them David Pollak's The Impending Ruby Fracture.

David's article seems to consist of these points:

  1. Matz is uninterested in adding static bondage & discipline features to Ruby (true, as far as I know)
  2. "Enterprise" users won't be satisfied without said features (more on this below)
  3. There are a lot of Ruby runtimes out there right now (the most interesting part of the article)
  4. Therefore some Enterprise will co-opt one of the runtimes to fork Ruby and add the B&D features (wtf?)

Summarized this way it looks faintly ridiculous, and yet nobody over on the programming reddit has called this out. Maybe I'm taking excessive liberties with David's article, but I don't think I am.

The possibility of forking is part of what makes open source wonderful. The actual cost of a fork is astronomically high; almost nobody has made it work. For every X.org there are dozens of failures and probably far more where the would-be forkers realized that however bad the situation was, actually forking would be worse.

Now, in the absence of strong leadership, what you can have happening to a language is de facto forking, like what you have with Lisp -- the Common Lisp standard is old, so the various Lisp implementations have gone their separate ways to various degrees and portability between them is pretty dicey. But the Ruby community seems to be pretty content with the job Matz is doing so I don't see this happening.

As a motivation to assume the huge costs of forking, David submits... "interfaces or some other optional typing mechanism?" Excuse me. Even though some intelligent language designers have flirted with ideas along those lines, that's not something that's going to get refugees from Java to rally around your banner for a fork.

I also have to take issue with David's characterization of this as features that "appeal to enterprise customers." While it may be true that B&D languages are currently popular with large corporations, other large corporations recognize the advantages of dynamic languages. Corporations aren't stupid; they're just very conservative, for the most part. In 10 years you'll see more Python and Ruby in the enterprise, just as Java and C# are replacing COBOL and C++ now.

Thursday, September 07, 2006

Codejam 2006 qualification round

I'm pleased that I made it into the round of 1000 in Google's Codejam. The Python support was much appreciated!

(Actually, I'd be curious what the breakdown was by language of the contestants and qualifiers. I have no real stats but from the 50+ submissions I glanced at, all C#, Java, and C++, I would guess that topcoder's "Python might be too slow" disclaimer scared a lot of people away from Python. Or, like VB.NET, it just isn't that popular with this group. The ignominy!)

I solved the 250 pt problem (problem set 5) very quickly, which is good because I didn't end up solving the 750 pt one at all. The 250 pt problem was

You are given a tuple (integer) f that describes a set of values from a function f. The x-th element (zero-indexed) is the value of f(x). The function is not convex at a particular x value if there exist y and z values such that y < x < z, and the point (x, f(x)) lies strictly above the line between points (y, f(y)) and (z, f(z)) in the Cartesian plane. All x, y, and z values must be between 0 and n-1, inclusive, where n is the number of elements in f. Return the number of places where f is not convex.

Here was my brute force solution:

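class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

def not_convex_at(x, f):
    # True if some y < x < z puts (x, f(x)) strictly above
    # the line from (y, f(y)) to (z, f(z))
    p = Point(x, f[x])
    for y in range(x):
        for z in range(x + 1, len(f)):
            if above(p, Point(y, f[y]), Point(z, f[z])):
                return True
    return False

def above(p, a, b):
    # Is p strictly above the line through a and b? (a.x < p.x < b.x)
    # Cross-multiplying keeps everything in exact integer arithmetic,
    # avoiding float rounding in the slope.
    return (p.y - a.y) * (b.x - a.x) > (b.y - a.y) * (p.x - a.x)

class ConvexityIssues:
    def howMany(self, f):
        return sum(1 for x in range(len(f)) if not_convex_at(x, f))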
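The brute force above is O(n^3). For the curious, here's a faster approach I didn't use in the contest: a point is "not convex" exactly when it lies strictly above the lower convex hull of all the points, so one left-to-right pass of Andrew's monotone chain (keeping collinear points on the hull) counts them in O(n). `count_not_convex` is my own name for it; treat it as a sketch and check it against the brute force before trusting it.

```python
def count_not_convex(f):
    """Count x where (x, f[x]) lies strictly above the lower convex hull of all points."""
    hull = []  # indices on the lower hull so far, left to right
    for i, fi in enumerate(f):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Cross product of (b - a) and (i - a): negative means point b sits
            # strictly above the chord from a to i, so it can't be on the hull.
            cross = (b - a) * (fi - f[a]) - (f[b] - f[a]) * (i - a)
            if cross < 0:
                hull.pop()
            else:
                break  # collinear (0) points stay: they are not *strictly* above
        hull.append(i)
    return len(f) - len(hull)
```

For example, `count_not_convex([0, 1, 0])` is 1: the middle point sits above the chord joining the two ends.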

The other problem was more interesting. A brute-force solution here is just too slow for a 16x16 matrix (i.e., 16! = 20922789888000 permutations) and I couldn't figure out the pattern in time to construct a better algorithm.

The permanent of an nxn matrix A is equal to the sum of A[1][p[1]] * A[2][p[2]] * ... * A[n][p[n]] over all permutations p of the set {1, 2, ... , n}.

You will be given a tuple (string) matrix, where each element contains a single space delimited list of integers. The jth integer in the ith element represents the value of A[i][j]. Return the int represented by the last four digits of the permanent of the given matrix.

There's also a discussion thread about this question for the curious.
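Looking back, the standard way around the n! blowup is dynamic programming over subsets of columns: assign rows in order, and for each set of already-used columns keep the sum of products so far. That's O(n * 2^n), about a million steps for n = 16. This is a sketch of that well-known technique (Ryser's formula is the other classic option), not what I submitted; `permanent_last4` is my own name, and it assumes non-negative entries so that "last four digits" just means the result mod 10000.

```python
def permanent_last4(matrix):
    """Last four digits of the permanent, via DP over subsets of used columns."""
    a = [[int(v) for v in row.split()] for row in matrix]
    n = len(a)
    full = (1 << n) - 1
    # ways[mask]: sum, over assignments of the first popcount(mask) rows to the
    # columns in mask, of the product of the chosen entries (kept mod 10000)
    ways = [0] * (1 << n)
    ways[0] = 1
    for mask in range(full):
        if not ways[mask]:
            continue  # contributes nothing further
        row = bin(mask).count("1")  # next row to place
        for col in range(n):
            bit = 1 << col
            if not mask & bit:
                ways[mask | bit] = (ways[mask | bit] + ways[mask] * a[row][col]) % 10000
    return ways[full]
```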

This is probably as far as my CodeJam career goes since I don't really have time to practice for the next round, and this part of my brain is clearly rusty. Still, I enjoy it, especially in Python. Thanks, Google!

Wednesday, August 30, 2006

Spyce will not waste your time: authentication

If you work in the web development area, or even dabble in it as a hobbyist, sooner or later you're going to write code for a project that needs authentication. Probably sooner rather than later.

For a feature that gets used so frequently, it's remarkable to me that nobody has really done this right. Here are some basic principles for a good solution:

  1. A minimum of customization to work out-of-the-box
  2. Gentle complexity slope when more sophisticated behavior is needed
  3. Play nice with others
  4. Don't try to solve world hunger

The first two are, I hope, no-brainers. The last two bear more explanation.

Play nice with others: not everyone wants to authenticate against a Users table in a relational database. (Fairly common alternatives are LDAP or Unix logins.) If you bake in assumptions like this too deep, it causes problems. It might be worth the problems if it were impossible to provide both generality and ease of use, but such is not the case.

Don't try to solve world hunger. By this I mean, don't invent a Grand Unified Authorization/Permissions scheme with Groups and Subgroups and Roles and ACLs that supposedly can handle any needed authorization requirements, but requires a week of studying your system to make it work. There have been several attempts by very smart people to make this work and it hasn't, yet. So unless you are a grad student or in some other position that leaves you with too much time on your hands, don't go there. Instead, make it easy for the application developer to implement whatever he needs on top of your authentication code.

With that introduction, here's what you need to do to enable authentication for your Spyce app:

  1. Invoke the spy:login (or spy:login_required) tag in the pages you wish to restrict to authenticated users
  2. Write a function that, given a username / password combination, returns a pickle-able identifier, or None; make this function the login_defaultvalidator in your Spyce config file. For example, this one, used in the pyweboff demo (which does, as it happens, store user information in a database):
    def validator(login, password):
        user = db.users.selectone_by(name=login, password=password)
        if user:
            return user.id
        return None
    login_defaultvalidator = validator


That's it! Out of the box, Spyce handles the cookies, the login forms and handlers, all of the details you shouldn't have to worry about! The form rendering is done by active tags, which we talked about yesterday, so it's easy to customize when you get to that point. And if you want to restrict some pages to a subset of all users, you'd just write a validation function expressing those constraints, and pass that function to spy:login on those pages.
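For instance, restricting a page to administrators is just another validator. This is a sketch under assumptions: the in-memory `USERS` table and the `admin_validator` name are hypothetical stand-ins for the pyweboff database, and the only contract assumed is the one described above (take a login and password, return a pickle-able identifier or None).

```python
# Hypothetical user store standing in for db.users: login -> (id, password, is_admin)
USERS = {
    "spyce": (1, "spyce", False),
    "boss": (2, "s3cret", True),
}

def validator(login, password):
    """Any known user: return a pickle-able id on success, None otherwise."""
    record = USERS.get(login)
    if record and record[1] == password:
        return record[0]
    return None

def admin_validator(login, password):
    """Stricter check for admin-only pages: same contract, one extra constraint."""
    user_id = validator(login, password)
    if user_id is not None and USERS[login][2]:
        return user_id
    return None
```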

Here's a live example. (Login as spyce/spyce.)

If you'd like to compare this approach with other frameworks, here are some relevant links:

  • Ruby on Rails: "Don't start here until you 'get' it. Most of the authentication guides are broken or poorly documented." Good luck! Or maybe you'd like to buy our book!
  • Turbogears: (updated link) there is a lot to read here but Karl assures me that only a couple steps are actually necessary, besides the database setup.
  • Django: you have to do a lot of reading to find out what the real minimum work to do is, but I count 5 steps in 3 files.
  • I'll throw in ASP.NET for good measure: it's actually not too bad, but you're still on your own writing the login form and handler.

Tomorrow I'll talk about the Spyce db module, shown briefly above in the sample validation function. (And also mentioned in passing when I talked about Spyce handlers.)

Tuesday, August 29, 2006

Spyce will not waste your time: code reuse

Most frameworks today have pretty good support for code/markup reuse towards providing a common look to a site without the "include a header, include a footer" clunkers you used to see. (Spyce uses "parent tags," which are as elegant as any and more powerful than most.)

But what I want to talk about today is code/markup reuse in the small, at a level that corresponds to functions or methods in Python.

Most frameworks today still suck at this. Approaches vary, but what they have in common is that they don't let you use the same techniques you use in the rest of the framework, and they're generally clunky.

For instance, with Rails, you define functions which you can then use from your views/templates/presentation layer, but within those functions you're back in 1996, stringing HTML together manually. Something like

def faq_html(question, answers):
    html = '''
    <table class="faq">
    <thead>
      <tr><th>Q.</th><th>%s</th></tr>
    </thead>
    ''' % question
    for a in answers:
        html += '<tr><td>A.</td><td>%s</td></tr>' % a
    html += '</table>'
    return html

(Edit: Rails also has a code reuse mechanism called "partials," which behave a lot more like normal rhtml code.)

In Spyce, you can still do things this old-fashioned way if you really want. But a much better way is defining "active tags," which take parameters like functions but are defined with the same tools you build the rest of your page with, i.e., Python and other tags. Such as:

[[.begin name=html singleton=True ]]
[[.attr name=question ]]
[[.attr name=answers ]]

<spy:table class="faq" data="(('A.', a) for a in answers)">
<thead>
  <tr><th>Q.</th><th>[[= question ]]</th></tr>
</thead>
</spy:table>

[[.end ]]

In a tag library named "faq", this active tag could then be used as follows (the "=" sign means "evaluate this parameter as a Python expression instead of as a string"):

<faq:html question="=question.body" answers="=[a.body for a in answers]" />

The more complex your code becomes, the more you'll appreciate the Spyce way. You'll find that with the increased programmer-friendliness active tags provide, you'll abstract more common code than you would have with clumsier tools, with all the improved maintainability that entails over code-by-copy-and-paste.

But what's really cool is you can encapsulate handlers with your tags, creating self-contained bundles of render and control logic. This is crucial: there's no way to do this with the traditional "helper function" approach. Splitting functionality out into active tag components like this allows both reuse and encapsulation, just as classes provide similar benefits in pure Python code.

(We talked about Spyce handlers yesterday. Briefly: Spyce allows you to attach a python function to a form submit action.)

To continue our FAQ example, let's add a "Create a new FAQ" handler, like this:

[[!
def faq_new(self, api, question, answer):
    q = api.db.questions.insert(body=question)
    api.db.answers.insert(question_id=q.id, body=answer)
    api.db.flush()
]]

[[.begin name=faq_html singleton=True ]]
[[.attr name=question ]]
[[.attr name=answers ]]

<spy:table class="faq" data="(('A.', a) for a in answers)">
<thead>
  <tr><th>Q.</th><th>[[= question ]]</th></tr>
</thead>
</spy:table>

<h2>New FAQ</h2>
<div><f:text name=question label="Question:" /></div>
<div><f:text name=answer label="Answer:" /></div>
<div><f:submit handler=self.faq_new value="Create" /></div>
[[.end ]]

Spyce provides friendlier and more powerful tools for code reuse than either other view-oriented frameworks or MVC frameworks. Give it a try; you won't want to go back.

Tomorrow I'll explain the Spyce authentication system.

Monday, August 28, 2006

Spyce will not waste your time: controllers/handlers

Traditional view-oriented frameworks (such as PHP, or Spyce 1.3) do not handle control logic well. Today I'll show how Spyce 2 solves this problem with "active handlers," and tomorrow I'll show how this lets Spyce provide much more elegant code-reuse tools than either other view-oriented frameworks or MVC frameworks like Turbogears and Django.

Control logic is the code that decides what happens next in response to a user action. Say we have a simple form to create new to-do lists, like this:

<form>
<input type=text name=name value="">
<input type=submit>
</form>

In a traditional view-oriented framework (like Spyce 1.3), you would process this form with code that looks something like this:

[[name = request['name']
db.todo_lists.insert(name=name)
db.flush()
]]

That wasn't too bad. (Especially since I cheated and used the modern Spyce db api, AKA SqlSoup from SQLAlchemy.) But even with a simple, one-element form like this, a couple things are clear:

  • code like "name = request['name']" must be written for each form element. Irritating!
  • putting this chunk of code in the page with the form on it sucks (because you have to "if" it out if you're not actually processing the form POST)
  • putting this code on another page that doesn't actually do anything else sucks too (because then you have a view-that's-not-really-a-view, which is ugly -- to say nothing of existentially disturbing)

The Spyce solution is to allow you to specify Python functions, called handlers, that run on form submission. So your submit will now look like

<f:submit handler=list_new />

(note the Spyce form submit active tag; the vanilla html submit tag won't understand the handler parameter, of course), and you can define your handler as

def list_new(api, name):
    api.db.todo_lists.insert(name=name)
    api.db.flush()

This takes care of the redundant code to access form variables -- Spyce inspects the function definition and pulls the corresponding values out of the request for you -- and it also solves the problem of how to organize your code.
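Here's roughly how that kind of argument matching can work, sketched with the standard `inspect` module. It's an illustration under assumptions, not Spyce's code: `call_handler`, the plain-dict request, and the rule that a parameter named `api` receives the framework object are all hypothetical.

```python
import inspect

def call_handler(handler, request, api):
    """Call handler, filling each of its parameters from the form data by name."""
    kwargs = {}
    for name in inspect.signature(handler).parameters:  # bound methods omit self
        if name == "api":
            kwargs[name] = api                # framework object, not a form field
        else:
            kwargs[name] = request.get(name)  # missing fields come through as None
    return handler(**kwargs)
```

With `list_new` from above, `call_handler(list_new, {"name": "groceries"}, api)` would end up calling `list_new(api=api, name="groceries")`; form fields the handler doesn't ask for are simply ignored.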
Earlier, I said you could write "handler=list_new." Actually, you either need to write "handler=self.list_new" and define list_new in a class chunk in the current page (as demonstrated in this example), or you can write "handler=somemodule.list_new," and put list_new in somemodule. (In the spirit of Not Wasting Your Time, Spyce will take care of importing somemodule as necessary.)

Pretty simple stuff, but I needed to explain handlers before I get into how Spyce uses them with active tags to provide unique code re-use tools. That will be our topic next time! Until then, if you missed the entry on form processing, go read it, since it touches on some of the more advanced features of handlers.

Friday, August 25, 2006

Spyce will not waste your time: form processing

The revamped Spyce website announces "Spyce will not waste your time."

Most web frameworks today waste your time with busywork at least some of the time. This is unacceptable; boilerplate code is tedious to write, an obstacle to good maintenance, and a distraction from real productivity. All the development of Spyce 2.0 and 2.1 has been towards eliminating common sources of busywork in web development.

Today I'll give a couple concrete examples of how this applies to creating forms for user input.

Let's say you want to have a select box that remembers what the user's last selection was. Pretty much any form these days needs logic like this when giving feedback inline. Many frameworks make you write this code out by hand; if you've ever developed in one of these, you know how quickly this gets old.

Here's what you'd have to write in Spyce 2.1, given a list of options named options:

<f:select name="foo" data="options" />

That's it.

For comparison, here's what this would have looked like in Spyce version 1.2, some time in 2002. Many frameworks still make you do it much the same way today:

<select name="foo">
  [[ for name, value in options:{ ]]
     <option value="[[= value ]]"[[ if value == request['foo']:{ ]] selected[[ } ]]>[[= name ]]</option>
  [[ } ]]
</select>

Going back to the modern Spyce: notice the data attribute? All Spyce tags that you would expect to fill with a for loop now take a data attribute that can be any Python iterable. You can still render the contents of a select or a ul or a table manually, but Spyce saves you time for the common case.
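To see what the data attribute is saving you, here's the same remember-the-last-selection logic as a plain Python function. It's a hypothetical model, not Spyce's implementation: `render_select` is an invented name, `options` is assumed to be (name, value) pairs as in the 1.2 example, and `request` is a plain dict of submitted values.

```python
from html import escape

def render_select(name, options, request):
    """Render a <select>, re-selecting whatever value was submitted last time."""
    current = request.get(name)
    parts = ['<select name="%s">' % escape(name)]
    for label, value in options:
        selected = " selected" if value == current else ""
        parts.append('  <option value="%s"%s>%s</option>'
                     % (escape(value), selected, escape(label)))
    parts.append("</select>")
    return "\n".join(parts)
```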

Spyce also takes the boilerplate out of giving feedback to users. I'm unaware of any other framework that makes this as easy as Spyce; perhaps the closest is ASP.NET, but even that seems pretty clunky to me after using Spyce for a while.

All you need to do is raise a HandlerError from your handler, and Spyce knows how to render it with the associated form with no further action on your part:

    raise HandlerError('choice', 'None selected')

The rendering of errors is easily customized, and you can raise CompoundHandlerError if there's more than one problem.
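Here is one way to model the idea in plain Python, to show why it removes boilerplate: the handler just raises, and the caller collects messages per field for the form to display next to the right inputs. Everything here is hypothetical (the class bodies, `run_handler`, and the `sqrt_handler` example with its 'x' field); it borrows only the names HandlerError and CompoundHandlerError and the (field, message) shape from the call above, not Spyce's actual implementation.

```python
import math

class HandlerError(Exception):
    """One problem with one form field."""
    def __init__(self, field, message):
        super().__init__(field, message)
        self.field, self.message = field, message

class CompoundHandlerError(Exception):
    """Several HandlerErrors reported at once."""
    def __init__(self, errors):
        super().__init__(errors)
        self.errors = list(errors)

def run_handler(handler, *args):
    """Run a handler; return {field: message} for the form to render (empty on success)."""
    try:
        handler(*args)
    except HandlerError as e:
        return {e.field: e.message}
    except CompoundHandlerError as e:
        return {err.field: err.message for err in e.errors}
    return {}

def sqrt_handler(x):
    # hypothetical handler in the spirit of the square-root demo
    if x < 0:
        raise HandlerError("x", "Cannot take the square root of a negative number")
    return math.sqrt(x)
```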

There's a live example over here; click on "Run this code" and see what it does when you ask for the square root of a negative number.

I'll leave comparing this with the hoops your current framework makes you jump through as an exercise for the reader.

Thursday, August 24, 2006

Spyce 2.1.2 released

Just some bug fixes this time:

  • fix support for threadless python builds
  • fix using compiled tags within other tags
  • fix f:textarea
  • fix spyceUtil.extractValue for incomplete dict work-alikes
  • fix session1 backwards compatibility

Amazon EC2: How much less userfriendly can you make it?

This morning I read about the new Amazon Elastic Compute Cloud service. It's basically a cluster of Xen VPSes, done right.

At least that's what I thought until I actually tried to use it.

Let's see:

  1. Sign up for AWS
  2. Sign up for S3
  3. Create certs
  4. Download tools
  5. Export 3 EC2 environment variables
  6. Oh, hell. Tools are in java. Switch to windows box since I don't have the patience to figure out installing Java 1.5 on debian right now.
  7. Repeat steps 4-5
  8. Export JAVA_HOME
  9. Run ec2-describe-images
  10. Exception in thread "main" java.lang.NoClassDefFoundError: com/amazon/aes/webservices/client/cmd/DescribeImages

Looks like I get to try to fix the classpath for them. How retro-cool can you get? It's just like 1999!

Bah. Next time, use Python, guys.

Update: Apparently they didn't even bother testing on Windows, and their script was just plain broken. Way to go.

Monday, August 14, 2006

Hell yes: Google codejam does Python

For the first time, codejam 2006 features Python as a language option.

I don't follow topcoder closely enough to know if this is the first contest they've done with Python or not, but this smells of Google influence to me.

I was about resigned to not bothering with this year's code jam, with my C# even rustier than last year, but now I'm totally stoked. Go Python!

(Oh yeah, link to main code jam page.)

Update: for the curious, topcoder added Python support about 10 days ago

Sunday, August 13, 2006

Spyce 2.1 released

Check out What's new:

  • Login tags
  • SQLAlchemy integration
  • Validation in handlers
  • More powerful form tags
  • Improved handler integration

Pauli Virtanen has contributed debian scripts, so we'll have a .deb to go with the rpm and sourceball releases pretty soon.

Thursday, August 10, 2006

Postgresql: don't rely on autovacuum

I have a pg database that just suffered through a multiple-hour vacuum of one of its tables. It was pretty painful. Autovacuum is configured to run every 12 hours, but for whatever reason it didn't see fit to vacuum this very busy table for substantially longer than that. (Not sure exactly how long. Over a week is my guess.)

The problem is that one size does not fit all. On smaller tables, the default autovacuum_vacuum_scale_factor of 0.4 is just fine. On this table, waiting until roughly 40% of its rows have been updated or deleted is unacceptable.
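To put numbers on "one size does not fit all": the PostgreSQL docs give the autovacuum trigger as a base threshold plus the scale factor times the table's row count. A minimal sketch, assuming the 8.1-era defaults (autovacuum_vacuum_threshold = 1000, autovacuum_vacuum_scale_factor = 0.4); the table sizes and the update rate are made-up numbers for illustration:

```python
def autovacuum_trigger_point(reltuples, base_threshold=1000, scale_factor=0.4):
    """Dead tuples a table may accumulate before autovacuum picks it up.

    Formula from the PostgreSQL docs:
        vacuum threshold = base threshold + scale factor * number of tuples
    Defaults assume the 8.1-era settings; check your own postgresql.conf.
    """
    return base_threshold + scale_factor * reltuples


# Hypothetical tables: the trigger point grows linearly with table size.
for rows in (10_000, 1_000_000, 10_000_000):
    trigger = round(autovacuum_trigger_point(rows))
    print(f"{rows:>10,} rows -> vacuum after ~{trigger:,} dead tuples")

# Hypothetical busy table: 10M rows, 200k rows updated or deleted per day.
days = autovacuum_trigger_point(10_000_000) / 200_000
print(f"busy table waits ~{days:.0f} days between autovacuums")
```

With those assumed numbers the big table goes about 20 days between autovacuums, which is the kind of gap a single global scale factor can't avoid.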

Monday, July 31, 2006

Sqlite sucks

I'm losing patience with sqlite. I've been working on Spyce examples using postgresql, but now that I'm getting close to releasing Spyce 2.1, I figured I'd better convert the examples to use sqlite since that's such a no-brainer to set up.

It has been a frustrating experience.

Weird-assness I've run into includes

And I didn't think I was doing anything very complicated! My examples have three tables at most!

Really my overall impression is one of a "0.9" product at best. I'm amazed that so many people appear to use this festering pile of gotchas in production.

Wednesday, July 12, 2006

"Single column primary keys should be enough for anybody"

Apparently PragDave had the temerity to suggest at RailsConf that Rails could stand some improvement in some areas, such as supporting composite primary keys in ActiveRecord. Naturally, the first reaction of a huge Rails fan like Martin Fowler is to get to work figuring out how to implement this.

Whoops, sorry, no, that would be in some alternate universe where fanboyism isn't the most important technical principle for some people. Martin's real reaction was to write a rebuttal, the gist of which is, if Rails doesn't already support it, it can't be important.

It's sad when someone who's done some good work puts on the fanboy blinders. You used to get the same schtick from mysql.com back in the 3.x days. Remember the rants in their docs about how foreign keys and transactions were for wimps and real programmers didn't want them anyway?

Of course, today even mysql corporate admits that these are important features, even though one can be forgiven for having a sneaking feeling that they don't really understand exactly why some people make such a big deal over it. Presumably once Rails supports composite primary keys we will hear Martin singing a different tune. One can only hope it won't be the sort of half-assed bullet point it is for MySQL (where you can have transactions and FKs and so forth, all right, as long as you don't use the default table type).

Anyway. Josh Berkus (a member of the postgresql core team) recently wrote a thorough, three-part series on exactly this subject (composite primary keys, if you recall):

If you're a typical I-prefer-to-not-think-about-the-database-until-something-goes-wrong developer, this series touches on a lot of issues that will really make you think.

Oh, and if you couldn't guess what Josh thinks of the "just use a surrogate [integer] key" approach...

The surrogate numeric key has been a necessary evil for as long as we've had SQL. It was set into SQL89 because the new SQL databases had to interact with older applications which expected "row numbers." It's a bit of legacy database thinking that, according to a conversation with Joe Celko, E.F. Codd regretted allowing to creep into the SQL standard.

Inevitably, practices which are "necessary evils" tend to become "pervasive evils" in the hands of the untrained and the lazy. Not realizing that ID columns are a pragmatic compromise with application performance, application developers with an shallow grounding in RDBMSes are enshrining numeric IDs as part of the logical and theoretical model of their applications. Worse yet, even RDBMS book authors are instructing their readers to "always include an ID column," suturing this misunderstanding into the body of industry knowledge like a badly wired cybernetic implant.

Some eye-opening stuff there.

(And I don't want this to be a my-ORM-is-better-than-yours post -- really! -- but I know somebody will ask, so yes, SQLAlchemy does support composite primary keys.)
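For anyone who hasn't met one: a composite primary key is simply a PRIMARY KEY declared over more than one column, so it's the combination that must be unique, not any single column. A minimal sketch using Python's built-in sqlite3 module; the loans table is a hypothetical example, not taken from Josh's series:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE loans (
        book_id   INTEGER NOT NULL,
        user_name TEXT    NOT NULL,
        loan_date TEXT,
        PRIMARY KEY (book_id, user_name)  -- the pair is the key; no surrogate id
    )
""")

conn.execute("INSERT INTO loans VALUES (2, 'Bhargan Basepair', NULL)")
# Same book, different user: a different key, so this is fine.
conn.execute("INSERT INTO loans VALUES (2, 'Joe Student', NULL)")

# Same (book_id, user_name) pair again: the database itself rejects it.
try:
    conn.execute("INSERT INTO loans VALUES (2, 'Joe Student', '2006-07-12')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

With an integer surrogate key instead, the third insert would succeed unless you remembered to add a separate unique constraint on the pair.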

Tuesday, July 11, 2006

On popularity

Andrew Smith pointed out that according to Indeed.com, Python is about three times as popular as Ruby and is maintaining that lead as both graphs trend upwards.

I'd like to add just a couple things that I noticed.

One is that, like Django, Rails is a term with multiple meanings, and the Ruby framework only accounts for a small fraction of jobs that Indeed pulls up for that term. (I'm impressed that Indeed allows you to nest arbitrarily complex boolean expressions here...)

Another is that although Python looks pretty popular vs Ruby or Lisp, it's a good thing that popularity doesn't really reflect how good a language is, because ye olde statically compiled languages are still seven to twenty times more popular than Python. Even PHP and Perl are more popular. (Although the trend on Perl is definitely down-sloping, for which we can all give thanks. C++ also has a noticeable downward trend.)

Tuesday, June 27, 2006

Time to deprecate psycopg1

I wrote a relatively simple multithreaded script to automate some cleanup work in my database. I used psycopg1, because it was conveniently packaged for the version of debian the server had. (And also because psycopg2's bundled pooling mechanism kind of sucks.)

My script ran for a couple minutes, and segfaulted. I upgraded to the latest version of psycopg1, to no avail. You'd think that after 20+ "stable" releases this wouldn't be a problem anymore. Sigh.

I ran it in gdb to see where it was segfaulting, and sure enough psycopg was dereferencing a null pointer. Unfortunately it was far from obvious how to fix the problem, at least to someone unfamiliar with the code.

I bit the bullet and upgraded to psycopg2, which apparently got its first non-beta release earlier this month. For less-sucky pooling I used sqlalchemy's pool module.
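For readers who haven't used one, the job of a connection pool is easy to state: hand each thread an idle connection and take it back afterwards, instead of opening a new connection per unit of work. A minimal sketch of that idea built on queue.Queue; this is not SQLAlchemy's pool code, SimplePool is a hypothetical name, and SQLite in-memory connections stand in for PostgreSQL so it runs anywhere:

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class SimplePool:
    """Hypothetical fixed-size pool: connections are created up front and shared."""

    def __init__(self, creator, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(creator())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always returned, even if the caller raised


# check_same_thread=False because a connection made here is used by workers,
# though the pool guarantees only one thread holds it at a time.
pool = SimplePool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)
results = []
lock = threading.Lock()


def worker(n):
    with pool.connection() as conn:
        value = conn.execute("SELECT ?", (n,)).fetchone()[0]
    with lock:
        results.append(value)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Eight threads share two connections here; the extra threads simply wait in get() until a connection is handed back.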

No more segfaults.

Monday, June 12, 2006

Updating unique columns

Greg Mullane has an excellent post on updating unique columns. A simple problem, but one that can be troublesome in practice:
[T]here is one circumstance when [unique constraints] can be a real pain: swapping the values of the rows. In other words, a transaction in which the column values start unique, and end unique, but may not be so in the middle.
Read his article -- I wouldn't have thought of his "reversing the polarity" method. Clever! But my first thought when I read this was, "Aha, Greg missed one." Surely the easiest way is to simply create a deferrable constraint (where you can elect to have the constraint only checked at the end of the transaction, instead of at the end of each statement)! So I gave it a try:
=> CREATE TABLE foo (
     i              int
         CONSTRAINT foo_pk PRIMARY KEY DEFERRABLE
   );

ERROR:  misplaced DEFERRABLE clause
At first I thought this indicated a syntax error, but my syntax was correct. After some googling, it turns out that this is the rather unhelpful message that means, as the docs explain,
Only foreign key constraints currently accept this clause. All other constraint types are not deferrable.
Huh. Sure enough, all the places I'd used DEFERRABLE in the past were with FK constraints, so I never noticed this limitation. (Fixing this is on the TODO list for future action.)
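For comparison, here is the kind of swap that trips a non-deferrable unique constraint, plus a negate-then-flip workaround. This is my reading of the "reversing the polarity" idea, assuming all the values involved are positive so their negatives are free. It uses SQLite via Python's sqlite3 module, which checks the constraint row by row during an UPDATE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (i INTEGER NOT NULL UNIQUE, label TEXT)")
with conn:
    conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, "a"), (2, "b")])

SWAP = "UPDATE foo SET i = CASE i WHEN 1 THEN 2 WHEN 2 THEN 1 END WHERE i IN (1, 2)"

# The naive swap: the first row updated collides with the row not yet updated.
try:
    with conn:
        conn.execute(SWAP)
    naive_swap_worked = True
except sqlite3.IntegrityError:
    naive_swap_worked = False  # the with-block rolled the transaction back

# Negate first (assumes -1 and -2 are unused), then flip the signs and swap
# in a second pass; no intermediate state ever holds a duplicate value.
with conn:
    conn.execute("UPDATE foo SET i = -i WHERE i IN (1, 2)")
    conn.execute(
        "UPDATE foo SET i = CASE i WHEN -1 THEN 2 WHEN -2 THEN 1 END WHERE i IN (-1, -2)"
    )

print(naive_swap_worked)  # False
print(dict(conn.execute("SELECT label, i FROM foo ORDER BY label")))  # {'a': 2, 'b': 1}
```

A deferrable unique constraint would make the naive version legal, which is exactly what the DEFERRABLE clause above was supposed to buy.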

Saturday, May 27, 2006

SQLAlchemy world domination tour

Python database tools have tended to suffer from the 80% problem. (Open-source hackers tend to come up with solutions that solve 80% of a problem. Then someone else comes along and covers a different 80% of the same problem. And so on, so you end up with different solutions that attack the same problem, none of which are general enough for others to build on.) SQLAlchemy is making this a thing of the past, thanks to Mike Bayer's hard work. And, increasingly, others.

SQLAlchemy made its second major release today, the big zero-dot-two-oh. (Mike is conservative with version numbers; most projects would call this 0.9 if not 1.0.) There's also a migration guide for porting 0.1-based code.

SQLAlchemy lives up to its billing as The Python SQL Toolkit and Object Relational Mapper. (Emphasis mine.) This is possible because of the extensive under-the-hood effort Mike has expended keeping dependencies to a minimum; you really can build on its functionality at any level -- not just the high-level ORM.

Some projects building on SQLAlchemy:

This is already more than older ORM tools have gathered in years! Not bad for a project whose first release was less than four months ago. SQLAlchemy is filling a real need for many people.

Thursday, May 18, 2006

The unveiling of Noodle

My friend Paul introduced Noodle at the Utah python user group a week ago. (Yeah, sorry about the not-exactly-breaking-news.) Paul's a Lisp expert -- I think from before he was a Python expert, but I'm actually not sure of the chronology there -- and he wrote Noodle to create a pythonic Lisp dialect. Noodle combines Lisp syntax and features like macros with Python-ish syntax for lists, dicts, and tuples, and compiles to Python bytecode so it can easily leverage all the Python libraries.

This bears repeating: there are a lot of projects out there that try to produce Java bytecode or CIL, but Noodle is the first I've heard of that produces Python bytecode. Pretty cool, if you ask me.

His slides are linked from his blog, but basically his conclusion so far is that it turns out to be harder to integrate python-style syntax into Lisp than he'd hoped. Not hard as in implementation, but hard as in making it non-clunky to use. The warts are small individually but large enough in the aggregate that Paul says he prefers Python (or CL) currently, but reserves the right to change his mind as Noodle improves.
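Producing Python bytecode is less exotic than it sounds, since the standard library will compile an abstract syntax tree for you. A toy sketch, which has nothing to do with Noodle's actual implementation and handles only integers, symbols, arithmetic, and function calls, that turns a Lisp-style expression into a real Python code object:

```python
import ast

# Arithmetic heads map to Python operators; anything else is treated as a call.
OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, "/": ast.Div}


def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Read one s-expression into nested lists, ints, and symbol strings."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard ")"
        return expr
    try:
        return int(token)
    except ValueError:
        return token  # a symbol, looked up as a Python name


def to_ast(expr):
    if isinstance(expr, int):
        return ast.Constant(expr)
    if isinstance(expr, str):
        return ast.Name(id=expr, ctx=ast.Load())
    head, *args = expr
    if head in OPS:
        # fold (+ a b c) into ((a + b) + c); needs at least one argument
        node = to_ast(args[0])
        for arg in args[1:]:
            node = ast.BinOp(left=node, op=OPS[head](), right=to_ast(arg))
        return node
    # (f x y) -> f(x, y), so any Python callable is usable from the Lisp side
    return ast.Call(func=to_ast(head), args=[to_ast(a) for a in args], keywords=[])


def compile_sexpr(text):
    tree = ast.Expression(body=to_ast(parse(tokenize(text))))
    ast.fix_missing_locations(tree)
    return compile(tree, "<sexpr>", "eval")  # an ordinary Python code object


print(eval(compile_sexpr("(+ 1 (* 2 3))")))             # 7
print(eval(compile_sexpr("(max 4 (len (range 10)))")))  # 10
```

The second example is the point Noodle makes: once you emit Python bytecode, every Python builtin and library is just a name away.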

Friday, April 14, 2006

PyGame at the utah python user group

I presented on PyGame at the utah python user group last night. (When I don't get someone else lined up to speak in advance, I end up doing it myself. You'd think that would be enough motivation to not procrastinate.)

I had a lot of fun preparing this. I'd never used PyGame before, but as a teenager I spent a lot of time in the same space. (Anyone remember YakIcons?) So the general concepts were familiar to me, and I was pleasantly surprised by how good a job PyGame did at making things easy for me.

Here are my pygame slides, and my pyquest game skeleton is here.

(PyQuest is of course inspired by Crystal Quest -- the Mac game, not the Xbox 360 remake -- and the graphics and sound files are from Solar Wolf, which I guess makes PyQuest LGPL. This caused Paul Cannon some serious mental trauma at the meeting, seeing and hearing solar-wolf-and-yet-not-solar-wolf.)

Monday, April 10, 2006

Introducing SqlSoup

[Update Oct 2006: here is another serving of SqlSoup. Update 2: SqlSoup documentation is now part of the SQLAlchemy wiki.]
Ian Bicking wrote in Towards PHP that a successful Python PHP-killer (as Spyce aspires to be) will need to include a simple data-access tool that Just Works.
I had two thoughts:
  1. He's absolutely right
  2. I could do this with SqlAlchemy in an afternoon
My afternoons are in short supply these days, and it took two of them, counting the documentation. But it's live now, as sqlalchemy.ext.sqlsoup. (The 0.1.5 release includes a docless version of sqlsoup; I recommend the subversion code until Mike rolls a new official release.)
SqlSoup is NOT an ORM per se, although naturally it plays nicely with SqlAlchemy. SqlSoup inspects your database and reflects its contents with no other work required; in particular, no model definitions are necessary.
Here's what SqlSoup looks like, given pyweboff-ish tables of users, books, and loans (SQL to set up this example is included in the test code, but I won't repeat it here):
>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('sqlite:///:memory:')

>>> users = db.users.select()
>>> users.sort()
>>> users
[Class_Users(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1),
Class_Users(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
Of course, letting the database do the sort is better (".c" is short for ".columns"):
>>> db.users.select(order_by=[db.users.c.name])
Field access is intuitive:
>>> users[0].email
u'basepair@example.edu'
Of course, you don't want to load all users very often. Let's add a WHERE clause. Let's also switch the order_by to DESC while we're at it.
>>> from sqlalchemy import or_, and_, desc
>>> where = or_(db.users.c.name=='Bhargan Basepair', db.users.c.email=='student@example.edu')
>>> db.users.select(where, order_by=[desc(db.users.c.name)])
[MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0), MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1)]
You can also use the select...by methods if you're querying on a single column. This allows using keyword arguments as column names:
>>> db.users.selectone_by(name='Bhargan Basepair')
MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1)
All the SqlAlchemy mapper select variants (select, select_by, selectone, selectone_by, selectfirst, selectfirst_by) are available. See the SqlAlchemy documentation (sql construction and data mapping) for details.
Modifying objects is also intuitive:
>>> user = _
>>> user.email = 'basepair+nospam@example.edu'
>>> db.flush()
(SqlSoup leverages the sophisticated SqlAlchemy unit-of-work code, so multiple updates to a single object will be turned into a single UPDATE statement when you flush.)
To finish covering the basics, let's insert a new loan, then delete it:
>>> book_id = db.books.selectfirst(db.books.c.title=='Regional Variation in Moss').id
>>> db.loans.insert(book_id=book_id, user_name=user.name)
MappedLoans(book_id=2,user_name='Bhargan Basepair',loan_date=None)
>>> db.flush()

>>> loan = db.loans.selectone_by(book_id=2, user_name='Bhargan Basepair')
>>> db.delete(loan)
>>> db.flush()
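SqlSoup gets its "no model definitions" property from SQLAlchemy's table reflection. The sketch below is not SqlSoup's code; it just shows the core idea with only the standard library's sqlite3 module: ask the database for a table's columns at runtime and build a row type from them. Soup, ReflectedTable, and this select_by are hypothetical names:

```python
import sqlite3
from collections import namedtuple


class ReflectedTable:
    """Discovers a table's columns at runtime; no model class is declared."""

    def __init__(self, conn, name):
        self.conn = conn
        self.name = name
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        self.columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')]
        if not self.columns:
            raise AttributeError(f"no such table: {name}")
        self.row_type = namedtuple("Mapped" + name.capitalize(), self.columns)

    def select_by(self, **criteria):
        unknown = set(criteria) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # column names were validated above, so only the values are parameters
        where = " AND ".join(f'"{col}" = ?' for col in criteria) or "1 = 1"
        sql = f'SELECT * FROM "{self.name}" WHERE {where}'
        return [self.row_type(*row) for row in self.conn.execute(sql, list(criteria.values()))]


class Soup:
    def __init__(self, conn):
        self.conn = conn

    def __getattr__(self, name):  # db.users -> reflect table "users" on demand
        return ReflectedTable(self.conn, name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT, admin INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Bhargan Basepair", "basepair@example.edu", 1), ("Joe Student", "student@example.edu", 0)],
)
db = Soup(conn)
joe, = db.users.select_by(name="Joe Student")
print(joe.email)  # student@example.edu
```

The real thing does far more (writes, unit of work, arbitrary SQLAlchemy expressions), but the reflection step is what lets you skip the model layer.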

Saturday, April 08, 2006

Database Replication

I spent some time yesterday researching (free) database replication options. Judging from the newsgroup posts I saw, there's a lot of confusion out there. The most common use case appears to be failover, i.e., you want to minimize downtime in the face of software or hardware failure by replicating your data across multiple machines. But, the most commonly-used options are completely inappropriate for this purpose.

As Josh Berkus explained, there are two "dimensions" to replication: synchronous vs async, and master/slave vs multimaster.

For a failover solution, if you want database B to take over from database A in case of failure, with no data loss, only synchronous solutions make sense. By definition, asynchronous replication means that database A can commit a transaction before those changes are also committed to database B. If A happens to fail between commit and replication, you've lost data. If that's not acceptable for you, then neither is async replication.
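The difference is entirely about when the client is told "committed." A toy model (no networking, no real log shipping; Primary, Replica, and failover_scenario are hypothetical names) that replays the failure described above:

```python
class Replica:
    def __init__(self):
        self.committed = []


class Primary:
    """Toy model only: replication is a list append, not real WAL shipping."""

    def __init__(self, replica, synchronous):
        self.replica = replica
        self.synchronous = synchronous
        self.committed = []
        self.unshipped = []  # async only: committed here, not yet on the replica

    def commit(self, txn):
        self.committed.append(txn)
        if self.synchronous:
            # the client hears "COMMIT" only after the replica has the change
            self.replica.committed.append(txn)
        else:
            # the client hears "COMMIT" now; the replica catches up later
            self.unshipped.append(txn)
        return "COMMIT"

    def ship(self):
        self.replica.committed.extend(self.unshipped)
        self.unshipped.clear()


def failover_scenario(synchronous):
    replica = Replica()
    primary = Primary(replica, synchronous)
    primary.commit("t1")
    primary.ship()        # periodic replication runs
    primary.commit("t2")  # acknowledged to the client...
    # ...then the primary dies before the next ship(); promote the replica
    return replica.committed


print(failover_scenario(synchronous=True))   # ['t1', 't2']
print(failover_scenario(synchronous=False))  # ['t1']
```

In the async case, t2 was reported as committed and is simply gone after failover; that window is the price of not waiting on the replica.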

Be aware that the most popular replication solutions for both PostgreSQL and MySQL are asynchronous.

  • In part because of the contributions by the likes of Fujitsu and Afilias (.org and .info registrar), Slony-I is the most high-profile replication solution for PostgreSQL. Slony-I provides only asynchronous replication.
  • MySQL replication is also asynchronous.

So what are the options for synchronous replication?

  • MySQL "clustering" appears to allow for synchronous replication, but requires use of the separate NDB storage engine, which has a long list of limitations vs MyISAM or InnoDB. (No foreign key support, no triggers, basically none of the features MySQL has been adding for the past few years. Oh, and you need enough RAM to hold your entire database twice over.)
  • PgCluster for postgresql seems fairly mature, but the 1.3 (8.0-based) and 1.5 (8.1-based) versions still aren't out of beta. PgCluster also patches the postgresql source directly, which makes me a little nervous.
  • Another option is something like pgpool, which multiplexes updates across multiple databases. The biggest limitation of this approach is that you're on your own for recovery, i.e., after A goes down and you switch to B alone, how do you get A back in sync? A fairly common approach is to combine pgpool with Slony-I async replication for recovery.

The bottom line is, high availability isn't as simple as adding whatever "replication" solution you first run across. You need to understand what the different kinds of replication are, and which are appropriate to your specific situation.

Wednesday, March 15, 2006

Web framework notes

For our March meeting, the Utah python user group had multiple people present a solution to the PyWebOff challenge.

This is not an attempt at an authoritative web frameworks review. The backgrounds of the presenters are too different -- in particular, only the Rails and Spyce presenters had prior experience with the frameworks they used.

That said -- and obviously I'm all sorts of biased as both presenter and maintainer of Spyce -- I think Spyce came off pretty well. Partly because it was designed to automate repetitive tasks with relatively little "magic," and partly because Spyce doesn't try to put you in a straitjacket: you can write strict MVC code if you want, but if mixing some code into your view makes more sense than the alternatives, you can do that too.

"I," "me," etc. in the remainder of this post refers to the presenter speaking of himself. In presentation order, here are my notes:

TurboGears

Presenter: Byron Clark (Prior web experience: some playing with Rails)

  • sum of parts (kid, cherrypy, etc)
  • 0.89 reviewed
  • tg-admin shell
  • decorators describe template in use on methods
  • can load template in browser; easier for designers?
    • designers may have trouble with the py.content stuff
  • server crashes on python syntax error in controller
  • turbogears.flash -- validation errors, or debugging
  • cherrypy behavior changed: in cherrypy you usually return the page you want to render. in TG, you return a dict of stuff to expose to your template. (But if you return a string instead of dict, it just gets "printed." This is also useful for debugging.)
  • "kid wasn't completely worthless. (laughter) It could have been a lot worse. (more laughter) It gave me less trouble than everything else."
  • lots of validator methods, but nothing to turn list of strings into list of ints (say, for a bunch of checkboxes)
  • things that were hard:
    • getting validators to work
    • "would it kill the sqlobject developers to add a delete method?"

Zope 3

Presenter: Shane Hathaway (Prior web experience: extensive Zope 2)

  • Zope 3: learn from maintainability problems of Zope 2
    • target: python programmers
    • Zope 2 target: web designers
  • getting started: interface for content object
  • "here's the implementation of the book interface. it's not finished."
  • many many lines of xml config file. "I wrote it by hand."
    • forms generated from xml -- no manual html needed
    • student
      • "this is where I got stuck"
      • Zope 3 book talks about annotating objects; started to use that, "but zope 3.2 eliminated the code that made that work."
      • hours and hours
    • "I got really, really annoyed at the violation of DRY here"
  • Q: is zope3 configure-through-web, or is it xml now? A: I really don't know; I've been discussing that with (inaudible). Seems to let you do both, but I can't tell which they're emphasizing. Fredrik Lundh's crisis of faith link. "That's kind of where I am, too." Believe Zope3 will turn out better for larger projects.

Ruby on Rails

Presenter: Lee Never-got-his-last-name (Prior web experience: several months? of professional Rails)

PyWebOff source: http://utahpython.org/data/pyweboff-rails.tgz

  • breakpoint method throws test webserver into console mode
  • authentication -- missed this while taking a phone call
  • ajax basics: server sends back a script fragment that gets eval'd to replace elements on page
  • Q: I've heard Rails is inefficient. Can you comment? A: I've found it to be rather efficient -- my code is about 100 loc. I think my time is more valuable than server processing time. We serve 50 req/s on some pages on dual cpu, 1GB ram servers.
  • Q: FastCGI? A: Yes, with lighttpd
  • Q: How much does the ORM force you to stick to it? A: You pretty much need to stick to ActiveRecord, but you can drop down to SQL if you really need to. I like AR a lot.

Django

Presenter: Roberto Mello (Prior web experience: co-founder of OpenACS framework)

  • "spent way too much time trying to do things that I thought should have been a lot simpler"
  • url mapping w/ regexp; first argument is namespace-ish thing
  • admin was pretty cool
  • "Way too much magic! Way too much! At least for my taste."
    • When stuff fails, like my user registration, I have no idea what's going on
    • Talking w/ people on #django I guess it just takes a different frame of mind -- "what's a join?"
    • I'm an old school database guy, so I wanted to create a join on all my tables that would give me what I wanted, but I couldn't figure out a way to do that!
      • No way to specify method arguments from template, some people suggested looping over ALL rows in DB, basically performing a WHERE clause with if statements
      • others suggested making custom tag
      • found how to get cursor, but had to dig through source to figure out how to use it
      • finally settled on performing ORM lookups in controller, passing to template
  • also have to pass dict of stuff to template
  • ORM ugly and hard to get used to. Some syntax:
    • books.get_list(id__exact='5')
    • books.get_list(loans__loan_id__startswith='tom')
    • books.get_list(tables=['loans'], where=['foo=bar'])
    • very intimidating example of 20+ lines
  • I've only had a day of experience with django now, but it takes a very different frame of mind -- a very inefficient one -- to use it. But the admin's pretty cool.
  • @login_required decorator
  • Conclusion: can't get past the ORM. Way too much magic. Maybe "magic removal" branch will help. It seems to have just accumulated the magic, more and more.

Spyce

Presenter: Jonathan Ellis

Slides: http://utahpython.org/data/pyweboff-spyce.pdf

PyWebOff source: http://utahpython.org/data/pyweboff-spyce.tar.gz

  • Spyce 1.x: mostly JSP influence
  • Spyce 2.x: some ASP.NET influence. Goals:
    • less repetitive tasks for form processing
    • provide reusable components w/o the complexity that comes from pretending HTTP is stateful (which is ASP.NET's big mistake)
    • other modern facilities such as parent/child composition (sort of like what others call template inheritance)
  • Challenge solution
    • model-less data access via sqlalchemy introspection ("simpledb" module, hopefully soon to make it into sqlalchemy.ext subversion)
    • "data binding" (ASP.NET term) in select box; no need to manually specify option tags
    • bobo (early zope)-inspired :list:int naming convention (see checkboxes)
    • (these are Spyce 2.1 features, currently in svn trunk)
    • auth tag (from contrib/; not easily customizable yet)
  • Q: What is the difference between [[! and [[\ ? A: [[! compiles the code fragment into the class that spyce generates; the other just inserts it as part of the "run" method body in that same class. [[! is primarily used for in-lined handlers, since non-class-level methods aren't in scope yet when handlers are processed.
  • Q: Is it really compiled? A: Yes; you can invoke spyceCompile.py from the command line to see what kind of code it generates.
  • Q: Do you have to do that manually? A: No, spyce compiles .spy files as-needed, and (by default) checks for changes with each request. You can turn this off for speed in a production environment, in which case you just restart spyce if you change things.
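The bobo-style :list:int convention from the Spyce demo (and the list-of-ints conversion the TurboGears presenter couldn't find) is easy to picture with a small decoder. This is a hypothetical sketch of how such field names can be interpreted, not Spyce's code; decode_form and the CONVERTERS table are my own names:

```python
from urllib.parse import parse_qsl

# Hypothetical converter table; a real framework would offer more.
CONVERTERS = {"int": int, "float": float}


def decode_form(query):
    """Decode bobo-style field names such as "book_id:list:int"."""
    result = {}
    for raw_name, value in parse_qsl(query, keep_blank_values=True):
        name, *suffixes = raw_name.split(":")
        for suffix in suffixes:
            if suffix in CONVERTERS:
                value = CONVERTERS[suffix](value)  # e.g. "3" -> 3
        if "list" in suffixes:
            result.setdefault(name, []).append(value)  # accumulate repeats
        else:
            result[name] = value  # plain field: last value wins
    return result


# Two checked boxes plus an ordinary text field:
form = decode_form("book_id:list:int=3&book_id:list:int=7&user_name=Joe+Student")
print(form)  # {'book_id': [3, 7], 'user_name': 'Joe Student'}
```

One gotcha either way: when no box is checked the browser sends nothing for that field, so code reading it should default to an empty list.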

Sunday, March 05, 2006

Mike Orr's pycon 2006 summary

Mike Orr's pycon 2006 writeup is out in the March Linux Gazette. The usual suspects are present (keynote summaries, etc) as well as some lightning talk info that I haven't seen blogged elsewhere. (I loved Chad Whitacre's Testosterone screencast: "The Manly Python Testing Interface.")

Tuesday, February 28, 2006

PyCon Python IDE review

I presented an IDE review at PyCon last Friday. It was basically a re-review of what I thought were the 3 most promising IDEs from the Utah Python User Group IDE review, to which I added SPE, which was by far the most popular of the ones we left out that time. The versions reviewed are:

I'd intended to base my presentation around a comparison of writing a smallish program in each of the IDEs, but the more I tried to make this not suck, the more I realized it was a losing proposition. Instead, I decided to try to focus on the features in each that most set them apart from the others (both positive and negative); this seemed more likely to be useful.

(I did a new feature matrix for this review, which is included after my comments. The slides I used are also up, at http://utahpython.org/jellis/pycon-ides.pdf, but aren't very useful absent video of the presentation itself. Hence this post.)

PyDev

PyDev has grown up a lot since last September. One rather surprising change, to me, is the splitting of the project into the "base" PyDev, still under the EPL (Eclipse Public License), and the separate, commercial PyDev Extensions. The Extensions may be reviewed for one month for free, but come with a highly annoying nagware dialog every half hour(!).

PyDev Extensions does some stuff that even the much pricier Komodo and Wing professional versions do not. In particular, PyDev provides in-editor warning and error markers for more than the by-now standard syntax errors. For instance, PyDev warns for unused imports, unused variables, and so forth, and will put up an error marker if you try to reference a variable that doesn't exist. As I said in my presentation, "it's like a PyChecker that doesn't suck."

This is still pretty new code, though, and there were some rough edges. In one file, I had an unused "from cStringIO import StringIO" import that PyDev didn't catch for some reason. The import checker also seemed to have issues with nested packages, and the "quick fix" feature doesn't work like you'd expect with previous Eclipse experience--clicking on the error marker in the margin does nothing. (If you remember the keyboard shortcut, though, that works fine.)
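The core of an unused-import check is small enough to sketch with the standard ast module: collect the names bound by import statements, then collect every name the code reads. This is a hypothetical illustration, not how PyDev does it, and it ignores real-world wrinkles such as __all__, names used only inside strings, and conditional imports:

```python
import ast


def unused_imports(source):
    """Return (line, name) pairs for imported names that are never read."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line of the import statement
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports can't be checked this way
                # "import os.path" binds "os"; "import x as y" binds "y"
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno
    # Every loaded name counts as a use; "os.path.join" contains a Name "os".
    used = {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    return sorted((line, name) for name, line in imported.items() if name not in used)


sample = """\
import os
import sys
from cStringIO import StringIO
print(sys.argv)
"""
print(unused_imports(sample))  # [(1, 'os'), (3, 'StringIO')]
```

Only parsing happens here, so the cStringIO import never has to resolve; an editor would turn each (line, name) pair into a margin marker.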

This seems symptomatic of PyDev in general -- flashes of brilliance, but sometimes problematic. As another example of this, sometimes code completion didn't work at all. PyDev consistently failed to complete wxPython code, for instance (perhaps because I installed the wx library after PyDev did its initial scan of site-packages). But other times PyDev completed better than Komodo (and as well as Wing), e.g., PyDev could figure out the attributes of the object returned by file(), where Komodo could not.

It's also worth noting that PyDev's "go to definition" feature was unquestionably the best: if it can't figure out where your symbol comes from, it falls back to a textual analysis, which is better than nothing.

SPE

With PyDev switching to a commercial license for, apparently, the bulk of new development, SPE is probably the best remaining choice for a free IDE. (SPE is licensed under the GPL; the author, Stani, does appreciate donations if you find it useful.)

At this point, SPE is rather less mature than the other IDEs I reviewed. SPE ships with the Kiki regexp builder, the wxGlade gui builder, PyChecker, and the winpdb debugger, but of these only PyChecker is really "integrated" with the main SPE editor. Which is a good start, but I found PyChecker's suggestions less useful than PyDev's integrated "Code Analysis;" PyChecker was simply wrong too often.

Another shortcoming is the lack of the ability to create and save "projects" containing information such as the correct PYTHONPATH for a given codebase. Without this, SPE was unable to figure out most of my imports in one of the projects on which I tested it. (Stani says this should be easy to add for a future release.)

SPE's code completion was weakest overall. It seems to be primarily vim-style completion: if a symbol is used elsewhere in the same file, e.g., "self.foo", it can guess "foo" as a completion after you type "self." There are a few cases where it is smarter than this, but not many -- primarily, it could complete top-level constructs imported from a module, but it was helpless once you started instantiating objects from that and asking about those.

It is worth noting that, especially as open-source projects go, SPE's documentation is pretty decent. The manual is fairly comprehensive and up-to-date, and includes some helpful tutorials. (Stani provides an ad-free PDF version of the manual as a thank-you for donations to the project.)

Komodo

Komodo has at least two large features that the other IDEs do not: support for multiple (dynamic) languages, and a real integrated gui builder, targeting Tk. (There is a dialog in the 3.5.2 build that warns you that code completion--"AutoComplete"--is only supported in Perl and Python right now, but ActiveState's CTO was in the audience and said he believes completion is supported in all 5 languages now. I'd doublecheck this if I had any others installed at the moment, but I don't.) I work in Python full time, but if you work in multiple languages Komodo is pretty much the only option if you want something more sophisticated than Emacs.

Komodo's is also the only Tkinter gui builder I'm aware of that's actually worth using. (Although if you're open to using wxWidgets instead of Tk, wxGlade is free and quite good.)

As far as I can tell, ActiveState has fixed the parsing and autocompletion problems I noticed in 3.1. Perhaps the only major remaining shortcoming is the lack of a real "go to definition." "Find symbol" is essentially text-based, and although the preview pane of where potential matches are found is cool, it's a lot less useful than having a real introspection-based approach. (It also feels kind of slow.)

On a lower-end machine, Komodo is as snappy as SPE and much more responsive than Wing or PyDev. (I was floored when I found out Komodo is built on... mozilla! Do a find for .js and .py in the lib directory. Crazy.) The difference probably won't be noticeable on any machine built in the last couple years, but if you have a (really) low end machine, you may want to take this into consideration.

Wing

Wing still gets the little things right more often than the competition. One small example: if you have a multiline expression, pressing tab on the second line lines things up The Right Way. With the others, pressing tab... inserts a tab. (4 spaces, actually.) I suppose this could be called a matter of taste; mine is shaped by years of Emacs use. Which (I confirmed) the Wing developers share, so maybe that's why I tend to like their decisions in such matters.

Overall, Wing's completion is clearly best-of-breed right now. For instance, Komodo is unable to complete the tkinter widgets generated by its builder--try typing, say, "self._button_1."--but Wing is able to do so. There are other cases where Wing out-completes Komodo--another is the file() example mentioned with PyDev. Wing also completes locals for you, which is more useful than it sounds until you try it. (The one thing Komodo does better here is that besides giving you a list of symbols, Komodo's completion also visually indicates whether each symbol represents a function or a field.)

Wing's Emacs mode is by far the best. If you're curious, my litmus test for a good Emacs mode is, "Does it make you use a mouse where the keyboard should work perfectly well?" Wing does not; opening a file with C-x C-f brings up a minibuffer (tab-completed, of course) for file selection. No mouse needed! (My litmus test for a _really_ good Emacs mode is a kill ring; Wing doesn't do this yet.)

With version 2.1, Wing is introducing a vi mode, too, but I have no idea what the litmus test is for a good vi mode.

Conclusion

Komodo and Wing are polished, solid choices. Both have excellent debuggers and source-control integration. Both, frankly, will frustrate you less than PyDev or SPE at this time, if you spend a lot of time coding. Wing Professional is about 1/3 less expensive than Komodo. (I talked with several people who weren't aware that both ActiveState and WingWare offer painless, fully functional free demo licenses for one month. So if you're curious, it costs you nothing to try.)

PyDev + Extensions is in the same price range as the Komodo and WingWare personal editions. (All are around $30.) Komodo's personal edition is slightly less crippled than Wing's: Komodo's leaves out the gui builder and source control integration; Wing's also leaves out the Source Assistant panel (basically, the calltips functionality) and some debugger features.

If PyDev can shake the bugs off, it will become even more compelling. I suspect that PyDev's relative bugginess may be due in part to its lesser opportunity to "dogfood": the other IDEs are written in Python, and their developers, I'm sure, primarily use their own products. PyDev is written in Java, so this isn't an option.

As noted previously, SPE is the only really free choice left. It's still rough around the edges in a lot of ways, perhaps most notably with the non-integrated debugger, but it's better than the other free options. (I covered more of those in the last review. It wasn't pretty.)

My own choice hasn't changed; after revisiting the latest versions of each product, I think Wing Professional still fits my particular needs best.

Feature matrix

Feature | PyDev | SPE | Komodo | Wing IDE
Signals syntax errors | Yes | Yes | Yes | Yes
Keyboard Macros | No | No | Yes | Yes
Configurable Keybindings | Yes | Yes* | Yes | Yes**
    *Through external configuration file
    **Weak UI; expect to do a lot of manual browsing
Tab Guides | No | Yes | Yes | Yes
Smart Indent* | Yes | Yes | Yes | Yes
    *Knows to de-indent a level after break/return/etc. statements
Code completion | Decent* | Poor* | Good** | Excellent
    *See review text for discussion
    **Also indicates methods vs. fields (but not properties)
Call tips | During completion | Yes | Yes | Yes*
    *"Source Assistant" provides calltips and docstrings in a separate panel
"Go to definition" for Python symbols | Yes* | Yes ("Browse source") | No** | Yes
    *Supplemented with textual analysis; overall best
    **"Find symbol" is basically a find-in-files text search
Templates | Yes | No | Yes | Yes
Source Control Integration | Eclipse* | No | CVS/Perforce/SVN | CVS/Perforce/SVN
    *CVS is standard; plugin availability varies for others

Debugger

Feature | PyDev | SPE | Komodo | Wing IDE
Conditional breakpoints | Yes | Yes+ | Yes | Yes
    +SPE has no integrated debugger but includes winpdb
Debug-integrated Console | Yes | Yes+** | Yes | Yes
    **"Special commands" make winpdb's console somewhat cumbersome
Debug external programs* | Yes | Yes+ | Yes | Yes
    *E.g., a script processing a web server request

Miscellaneous

Feature | PyDev | SPE | Komodo | Wing IDE
GUI Builder | No | Wx* | Tk | No
    *The free wxGlade builder is distributed with SPE but is not integrated with it
Emulation | Emacs (poor) | None | Emacs (poor) | Emacs (good); vi (?)
Documentation | Poor | Decent | Excellent | Good
Approximate memory footprint | 150MB | 10MB | 50MB | 50MB
Unique features | Code analysis; basic refactoring | UML diagrams; PyChecker integration | Multilanguage; save macros; regular expression builder | Scriptable with Python

Monday, February 06, 2006

Why schema definition belongs in the database

Earlier, I wrote about how ORM developers shouldn't try to re-invent SQL. It doesn't need to be done, and you're not likely to end up with an actual improvement. SQL may be designed by committee, but it's also been refined from thousands if not millions of man-years of database experience.

The same applies to DDL. (Data Definition Language -- the part of the SQL standard that deals with CREATE and ALTER.) Unfortunately, a number of Python ORMs are trying to replace DDL with a homegrown Python API. This is a Bad Thing. There are at least four reasons why:

  • Standards compliance
  • Completeness
  • Maintainability
  • Beauty

Standards compliance

SQL DDL is a standard. That means if you want something more sophisticated than Emacs, you can choose any of half a dozen modeling tools like ERwin or ER/Studio to generate and edit your DDL.

The Python data definition APIs, by contrast, aren't even compatible with other Python tools. You can't take a table definition from SQLObject, PyDO2, Django, or SQLAlchemy and use it with any of the others.

A quote from the Django site:

Our philosophy is that the model (in Python code) is the definitive place to store all model logic.

Yes, application logic belongs in application code. But the definitive source for the schema definition should be the database, unless you're using an object database like Durus or ZODB. Of course, the reason those (and OODBs in general) haven't taken off is that except for very simple applications, people want to access their data from more than just their Python code. So respect that. (Or be honest and require your OODB of choice.) Encourage standards use instead of a proprietary API. Let the database be the source of truth for the schema, and use standard tools to define it.

Completeness

Another strength of relational databases is the ability to define data integrity rules in a declarative fashion. It's well-understood by now that declarative code is much less error prone (and quicker to write) than the procedural equivalents. So if you're going to re-invent DDL, you need to support ON DELETE and ON UPDATE clauses. You need to support REFERENCES. You need to support CHECK constraints. You need to support defining triggers.
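For reference, here is roughly what those declarative rules look like in DDL. This is a sketch using SQLite through Python's sqlite3 module so it runs anywhere; the customers/orders tables are hypothetical, and trigger syntax in particular varies between databases.

```python
import sqlite3

# Hypothetical schema, written in SQLite's dialect of DDL
SCHEMA = """
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);

CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers(id) ON DELETE CASCADE ON UPDATE CASCADE,
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    updated_at  TEXT
);

-- Timestamp an order whenever its quantity changes
CREATE TRIGGER orders_touch AFTER UPDATE OF quantity ON orders
BEGIN
    UPDATE orders SET updated_at = datetime('now') WHERE id = NEW.id;
END;
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES when this is enabled, per connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Every rule there is enforced by the database no matter which program does the writing: inserting an order with quantity 0 raises sqlite3.IntegrityError, and deleting a customer removes that customer's orders.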

I don't know any ORM tool that allows doing all this from Python. So selling your tool as "you don't need to know DDL, just use our API" isn't doing users any favors.

It's okay to say, "here's our API; it lets you do a subset of what your database can handle so you can jump right in and get started without learning DDL." There's no harm in this. But when your tool is designed such that it doesn't expose the power of your database, but it doesn't really work with externally defined schemas either, that's a Bad Thing.

Maintainability

Even the best-designed schema will need to be updated eventually. If your project lives long enough, this is 100% certain. So if you're going to replace DDL, you'd better have an upgrade strategy.

But an automatic approach that only knows that you want to update from schema A to schema B is doomed to failure. [Update: I thought that sqlobject-admin took this approach, but Ian Bicking corrected me.] Say you add a column to a table: what should the value be for existing rows? Sometimes NULL. Sometimes some default value. Sometimes the value should be derived from data in other tables. There's no way to automate this for all situations, and sooner rather than later, you're back to DDL again.
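Here's a hypothetical example of that last case. Say customers gains a lifetime_total column (table and column names are made up for illustration). A tool that only diffs schema A against schema B can emit the ALTER TABLE, but only a person knows that existing rows should be backfilled from orders. A sketch in SQLite syntax:

```python
import sqlite3

# Hypothetical "schema A": customers have no lifetime_total column yet
SCHEMA_A = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      INTEGER NOT NULL
);
"""

# Hand-written upgrade to "schema B". The ALTER is the part a schema diff could
# generate; the UPDATE backfill is the part only a person can decide on.
UPGRADE_A_TO_B = """
ALTER TABLE customers ADD COLUMN lifetime_total INTEGER NOT NULL DEFAULT 0;
UPDATE customers
   SET lifetime_total = (SELECT COALESCE(SUM(amount), 0)
                           FROM orders
                          WHERE orders.customer_id = customers.id);
"""

def upgrade_a_to_b(conn):
    conn.executescript(UPGRADE_A_TO_B)
```

Without the UPDATE, every existing customer would silently report a lifetime total of 0.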

Instead of spending effort on a fundamentally flawed approach, it's better to encourage standard best practice: the "right way" to maintain databases, which everyone who works on them long enough eventually settles on, is DDL scripts checked into source control. It's old-fashioned, but if you stick to it, you'll never start an upgrade on your live server and run into a problem halfway through, because you've already run the exact same scripts on your test server. A good ORM design accommodates this rather than making it difficult.
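The mechanics don't need to be fancy. Below is a minimal sketch of a runner for numbered DDL scripts (001_initial.sql, 002_..., and so on), assuming SQLite, that every statement in a script ends with a semicolon, and that scripts don't issue their own BEGIN/COMMIT. It records progress in SQLite's user_version pragma. The file naming convention and the upgrade() helper are my own illustration, not any ORM's API.

```python
import re
import sqlite3
from pathlib import Path

def pending_scripts(script_dir, current_version):
    """Return (version, path) pairs for NNN_*.sql files newer than current_version."""
    found = []
    for path in Path(script_dir).glob("*.sql"):
        match = re.match(r"(\d+)_", path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [(v, p) for v, p in sorted(found) if v > current_version]

def upgrade(conn, script_dir):
    """Apply pending scripts in order; return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, path in pending_scripts(script_dir, current):
        # Run the script and record its version in one transaction, so a failure
        # halfway through leaves the database at the previous version.
        # (PRAGMA can't take bound parameters; version is an int we parsed.)
        script = f"BEGIN;\n{path.read_text()}\nPRAGMA user_version = {version};\nCOMMIT;"
        try:
            conn.executescript(script)
        except sqlite3.Error:
            conn.rollback()
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Because the same files run on test and then on live, a broken script fails on the test server first; and if one does fail, the transaction leaves the schema where it was.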

Beauty

Okay, maybe DDL isn't the most beautiful creature ever birthed by a standards committee. But a lot of things are less beautiful, and those are what you get when you try to design DDL out.

  • Re-inventing the wheel is not beautiful. As the Django guys said (about templates), "don't invent a programming language." Right idea. Spend that energy doing something useful instead.
  • Violating DRY isn't beautiful. As described above, your users will need DDL at some point. When that happens, are you going to make their lives harder by forcing them to update their DDL-ish model in a separate .py file as well (with all the attendant possibilities for mistakes), or are you going to make their lives easier with an option to simply introspect the changes?

    (It's true that an ORM tool can't divine everything you want to say about your model on the Python side from the database. This is particularly true for SQLAlchemy, which lets you go beyond the simple "one table, one class" paradigm. But that's no reason to force the programmer to duplicate the parts that an ORM can and should introspect: column types, foreign keys, uniqueness and other constraints, etc.)

  • Treating the database like a slightly retarded object store is not beautiful. Even MySQL supports (simple) triggers and most constraint types these days. Allow users to realize the power afforded by a modern RDBMS. If your ORM encourages users to avoid features that, say, MySQL3 doesn't have, you're doing something wrong.
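As an illustration of how much an ORM could pick up on its own, here is a sketch that introspects column types, nullability, foreign keys, and unique constraints from SQLite's catalog pragmas. introspect_table is a hypothetical helper; other databases expose the same information through information_schema or their own system catalogs.

```python
import sqlite3

def _quote(name):
    # PRAGMA arguments can't be bound parameters, so quote the identifier by hand
    return '"' + name.replace('"', '""') + '"'

def introspect_table(conn, table):
    """Return columns, foreign keys, and unique constraints for one SQLite table."""
    t = _quote(table)
    columns = {
        name: {"type": col_type, "nullable": not notnull,
               "default": default, "pk": bool(pk)}
        for _cid, name, col_type, notnull, default, pk
        in conn.execute(f"PRAGMA table_info({t})").fetchall()
    }
    foreign_keys = [
        {"column": from_col, "references": f"{ref_table}.{to_col}",
         "on_delete": on_delete}
        for _id, _seq, ref_table, from_col, to_col, _on_update, on_delete, _match
        in conn.execute(f"PRAGMA foreign_key_list({t})").fetchall()
    ]
    unique = []
    # index_list returns 3 columns on old SQLite versions and 5 on newer ones
    for _seq, index_name, is_unique, *_rest in conn.execute(
            f"PRAGMA index_list({t})").fetchall():
        if is_unique:
            info = conn.execute(f"PRAGMA index_info({_quote(index_name)})").fetchall()
            unique.append([col_name for _seqno, _cid, col_name in info])
    return {"columns": columns, "foreign_keys": foreign_keys, "unique": unique}
```

What this can't recover (which class a table maps to, mappings beyond one table per class) is exactly the part the programmer should have to write.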

Conclusion

Avoid the temptation to re-invent the wheel. Respect your users, and allow them to continue to use industry-standard schema specification tools. Encourage using the strengths of modern relational databases instead of ignoring them. Don't require behavior that will cause your users pain down the road.

I mentioned 4 ORM tools near the beginning of this post. Here's how I see them with respect to this discussion:

  • PyDO2: The purest "let the database specify the schema" approach of the four. Supports runtime introspection. Does not generate DDL from its models; if you manually specify column types, it's probably because you only want a subset of the table's columns to show in your class.
  • SQLAlchemy: Allows generation of DDL from .py code but does not require it nor (in my reading) encourage this for production use. Robust runtime introspection.
  • SQLObject: supports runtime introspection (buggy when I used it, but may be better now). Python-based API does not support modern database features. (In a deleted comment here -- still viewable at this writing through Google Cache -- Ian Bicking says that SQLObject prefers what Martin Fowler calls an Application Database. Which as near as I can tell means that SQLObject is fine if you would be better off using an OODB; otherwise, it may be a poor choice. Perhaps the deletion indicates he's had second thoughts on this.)
  • Django: the most clearly problematic. No runtime introspection support; schema must be specified in its Python-based API, which does not support modern database features. (Apparently their approach -- paraphrased -- is, "if sucky database XXY doesn't support a feature, we won't support it for anyone.") Django's ORM does have an external tool for generating .py models from an existing database, but once you start making changes, well, if you don't mind clearing data, just pipe the output of the appropriate django-admin.py sqlreset command into your database's command-line utility. Otherwise, you get to write an alter script, then manually sync up your .py model with the result.

[Dec 14 note for redditors: this post was written in Feb 2006. Some of the commentary here on specific tools is out of date.]

Thursday, January 12, 2006

PyDO2 slides

I presented at the Utah Python User Group last night. I gave an introduction to the PyDO2 ORM tool as well as the Python DB API and psycopg. Most readers will probably be most interested in the PyDO2 section. Here are the slides: http://utahpython.org/data/python-and-databases.pdf.

I briefly mentioned SQLAlchemy in the context of "here's some stuff that SQLAlchemy does that pretty much nobody else can," but didn't have time to cover TWO ORM tools. It's worth a look, though, if you've reached the limits of what PyDO2 et al. can do.

Friday, January 06, 2006

how well do you know python, part 10

Take Alex Martelli's Number class from the augmented assignment example in What's New in Python 2.0:

class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number(self.value + increment)

>>> n = Number(5)
>>> n
<__main__.Number instance at 0x00356B70>

That's not very pretty. Let's add a __getattr__ method and leverage all those nice methods from the int or float or whatever it's initialized with:

class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number(self.value + increment)
    def __getattr__(self, attr):
        return getattr(self.value, attr)

>>> n = Number(5)
>>> n
5

Great, the __repr__ method from our int is being used. Let's keep going with the example:

>>> n += 3
>>> n.value
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'int' object has no attribute 'value'

What happened here?


Extra bonus unrelated mini-question to celebrate the 10th installment of HWDYKP:

Given the following code,

class Foo: pass
Bar = Foo

What do you predict is the result of the following expressions:

isinstance(Bar(), Foo)
isinstance(Foo(), Bar)

Monday, January 02, 2006

ORM design part 2

Glyph Lefkowitz cites me as an inspiration to write Why Axiom Doesn't Expose SQL. Alas, I disagree with most of what he says.

My post was about how if you're writing a tool that presupposes the use of a relational database, it's stupid to try to protect your users from having to know SQL. (This also means I think projects that bend over backwards to pretend ALTER TABLE is too hard are misguided, as well. But that's another subject.)

Glyph's first argument is that any form of SQL is an invitation to SQL injection attacks. This particular form of scaremongering isn't appreciated. Come on: this is 2005. It's ridiculously easy to write injection-proof SQL, even by hand. Arguing that allowing SQL allows injection attacks is like arguing that coding in Python allows "shutil.rmtree('/')": correct, but irrelevant.
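To show how little effort injection-proof SQL takes, here's a sketch with Python's sqlite3 module (the users table is hypothetical). The only difference between the two functions is whether the value is pasted into the SQL string or passed separately as a parameter. The placeholder style varies by DB-API driver: sqlite3 uses ?, psycopg uses %s.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # DON'T: the value is pasted into the SQL text, so quotes in it become SQL
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()

def find_user(conn, username):
    # DO: the ? placeholder sends username to the database as data, never as SQL
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()
```

Given the input nobody' OR '1'='1, the first version returns every row in the table; the second simply finds no user by that (odd) name.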

Glyph further claims that "interfaces should be complete things," and that this justifies trying to obliterate any trace of SQL from your ORM. Because if you need SQL, your interface just wasn't complete enough.

The thing is, an ORM isn't just an interface: it's an extra layer of abstraction. Abstraction that obscures or prevents access to its internals smacks of the kind of we-know-better-than-you thinking that infects most statically typed language libraries. Don't do this.

Trying to design for "everything the user could possibly want, or at least everything we think he should want," while easier than providing for the unexpected, is almost always doomed to failure. Glyph gives the example of the Python os module -- why use C when os gives you everything you want? -- and yet, my current work project involves multiple C modules. If the Python designers had taken the attitude of "we'll give you everything we think you need, and if we leave out something you want, too bad," Python would have been a non-candidate for the job, and I'd be working somewhere else.

Fundamentally, tool designers who care about providing power and usability know that sometimes you need to drop to a lower level of abstraction, and they design for this up front. Just like Python.

I do agree completely with Glyph on one point:

[M]ost ORMs are really, really awful. They are heavy on the "object" and not so much on the "database". There are a lot of features that SQL provides which they don't expose.

It sometimes seems like we're rapidly approaching the point where there will be as many half-assed Python ORM tools as there are half-assed Python web frameworks. Don't build another half-assed tool. Pick an existing project that sucks less than the others and improve it. One ORM that doesn't suck much (I mean that as a compliment) is PyDO2. Another is SQLAlchemy. I've been trying SQLAlchemy out during the last few days, and I really like the direction it's going in. (Don't be too intimidated by the introduction: it looks overcomplicated, but it's actually exceptionally well thought-out.)