Tuesday, February 28, 2006

PyCon Python IDE review

I presented an IDE review at PyCon last Friday. It was basically a re-review of what I thought were the 3 most promising IDEs from the Utah Python User Group IDE review, to which I added SPE, which was by far the most popular of the ones we left out that time. The versions reviewed are:

I'd intended to base my presentation around a comparison of writing a smallish program in each of the IDEs, but the more I tried to make this not suck, the more I realized it was a losing proposition. Instead, I decided to try to focus on the features in each that most set them apart from the others (both positive and negative); this seemed more likely to be useful.

(I did a new feature matrix for this review, which is included after my comments. The slides I used are also up, at http://utahpython.org/jellis/pycon-ides.pdf, but aren't very useful absent video of the presentation itself. Hence this post.)

PyDev

PyDev has grown up a lot since last September. One rather surprising change, to me, is the splitting of the project into the "base" PyDev, still under the EPL (Eclipse Public License), and the separate, commercial PyDev Extensions. The Extensions may be reviewed for one month for free, but come with a highly annoying nagware dialog every half hour(!).

PyDev Extensions does some stuff that even the much pricier Komodo and Wing professional versions do not. In particular, PyDev provides in-editor warning and error markers for more than the by-now standard syntax errors. For instance, PyDev warns for unused imports, unused variables, and so forth, and will put up an error marker if you try to reference a variable that doesn't exist. As I said in my presentation, "it's like a PyChecker that doesn't suck."
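To make the "unused import" kind of check concrete, here is a minimal sketch using Python's standard ast module. It is not how PyDev implements Code Analysis (PyDev is written in Java, and its checks go well beyond this); the function name unused_imports and the sample source are made up for illustration.

```python
import ast

# Sample source to analyze. It is only parsed, never executed,
# so the Python 2-era cStringIO import is harmless here.
SOURCE = '''
import os
import sys
from cStringIO import StringIO

def main():
    print(sys.argv)
'''

def unused_imports(source):
    """Return the sorted names that are imported but never read (hypothetical helper)."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a" unless an alias is given
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
    return sorted(imported - used)

print(unused_imports(SOURCE))  # ['StringIO', 'os']
```

A real checker also has to deal with scopes, __all__, re-exports, and names used only in strings, which is presumably where the rough edges come from.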

This is still pretty new code, though, and there were some rough edges. In one file, I had an unused "from cStringIO import StringIO" import that PyDev didn't catch for some reason. The import checker also seemed to have issues with nested packages, and the "quick fix" feature doesn't work like you'd expect with previous Eclipse experience--clicking on the error marker in the margin does nothing. (If you remember the keyboard shortcut, though, that works fine.)

This seems symptomatic of PyDev in general -- flashes of brilliance, but sometimes problematic. As another example of this, sometimes code completion didn't work at all. (PyDev consistently failed to complete wxPython code, for instance, perhaps because I installed the wx library after PyDev did its initial scan of site-packages.) But other times PyDev completed better than Komodo (and as well as Wing), e.g., PyDev could figure out the attributes of the object returned by file(), where Komodo could not.

It's also worth noting that PyDev's "go to definition" feature was unquestionably the best: if it can't figure out where your symbol comes from, it falls back to a textual analysis, which is better than nothing.

SPE

With PyDev switching to a commercial license for, apparently, the bulk of new development, SPE is probably the best remaining choice for a free IDE. (SPE is licensed under the GPL; the author, Stani, does appreciate donations if you find it useful.)

At this point, SPE is rather less mature than the other IDEs I reviewed. SPE ships with the Kiki regexp builder, the wxGlade gui builder, PyChecker, and the winpdb debugger, but of these only PyChecker is really "integrated" with the main SPE editor. Which is a good start, but I found PyChecker's suggestions less useful than PyDev's integrated "Code Analysis;" PyChecker was simply wrong too often.

Another shortcoming is the lack of the ability to create and save "projects" containing information such as the correct PYTHONPATH for a given codebase. Without this, SPE was unable to figure out most of my imports in one of the projects on which I tested it. (Stani says this should be easy to add for a future release.)

SPE's code completion was weakest overall. It seems to be primarily vim-style completion: if a symbol is used elsewhere in the same file, e.g., "self.foo", it can guess "foo" as a completion after you type "self." There are a few cases where it is smarter than this, but not many -- primarily, it could complete top-level constructs imported from a module, but it was helpless once you started instantiating objects from that and asking about those.
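For a sense of what purely textual completion can and can't do, here is a minimal sketch. It is not SPE's actual code; complete_attributes and the sample buffer are hypothetical. It suggests only attribute names that already appear after the same receiver somewhere in the buffer.

```python
import re

def complete_attributes(buffer_text, receiver):
    """Suggest attribute names already typed after `receiver.` in the buffer."""
    pattern = re.compile(r"\b" + re.escape(receiver) + r"\.([A-Za-z_]\w*)")
    return sorted(set(pattern.findall(buffer_text)))

BUFFER = '''
class Counter:
    def __init__(self):
        self.foo = 0
        self.bar = []

    def bump(self):
        self.foo += 1

counter = Counter()
'''

print(complete_attributes(BUFFER, "self"))     # ['bar', 'foo']
print(complete_attributes(BUFFER, "counter"))  # []
```

The second call shows the limitation: nothing in the text says that counter has a foo attribute, so an approach like this has nothing to offer once you instantiate an object and ask about it.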

It is worth noting that, especially as open-source projects go, SPE's documentation is pretty decent. The manual is fairly comprehensive and up-to-date, and includes some helpful tutorials. (Stani provides an ad-free PDF version of the manual as a thank-you for donations to the project.)

Komodo

Komodo has at least two large features that the other IDEs do not: support for multiple (dynamic) languages, and a real integrated gui builder, targeting Tk. (There is a dialog in the 3.5.2 build that warns you that code completion--"AutoComplete"--is only supported in Perl and Python right now, but ActiveState's CTO was in the audience and said he believes completion is supported in all 5 languages now. I'd double-check this if I had any others installed at the moment, but I don't.) I work in Python full time, but if you work in multiple languages Komodo is pretty much the only option if you want something more sophisticated than Emacs.

Komodo's is also the only Tkinter gui builder I'm aware of that's actually worth using. (Although if you're open to using wxWidgets instead of Tk, wxGlade is free and quite good.)

As far as I can tell, ActiveState has fixed the parsing and autocompletion problems I noticed in 3.1. Perhaps the only major remaining shortcoming is the lack of a real "go to definition." "Find symbol" is essentially text-based, and although the preview pane of where potential matches are found is cool, it's a lot less useful than having a real introspection-based approach. (It also feels kind of slow.)

On a lower-end machine, Komodo is as snappy as SPE and much more responsive than Wing or PyDev. (I was floored when I found out Komodo is built on... Mozilla! Do a find for .js and .py in the lib directory. Crazy.) The difference probably won't be noticeable on any machine built in the last couple years, but if you have a (really) low end machine, you may want to take this into consideration.

Wing

Wing still gets the little things right more often than the competition. One small example: if you have a multiline expression, pressing tab on the second line lines things up The Right Way. With the others, pressing tab... inserts a tab. (4 spaces, actually.) I suppose this could be called a matter of taste; mine is shaped by years of Emacs use. Which (I confirmed) the Wing developers share, so maybe that's why I tend to like their decisions in such matters.

Overall, Wing's completion is clearly best-of-breed right now. For instance, Komodo is unable to complete the Tkinter widgets generated by its builder--try typing, say, "self._button_1."--but Wing is able to do so. There are other cases where Wing out-completes Komodo--another is the file() example mentioned with PyDev. Wing also completes locals for you, which is more useful than it sounds; you don't realize how much until you try it. (The one thing Komodo does better here is that besides giving you a list of symbols, Komodo's completion also visually indicates whether each symbol represents a function or a field.)

Wing's Emacs mode is by far the best. If you're curious, my litmus test for a good Emacs mode is, "Does it make you use a mouse where the keyboard should work perfectly well?" Wing does not; opening a file with C-x C-f brings up a minibuffer (tab-completed, of course) for file selection. No mouse needed! (My litmus test for a _really_ good Emacs mode is a kill ring; Wing doesn't do this yet.)

With version 2.1, Wing is introducing a vi mode, too, but I have no idea what the litmus test is for a good vi mode.

Conclusion

Komodo and Wing are polished, solid choices. Both have excellent debuggers and source-control integration. Both, frankly, will frustrate you less than PyDev or SPE at this time, if you spend a lot of time coding. Wing Professional is about 1/3 less expensive than Komodo. (I talked with several people who weren't aware that both ActiveState and WingWare offer painless, fully functional free demo licenses for one month. So if you're curious, it costs you nothing to try.)

PyDev + Extensions is in the same price range as the Komodo and WingWare personal editions. (All are around $30.) Komodo's personal edition is slightly less crippled than Wing's: Komodo's leaves out the gui builder and source control integration; Wing's also leaves out the Source Assistant panel (basically, the calltips functionality) and some debugger features.

If PyDev can shake off the bugs, it will become even more compelling. I suspect that PyDev's relative bugginess may be due in part to its lesser opportunity to "dogfood" -- the other IDEs are written in Python and their developers, I'm sure, primarily use their own product. PyDev is written in Java, so this isn't an option.

As noted previously, SPE is the only really free choice left. It's still rough around the edges in a lot of ways, perhaps most notably with the non-integrated debugger, but it's better than the other free options. (I covered more of those in the last review. It wasn't pretty.)

My own choice hasn't changed; after revisiting the latest versions of each product, I think Wing Professional still fits my particular needs best.

Feature matrix

Feature                               | PyDev             | SPE                   | Komodo           | Wing IDE
--------------------------------------+-------------------+-----------------------+------------------+-----------------
Signals syntax errors                 | Yes               | Yes                   | Yes              | Yes
Keyboard Macros                       | No                | No                    | Yes              | Yes
Configurable Keybindings              | Yes               | Yes [1]               | Yes              | Yes [2]
Tab Guides                            | No                | Yes                   | Yes              | Yes
Smart Indent [3]                      | Yes               | Yes                   | Yes              | Yes
Code completion                       | Decent [4]        | Poor [4]              | Good [5]         | Excellent
Call tips                             | During completion | Yes                   | Yes              | Yes [6]
"Go to definition" for Python symbols | Yes [7]           | Yes ("Browse source") | No [8]           | Yes
Templates                             | Yes               | No                    | Yes              | Yes
Source Control Integration            | Eclipse [9]       | No                    | CVS/Perforce/SVN | CVS/Perforce/SVN

[1] Through external configuration file
[2] Weak UI; expect to do a lot of manual browsing
[3] Knows to de-indent a level after break/return/etc. statements
[4] See review text for discussion
[5] Also, indicates methods vs fields (but not properties)
[6] "Source Assistant" provides calltips and docstrings in a separate panel
[7] Supplemented with textual analysis; overall best
[8] "Find symbol" is basically a find-in-files text search
[9] CVS is standard; plugin availability varies for others

Debugger

Feature                     | PyDev | SPE        | Komodo | Wing IDE
----------------------------+-------+------------+--------+---------
Conditional breakpoints     | Yes   | Yes [1]    | Yes    | Yes
Debug-integrated Console    | Yes   | Yes [1][2] | Yes    | Yes
Debug external programs [3] | Yes   | Yes [1]    | Yes    | Yes

[1] SPE has no integrated debugger but includes winpdb
[2] "Special commands" make winpdb's console somewhat cumbersome
[3] E.g., a script processing a web server request

Miscellaneous

Feature                      | PyDev        | SPE    | Komodo       | Wing IDE
-----------------------------+--------------+--------+--------------+---------------------
GUI Builder                  | No           | Wx [1] | Tk           | No
Emulation                    | Emacs (poor) | None   | Emacs (poor) | Emacs (good); VI (?)
Documentation                | Poor         | Decent | Excellent    | Good
Approximate memory footprint | 150MB        | 10MB   | 50MB         | 50MB

[1] The free wxGlade builder is distributed with SPE, but not integrated with it

Unique features:

  • PyDev: Code Analysis; basic refactoring
  • SPE: UML diagrams; PyChecker integration
  • Komodo: Multilanguage; save macros; regular expression builder
  • Wing IDE: Scriptable with Python

Monday, February 06, 2006

Why schema definition belongs in the database

Earlier, I wrote about how ORM developers shouldn't try to re-invent SQL. It doesn't need to be done, and you're not likely to end up with an actual improvement. SQL may be designed by committee, but it's also been refined from thousands if not millions of man-years of database experience.

The same applies to DDL. (Data Definition Language -- the part of the SQL standard that deals with CREATE and ALTER.) Unfortunately, a number of Python ORMs are trying to replace DDL with a homegrown Python API. This is a Bad Thing. There are at least four reasons why:

  • Standards compliance
  • Completeness
  • Maintainability
  • Beauty

Standards compliance

SQL DDL is a standard. That means if you want something more sophisticated than Emacs, you can choose any of half a dozen modeling tools like ERwin or ER/Studio to generate and edit your DDL.

The Python data definition APIs, by contrast, aren't even compatible with other Python tools. You can't take a table definition from SQLObject, PyDO2, Django, or SQLAlchemy, and use it with any of the others.

A quote from the django site:

Our philosophy is that the model (in Python code) is the definitive place to store all model logic.

Yes, application logic belongs in application code. But the definitive source for the schema definition should be the database, unless you're using an object database like Durus or ZODB. Of course, the reason those (and OODBs in general) haven't taken off is that except for very simple applications, people want to access their data from more than just their Python code. So respect that. (Or be honest and require your OODB of choice.) Encourage standards use instead of a proprietary API. Let the database be the source of truth for the schema, and use standard tools to define it.

Completeness

Another strength of relational databases is the ability to define data integrity rules in a declarative fashion. It's well-understood by now that declarative code is much less error prone (and quicker to write) than the procedural equivalents. So if you're going to re-invent DDL, you need to support ON DELETE and ON UPDATE clauses. You need to support REFERENCES. You need to support CHECK constraints. You need to support defining triggers.
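As a concrete reference point, here is a small sketch of those declarative features in action, driven from Python's standard sqlite3 module so it is runnable anywhere. The author/book/audit_log schema is invented for illustration, and SQLite is only a convenient stand-in; the exact syntax (triggers especially) varies between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL
              REFERENCES author(id) ON DELETE CASCADE ON UPDATE CASCADE,
    title     TEXT NOT NULL,
    price     NUMERIC NOT NULL CHECK (price >= 0)
);
CREATE TABLE audit_log (message TEXT NOT NULL);
CREATE TRIGGER book_deleted AFTER DELETE ON book
BEGIN
    INSERT INTO audit_log (message) VALUES ('deleted: ' || OLD.title);
END;
""")

conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO book (author_id, title, price) VALUES (1, 'SQL Basics', 10)")
conn.execute("INSERT INTO book (author_id, title, price) VALUES (1, 'More SQL', 12)")

def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# The CHECK constraint and the REFERENCES clause reject bad rows with no application code.
bad_price = rejected("INSERT INTO book (author_id, title, price) VALUES (1, 'Free?', -5)")
bad_author = rejected("INSERT INTO book (author_id, title, price) VALUES (99, 'Orphan', 5)")

# The trigger records a direct delete.
conn.execute("DELETE FROM book WHERE title = 'SQL Basics'")
log = [row[0] for row in conn.execute("SELECT message FROM audit_log")]

# ON DELETE CASCADE removes the author's remaining books.
conn.execute("DELETE FROM author WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]

print(bad_price, bad_author, log, remaining)
```

Every rule here lives in the schema, so it holds for any program that touches the data, not just the Python one.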

I don't know any ORM tool that allows doing all this from Python. So selling your tool as "you don't need to know DDL, just use our API" isn't doing users any favors.

It's okay to say, "here's our API; it lets you do a subset of what your database can handle so you can jump right in and get started without learning DDL." There's no harm in this. But when your tool is designed such that it doesn't expose the power of your database, but it doesn't really work with externally defined schemas either, that's a Bad Thing.

Maintainability

Even the best-designed schema will need to be updated eventually. If your project lives long enough, this is 100% certain. So if you're going to replace DDL, you'd better have an upgrade strategy.

But an automatic approach that only knows that you want to update from schema A to schema B is doomed to failure. [Update: I thought that sqlobject-admin took this approach, but Ian Bicking corrected me.] Say you add a column to a table: what should the value be for existing rows? Sometimes NULL. Sometimes some default value. Sometimes the value should be derived from data in other tables. There's no way to automate this for all situations, and usually sooner rather than later, you're back to DDL again.
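Here is a sketch of the "derived from data in other tables" case, again using sqlite3 as a stand-in. The customer/orders tables and the last_order_on column are hypothetical. A schema diff could tell a tool that a column was added, but only a person knows that it should be backfilled from orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    placed_on   TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, '2005-11-02'), (2, 1, '2006-01-15');
""")

# The upgrade script: one line of DDL, plus the part no tool can guess.
UPGRADE = """
ALTER TABLE customer ADD COLUMN last_order_on TEXT;  -- existing rows start out NULL
UPDATE customer
   SET last_order_on = (SELECT MAX(placed_on)
                          FROM orders
                         WHERE orders.customer_id = customer.id);
"""
conn.executescript(UPGRADE)

rows = conn.execute("SELECT name, last_order_on FROM customer ORDER BY id").fetchall()
print(rows)  # [('Ann', '2006-01-15'), ('Bob', None)]
```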

Instead of spending effort on a fundamentally flawed approach, better to encourage standard best practices: the "right way" to maintain databases, that everyone who works on them enough settles on eventually, is DDL scripts, checked into source control. Old-fashioned, but if you stick to it, you'll never have a situation where you start an upgrade on your live server and run into a problem halfway through, because you've already run the exact scripts on your test server. A good ORM design accommodates this, rather than making it difficult.
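One common way to organize such scripts, sketched under stated assumptions: numbered .sql files in source control, applied in order, with a bookkeeping table recording what has already run. The script names, the schema_history table, and the upgrade function below are all hypothetical, and the dict stands in for files on disk.

```python
import sqlite3

# Stand-ins for numbered .sql files checked into source control.
SCRIPTS = {
    "001_create_customer.sql":
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);",
    "002_add_email.sql":
        "ALTER TABLE customer ADD COLUMN email TEXT;",
}

def upgrade(conn, scripts):
    """Apply, in name order, every script not yet recorded in schema_history."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (script TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT script FROM schema_history")}
    newly_applied = []
    for name in sorted(scripts):
        if name in applied:
            continue
        conn.executescript(scripts[name])
        conn.execute("INSERT INTO schema_history (script) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied

test_db = sqlite3.connect(":memory:")
first_run = upgrade(test_db, SCRIPTS)   # applies both scripts
second_run = upgrade(test_db, SCRIPTS)  # nothing left to do
columns = [row[1] for row in test_db.execute("PRAGMA table_info(customer)")]
print(first_run, second_run, columns)
```

Running the same function against the test server first and the live server second is the whole point: the live upgrade executes exactly the scripts that have already been proven.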

Beauty

Okay, maybe DDL isn't the most beautiful creature ever birthed by a standards committee. But a lot of things are less beautiful, and those are what you get when you try to design DDL out.

  • Re-inventing the wheel is not beautiful. Like the django guys said (about templates), "don't invent a programming language." Right idea. Spend that energy doing something useful instead.
  • Violating DRY isn't beautiful. As described above, your users will need DDL at some point. When that happens, are you going to make their lives harder by forcing them to update their DDL-ish model in a separate .py file as well (with all the attendant possibilities for mistakes), or are you going to make them easier with an option to simply introspect the changes?

    (It's true that an ORM tool can't divine everything you want to say about your model on the Python side from the database. This is particularly true for SQLAlchemy, which lets you go beyond the simple "one table, one class" paradigm. But that's no reason to force the programmer to duplicate the parts that an ORM can and should introspect: column types, foreign keys, uniqueness and other constraints, etc.)

  • Treating the database like a slightly retarded object store is not beautiful. Even MySQL supports (simple) triggers and most constraint types these days. Allow users to realize the power afforded by a modern RDBMS. If your ORM encourages users to avoid features that, say, MySQL3 doesn't have, you're doing something wrong.
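To show that the introspection mentioned in the DRY point is not exotic, here is a minimal sketch that reads column types and foreign keys back out of a database. It uses SQLite's PRAGMA table_info and PRAGMA foreign_key_list because they ship with Python; other databases expose the same facts through their own catalogs. The describe function and the author/book schema are made up for illustration and do not reflect any particular ORM's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title     TEXT NOT NULL
);
""")

def describe(conn, table):
    """Read column and foreign-key metadata for a trusted table name."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = {
        row[1]: {"type": row[2], "required": bool(row[3]), "pk": bool(row[5])}
        for row in conn.execute(f"PRAGMA table_info({table})")
    }
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    foreign_keys = {
        row[3]: (row[2], row[4])
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    }
    return columns, foreign_keys

columns, foreign_keys = describe(conn, "book")
print(columns["author_id"])  # {'type': 'INTEGER', 'required': True, 'pk': False}
print(foreign_keys)          # {'author_id': ('author', 'id')}
```

With this much available from the database itself, the only things a programmer should have to spell out in Python are the parts the database genuinely can't know.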

Conclusion

Avoid the temptation to re-invent the wheel. Respect your users, and allow them to continue to use industry-standard schema specification tools. Encourage using the strengths of modern relational databases instead of ignoring them. Don't require behavior that will cause your users pain down the road.

I mentioned 4 ORM tools near the beginning of this post. Here's how I see them with respect to this discussion:

  • PyDO2: The purest "let the database specify the schema" approach of the four. Supports runtime introspection. Does not generate DDL from its models; if you manually specify column types, it's probably because you only want a subset of the table's columns to show in your class.
  • SQLAlchemy: Allows generation of DDL from .py code but does not require it nor (in my reading) encourage this for production use. Robust runtime introspection.
  • SQLObject: supports runtime introspection (buggy when I used it, but may be better now). Python-based API does not support modern database features. (In a deleted comment here -- still viewable at this writing through Google Cache -- Ian Bicking says that SQLObject prefers what Martin Fowler calls an Application Database. Which as near as I can tell means that SQLObject is fine if you would be better off using an OODB; otherwise, it may be a poor choice. Perhaps the deletion indicates he's had second thoughts on this.)
  • Django: the most clearly problematic. No runtime introspection support; schema must be specified in its python-based API, which does not support modern database features. (Apparently their approach -- paraphrased -- is, "if sucky database XXY doesn't support a feature, we won't support it for anyone.") Django's ORM does have an external tool for generating .py models from an existing database, but once you start making changes, well, if you don't mind clearing data, just pipe the output of the appropriate django-admin.py sqlreset command into your database's command-line utility. Otherwise, you get to write an alter script, then manually sync up your .py model with the result.

[Dec 14 note for redditors: this post was written in Feb 2006. Some of the commentary here on specific tools is out of date.]