Wednesday, December 31, 2008

CouchDB: not drinking the kool-aid

This is my attempt to clear up some misconceptions about CouchDB and point out some technical details that a lot of people seem to have overlooked.  For the record, I like Damien Katz's blog, he seems like a great programmer, and Erlang looks cool.  Please don't hurt me.
First, and most important: CouchDB is not a distributed database.  BigTable is a distributed database.  Cassandra and dynomite are distributed databases.  (And open source, and based on a better design than BigTable.  More on this in another post.)  It's true that with CouchDB you can "shard" data out to different instances just like you can with MySQL or PostgreSQL.  That's not what people think when they see "distributed database." It's also true that CouchDB has good replication, but even multi-master replication isn't the same as a distributed database: you're still limited to the write throughput of the slowest machine.
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project:
  • Writes are serialized.  Not serialized as in the isolation level, serialized as in there can only be one write active at a time.  Want to spread writes across multiple disks?  Sorry.
  • CouchDB uses an MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less.
  • CouchDB is simple.  Gloriously simple.  Why is that a negative?  It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years.  The reason PostgreSQL et al have those features is because people want them.  And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing.  The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces.
  • A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce.  MapReduce is a great approach to trivially parallelizing certain classes of problem.  The problem is, it's tedious and error-prone to write raw MapReduce code.  This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
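
To make the last point concrete, here is a rough sketch comparing a hand-rolled map/reduce with the equivalent SQL. It is written in Python rather than in CouchDB's own view language, and the documents, the `posts` table, and all function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical documents, standing in for rows in a document store.
docs = [
    {"author": "alice", "words": 120},
    {"author": "bob", "words": 80},
    {"author": "alice", "words": 30},
]

def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["author"], doc["words"]

def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def run_mapreduce(documents):
    """Group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_values(values) for key, values in groups.items()}

def run_sql(documents):
    """The same aggregate as a single declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table posts (author text, words integer)")
    conn.executemany(
        "insert into posts values (?, ?)",
        [(d["author"], d["words"]) for d in documents],
    )
    rows = conn.execute(
        "select author, sum(words) from posts group by author"
    ).fetchall()
    conn.close()
    return dict(rows)

mapreduce_totals = run_mapreduce(docs)
sql_totals = run_sql(docs)
```

Both paths produce the same per-author totals. The difference is that the SQL version states what is wanted in one line, while the map/reduce version spells out how to compute it.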

Wednesday, December 24, 2008

RackLabs

Today marks one month that I've been working for Rackspace's RackLabs with the Mosso group in San Antonio, Texas.  (Anyone want to start a Python group?  The closest one is in Austin.)

It's kind of a gentle introduction to big company culture for me; at around 2,000 employees, Rackspace is easily ten times as large as any other company I've worked for, and 100 times as large as most.  Mosso is a lot smaller and RackLabs itself is smaller still, but I still had to go to five days (!) of corporate orientation.  Other than that, though, we're pretty much left alone by our corporate parent.

To start with, I'm working on Mosso's Cloud Files, which is basically an S3 competitor.  Cloud Files is similar to the work I did at Mozy, but there are a lot of technical differences.  Some are driven by Cloud Files being more of a general purpose storage engine than the one I wrote for Mozy; others stem from the Cloud Files authors being Twisted fans.

Strange coincidence: as with Mozy, I share an office here with a Debian developer, probably the only one in San Antonio.  My experience is that Debian developers are pretty sharp guys, probably in no small part due to the rigorous screening process you have to go through.  They set a high bar.

Of course this continues to be my personal blog, and all opinions are mine alone.  RackLabs has its own blog for when they want to say something official.

Friday, December 19, 2008

Frustrated with git

I'm a little over a week into a git immersion program.  Let me just say that git's reputation of being a little arcane (okay, more than a little) and having a steep learning curve is 100% deserved.

One thing that would mitigate things is if git would give you feedback when you tell it to do nonsense.  But it doesn't.  Here's me trying to get machine B to always merge the debug branch from machine A when I pull:


git config branch.debug.remote origin
git config branch.master.remote origin
git config branch.master.remote origin/debug

All of these commands completed silently. None accomplished what I wanted. In the end I renamed master to old and debug to master to avoid having to fight it. Then I blew away my working copy and re-cloned because those config statements had created a new problem that I didn't know how to undo.

I'm sure the git virtuosos out there will know what was wrong.  That's not the point.  The point is that the tool gave me no feedback.  It was like git was telling me, "Figure it out yourself.  Or don't.  I don't care."  Which is par for the course with my git experience so far.

Thursday, December 18, 2008

FormAlchemy 1.1: admin app, composite key support

FormAlchemy 1.1 is out, so you no longer need to run trunk to get the admin app goodness -- now with i18n support. We also added support for all composite primary keys, and most composite foreign keys. (The distinction is, rendering an object depends on the PK, but loading relations depends on FKs.) Gael also added the fsblob extension, which allows storing blobs on the filesystem and the path in the database. (FormAlchemy can handle blob-in-the-db out of the box.)
(I previously blogged about basic FormAlchemy and the admin app, which are still good introductions.)
FormAlchemy has pretty good documentation. The most important page is form generation; instructions to configure the admin app are here.

Monday, November 03, 2008

An unusual approach to log parsing

I saw an interesting article about logging today on reddit, and it struck a nerve with me, specifically how most text logs are not designed for easy parsing.  (I don't agree with the second point, though -- sometimes logging is tracing, or perhaps more accurately, sometimes tracing is logging.)

We had a lot of log and trace data at Mozy, several GB per day.  A traditional parsing approach would have been tedious and prone to regressions when the messages generated by the server changed.  So Paul Cannon, a frustrated lisp programmer, designed a system where the log API looked something like this:

self.log('command_reply(%r, %r)' % (command, arg))

Then the log processor would define the vocabulary (command_reply, etc.) and, instead of parsing the log messages, eval them!   This is an approach that wouldn't have occurred to me, nor would I have thought of using partial function application to simplify passing state (from the log processor and/or previous log entries) to these functions.  (e.g., the entry for command_reply in the eval namespace might be 'command_reply': partial(self.command_reply, db_cursor, thread_id))
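
A minimal sketch of the idea, not the actual Mozy code: the `LogProcessor` class, the `format_entry` helper, and the sample values are hypothetical, and only `thread_id` is bound with `partial` to keep it short. Note that calling eval on log lines is only reasonable when you control everything that writes to the log.

```python
from functools import partial

class LogProcessor(object):
    """Evaluates log entries instead of parsing them."""

    def __init__(self):
        self.replies = []

    def command_reply(self, thread_id, command, arg):
        # thread_id is supplied by the processor; command and arg come from the log line.
        self.replies.append((thread_id, command, arg))

    def process(self, line, thread_id):
        # The eval namespace is the log "vocabulary"; partial binds processor state.
        namespace = {
            "__builtins__": {},
            "command_reply": partial(self.command_reply, thread_id),
        }
        eval(line, namespace)

def format_entry(command, arg):
    # %r calls repr(), which handles quoting so the entry is valid Python.
    return 'command_reply(%r, %r)' % (command, arg)

processor = LogProcessor()
entry = format_entry("store", "it's a file")
processor.process(entry, thread_id=7)
```

The apostrophe in the argument survives the round trip because repr picks a quoting style that keeps the literal valid, which is exactly the work a non-Python logger has to redo by hand.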

There are drawbacks to this approach; perhaps the largest is that this works best in homogeneous systems.  Python's repr function (invoked by the %r formatting character) is great at taking care of any quoting issues necessary when dealing with Python primitives, as well as custom objects with some help from the programmer.  But when we started having a C++ system also log messages to this system, it took them several tries to fix all the corner cases involved in generating messages that were valid python code.

On balance, I think this un-parsing approach was a huge win, and as the first application of "code is data" that made more than theoretical sense to me it was a real "eureka!" moment.

Friday, October 24, 2008

A small admin app for Pylons

I said that it would be possible to build a django-style admin interface for Pylons using FormAlchemy. (That is, generate a UI for basic CRUD operations for all your models, with no further configuration necessary.) I have a proof of concept in FA svn; it's missing some obvious features like internationalization so there is no official release yet. But the basics are there, so in the meantime, if you'd like to kick the tires, just install FA from svn and give it a try.

Here are some screenshots from a pylons app incorporating models from the FA test suite. (The admin controller is fully customizable using standard FA (and Pylons) techniques, but these are what you'd see out-of-the-box.)

[Screenshots: Index; Order page; Creating a new Order; Deleting an Order; The User page; Editing a User instance.]

Documentation on using and customizing the pylons admin app is here.

Thursday, October 16, 2008

FormAlchemy 1.0

A little background: a few months ago, I went looking for a web framework that was good at automating CRUD (create/retrieve/update/delete) against an existing database schema. I tried django but its database introspection abilities are beyond feeble, and django-sqlalchemy was not mature enough. I tried dbmechanic but its dozen-plus dependencies, most of which were alpha-quality, gave me pause; so did its basic architecture on top of toscawidgets, which I think is The Wrong Way to build web apps. (I understand that the former problem has since been reduced; the latter has not.)

So, I went back to option #3, FormAlchemy. I knew SQLAlchemy could reflect very hairy schemas indeed, and what it could not reflect, it could certainly represent with a little manual help. And FormAlchemy was a decent start to automating CRUD with SA models. I added the ability to represent relations, automatic syncing of form input back to SA objects, Grid support, and a test suite. Then Gael came along and added internationalization, support for even more SA features, and Sphinx docs. Along the way we've killed enough bugs and added enough test cases (yes, the two are related) that we think we have a pretty solid release. Especially since I just released 1.0.1 fixing the most obvious problems. :)

I think all three FA committers use it mostly with Pylons; that said, FormAlchemy has no dependencies besides SQLAlchemy itself. You could easily use it with werkzeug or web.py or whatever.

Here, finally, is a quick FormAlchemy tutorial:

To get started, you only need to know about two classes, FieldSet and Grid, and a handful of methods:

  • render: returns a string containing the html
  • validate: true if the form passes its validations; otherwise, false
  • sync: syncs the model instance that was bound to the input data

This introduction illustrates these three methods. For full details on customizing FieldSet behavior, see the documentation.

We'll start with two simple SQLAlchemy models with a one-to-many relationship (each User can have many Orders), and fetch an Order object to edit:

from formalchemy.tests import Session, User, Order
session = Session()
order1 = session.query(Order).first()

Now, let's render a form to edit the order we've loaded.

from formalchemy import FieldSet, Grid
fs = FieldSet(order1)
print(fs.render())

This results in the following form elements:

Note how the options for the User input were automatically loaded from the database. str() is used on the User objects to get the option descriptions.

To edit a new object, bind your FieldSet to the class rather than a specific instance:

fs = FieldSet(Order)

To edit multiple objects, bind them to a Grid instead:

orders = session.query(Order).all()
g = Grid(Order, orders)
print(g.render())

Which results in:

Saving changes is similarly easy. (Here we're using Pylons-style request.params(); adjust for your framework of choice as necessary):

fs = FieldSet(order1, request.params())
if fs.validate():
    fs.sync()
    session.commit()

Grid works the same way. More details in the documentation; start with Form generation.

To give FormAlchemy a try, just easy_install it. If you have any questions, Alex and I are often in both #sqlalchemy and #pylons on freenode. And of course there's always the mailing list.

Thursday, September 25, 2008

Available

Feature50 is winding down now that CEO Ben Galbraith has accepted a job offer elsewhere.  So, I'm interested in exploring my options, specifically, opportunities to build out the technology for a start-up working in concert with a strong business CEO. I've done this twice now.

Technical ability

I am a senior developer specializing in back-end technologies.  At Mozy, where I was employee #2, I wrote a distributed file repository that stores petabytes of data, an amount comparable to Amazon's S3.  I have 8 years of experience with PostgreSQL. I know how to design for scale, and how to find and remove bottlenecks.  I am not afraid of diving into a new code base; I took over as maintainer of the Spyce web framework and the FormAlchemy toolkit, and I have contributed features or patches to SQLAlchemy, Pylons, and Jython, among others.  

Soft skills

I enjoy building and working with a team.  At Feature50 I am responsible for technical interviews, and personally recruited five of our first eight developers.  At Mozy, I recruited three of the first five.  I designed a customized version of Review Board -- a code review tool -- for Feature50 and MediaBank, and contributed several patches back to the project.  I am active in the Python community and spoke at the last three PyCon conferences.  I have spoken at OSCON and I am speaking at PostgreSQL Conference West in October.

The bottom line

I'm looking to work on a challenging project -- that is, not Yet Another CRUD App -- with a small team. I am currently based in Utah; I am willing to work remotely or relocate.  Contact me at jonathan at utahpython dot org.

Monday, September 01, 2008

Blog Day recommendations

As with many things in the blogging echo chamber, blog day takes itself a little too seriously. But it's impossible not to love an excuse to talk about some of my favorite blogs that don't seem to have as much exposure as I think they deserve:

  1. Theo Schlossnagle, CEO of OmniTI, a scalability and performance consulting company. His best posts deal with scalability at the ops level. His book is good, too.
  2. Greg Linden, ex-Amazon engineer, ex-Findory founder, current MS Live Labs employee. He likes to post analyses of interesting CS talks and papers, particularly in the area of collective intelligence. Greg stays very on-topic so the most recent posts are about as representative as any.
  3. Chris Siebenmann writes about life as a professional sysadmin. He also sometimes blogs about python.
  4. Josh Berkus, PostgreSQL core team member, mostly blogs about current events in the database world, but every once in a while he writes a must-read post about database design. Google thinks that "Rules for Database Contracting" is his most popular post, and that's a good pick too.
  5. A non-technical pick: Eric Burns is the Gene Siskel of web comic critique. Except he's not dead.

Friday, August 29, 2008

App Engine conclusions

Having been eyeball deep in App Engine for a while, evaluating it for a project at work and putting together a presentation for the Utah Open Source Conference, I've reluctantly concluded that I don't like it.  I want to like it, since it's a great poster child for Python.  And there are some bright spots, like the dirt-simple integration with Google accounts.  But it's so very very primitive in so many ways.  It's not just the missing features, or the "you can use any web framework you like, as long as it's django" attitude; it's primarily that a lot of the existing API is just so very primitive.

The DataStore in particular feels like a giant step backwards from using a traditional database with a sophisticated ORM.  Sure, it can scale if you use it right, but do you really know what that entails?

Take the example of simple counting of objects.  There's a count() method, but in practice, it's so slow you can't use it.  Denormalize with a .count property?  Yeah, that doesn't scale either: what you really need is a separate, sharded Counter class.  And yes, sharding is very, very manual.  (See slides 18-23 in the link there, and the associated video starting about 19:00.)  
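
The sharding idea itself is simple enough to sketch in plain Python. This is an in-memory illustration only, not DataStore code: `ShardedCounter` and its methods are hypothetical names, and a real version would store each shard as its own entity so that concurrent writers rarely contend for the same one.

```python
import random

class ShardedCounter(object):
    """Spreads increments over several shards; reading sums them all."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, rng=random):
        # Pick a shard at random so writes are spread out instead of piling onto one record.
        index = rng.randrange(len(self.shards))
        self.shards[index] += 1

    def total(self):
        # Reads are more expensive: every shard has to be fetched and summed.
        return sum(self.shards)

counter = ShardedCounter(num_shards=4)
for _ in range(1000):
    counter.increment()
```

The trade-off is visible even here: writes get cheaper as shards are added, while every read pays for all of them.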

You can't perform joins in GQL.  Or subselects.  Or call functions, aggregate or otherwise.  EVERYthing you are interested in needs to be pre-computed.  (Or computed by hand client-side, which is so slow it's barely an option at all.)  I can extrapolate from this to my experience in production schemas and it's not pretty.

Of course, you also lose any ability to write declarative, set-based code, which is demonstrably less error-prone than the imperative alternative.  Take a simple example from my demo app.  Marking a group of todo items finished is four statements:

items = TodoItem.get_by_id(
  [int(id) for id in request.POST.getlist('item_id')])
for item in items:
  item.finished = datetime.now()
  item.put()

Compare this with SQL:

# Assumes a DB-API driver such as psycopg2, which expands a tuple parameter into an IN list.
cursor.execute("update todo_items set finished = CURRENT_TIMESTAMP where id in %s",
             (tuple(int(id) for id in request.POST.getlist('item_id')),))
Scalability is great but taking a big hit to back-end productivity is too high a price for all but a few applications.  GAE is still young, so maybe Google will improve things, but their attitude so far seems to be "we know how to scale so shut up and do it the hard way."  I hope I am wrong.

App Engine slides, code

My App Engine 101 slides and code are up now.

Bad news: my macbook pro did not work with the projector, period.

Good news: I have seen it do this before (in a room with several mac experts -- it was not user error) and brought a backup laptop.

Bad news: I forgot to include the django beta1 framework in my code upload, so I told people to just download it. But beta2 was out, and didn't work with the version of App Engine Helper I had. (It looks like r58 fixes this.) Manual poking about the django download site ensued until I got a new zip uploaded.

Good news: the conference organizers liked it anyway and asked me to present a second time later in the day. Everything just worked the second time around.

Monday, August 25, 2008

Google App Engine at the Utah Open Source Conference

App Engine is probably the biggest thing to happen to Python this year, so of course I volunteered to give a presentation on it at the Utah Open Source Conference. (I'm scheduled for Friday, Aug 29, at 10:00 AM.) Last year's conference was a big success, so I'm looking forward to an even better experience this year.

Here's the abstract I submitted, before they blew away my paragraph breaks:

Google launched the App Engine service earlier this year to immense interest from the web development community. App Engine allows running applications on Google infrastructure, including BigTable, Google's non-relational, massively scalable database.

App Engine is appealing both at the low end, where small shops don't want to have to deal with hardware procurement and systems administration, and at the high end, where the kind of "instant scaling" App Engine promises to deal with bursty traffic is the holy grail of infrastructure planning. This tutorial will cover the basics of App Engine development, including development and deployment of a simple application.

Please sign up for an App Engine account and download the SDK ahead of time so we can jump right in to the code. Basic Python knowledge will be assumed.

After I submitted the proposal, I found out that all presentations are going to be 60 minutes long. That is not much time if we're going to do hands-on work, but you retain so much more by doing than you do merely from watching that I don't consider it optional. So seriously, come with the SDK installed. Those who do not, can look over the shoulders of those who do.

If you don't know Python and you're a last minute kind of person, you might want to attend Matt Harrison's talk the day before, 90% of the Python you need to know. Matt has presented several times at the Utah Python User Group as well as PyCon.

Bonus tip: if you can't make it to the UTOSC, the two best talks on App Engine are Rapid Development with Python, Django, and Google App Engine and Building Scalable Web Applications with Google App Engine. My presentation will cover similar material to the first of these.

Friday, August 15, 2008

A reminder

Now that I've been doing Python full time again for a while it's easy to forget how magical it can be.

Last night I got an IM from a friend of a friend asking for (a) a recommendation for a Python book and (b) advice on writing a screen scraper. I pointed him to Dive Into Python and BeautifulSoup. Just now he IMed me again, "Hey, thanks for the tip. I ended up writing a screen scraper that I hadn't completed in 2 days in Groovy in about 20 minutes last night in Python with BeautifulSoup. So thanks, you got another python convert."

I love my job.

Tuesday, July 22, 2008

SQLAlchemy-Migrate for dummies

I gave sqlalchemy-migrate a try today. I like it, and I'm going to keep using it. The one downside is that it's a bit hard to find "the least you need to know" in the documentation, especially if you lean old-school like me and prefer to write your upgrade scripts in raw sql. So here's my stab at it.

Create a "repository" for upgrade scripts:

migrate create path/to/upgradescripts "comment"

Create your manage script. If you have development/production dbs with different connection urls, create two scripts with the same repository but different urls:

migrate manage dbmanage.py --repository=path/to/upgradescripts --url=db-connection-url

For each database, create the Migrate metadata (a migrate_version table):

./dbmanage.py version_control

Create an upgrade script. This will create a script [next version number]-[database type]-upgrade.sql in the "versions" subdirectory of your "repository." That's all, so you could certainly do this by hand if you prefer, but letting the script do it is less error-prone:

./dbmanage.py script_sql sqlite

Edit the script.

For each database, apply the upgrade:

./dbmanage.py upgrade

Repeat the script/upgrade process as needed. That's it! Everything else is optional!

(What this gives you is a process where all your developers can have their own local database for development, and all they have to do is "svn up; ./dbmanage.py upgrade" without having to worry about which upgrade scripts have been applied or not.)
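
For intuition about why the "upgrade" step is safe to repeat, here is a toy version of the bookkeeping. It is a sketch of the general technique only, not how sqlalchemy-migrate is implemented: the `schema_version` table, the `UPGRADES` dict, and the function names are all made up, and it uses an in-memory sqlite database.

```python
import sqlite3

# Hypothetical upgrade scripts, keyed by version number.
UPGRADES = {
    1: "create table todo_items (id integer primary key, title text)",
    2: "alter table todo_items add column finished timestamp",
}

def current_version(conn):
    """Return the recorded schema version, creating the bookkeeping row if needed."""
    conn.execute(
        "create table if not exists schema_version (version integer not null)"
    )
    row = conn.execute("select version from schema_version").fetchone()
    if row is None:
        conn.execute("insert into schema_version values (0)")
        return 0
    return row[0]

def upgrade(conn):
    """Apply, in order, only the scripts newer than the recorded version."""
    applied = []
    version = current_version(conn)
    for number in sorted(UPGRADES):
        if number > version:
            conn.execute(UPGRADES[number])
            conn.execute("update schema_version set version = ?", (number,))
            applied.append(number)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)   # applies every script
second_run = upgrade(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("pragma table_info(todo_items)")]
```

Because the database itself records which version it is at, each developer's copy only ever runs the scripts it is missing.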

Sunday, July 13, 2008

How to tell when you're successful

You're successful when someone tries to get a cheap clone of your site done on a cheap-labor code monkey site.

I'm flattered, I think.  (Although I'd be more flattered if it were a good code monkey site.)

Wednesday, June 25, 2008

Brief review of the Matias Half Keyboard et al

I ended up buying four pieces of equipment to help deal with being temporarily one-handed: the Matias half keyboard, the X-keys foot pedal (cheaper than the Kinesis pedals, which got lukewarm reviews on Amazon), the Keyspan PR-US2 Presentation Remote, and the Pacific Outdoors 17-LC100 Folding Recliner.

The good: I'm very pleased with the recliner and modestly happy with the remote.  I got the recliner to take naps in; the brace on my arm didn't really accommodate lying down.  This $80 recliner compares well with zero gravity recliners costing over 10x as much.  (I've used two of the expensive variety; a BackSaver and one whose brand I don't recall.)  The only downside is you can either sit up, or recline fully; there is supposedly a way to adjust the recline angle, but it doesn't really work.  Expensive zero gravity recliners can all reliably lock at any angle you like.

The remote mostly worked as a mouse substitute that I could use with my immobilized right hand, reducing the need to slow down my left hand even more by switching from keyboard to mouse and back.  Unfortunately, the mouse control pad is not nearly as good as one of the IBM "pointing sticks;" it appears to have four control points, like an old Nintendo D-pad, which gives only 8 possible directions to move in.  This and a poorly quantized pressure sensitivity sometimes made things frustrating.  If I were to do this again I would try a handheld trackball instead, even though I could not find any wireless models.

The bad: the half keyboard did not help programming speed with one hand, and the foot pedal didn't improve things.  I've returned both.

The half keyboard gives you the left hand side of the keyboard, which toggles to the right side when the space bar is held down.  So "a" becomes ";", "f" becomes "j", and so on.  For alphabetical keys, I found that it was true that I did not have to re-learn to touch type; I did not have to look at the keyboard, although I did have to pause and think, "does this one require the space toggle or not."  I got up to about 20 wpm before giving up, compared to 25 with one hand on a full keyboard.  I think I could have easily doubled that to 40+ wpm with enough practice to eliminate that pause and recognize "runs" of letters that can be typed without releasing the space, like "you," without thinking.  But that kind of investment wasn't worth it because of a serious flaw.
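
The space toggle described above amounts to a mirror mapping, which can be sketched in a few lines of Python. The row layout below is an assumption based on a standard QWERTY keyboard and the two examples given ("a" to ";", "f" to "j"); the names are made up, and treating each key as its own "space+" stroke is a simplification, since in practice space stays held across a run.

```python
# Assumed layout: the three letter rows of a QWERTY keyboard, split by hand.
LEFT_ROWS = ["qwert", "asdfg", "zxcvb"]
RIGHT_ROWS = ["yuiop", "hjkl;", "nm,./"]

# Map each right-hand key to the left-hand key in the mirrored position.
MIRROR = {}
for left, right in zip(LEFT_ROWS, RIGHT_ROWS):
    for left_key, right_key in zip(left, reversed(right)):
        MIRROR[right_key] = left_key

def keystrokes(text):
    """List what a left-hand-only typist presses for each character."""
    strokes = []
    for char in text:
        if char in MIRROR:
            # Right-hand character: hold space and press its mirror image.
            strokes.append("space+" + MIRROR[char])
        else:
            strokes.append(char)
    return strokes

you = keystrokes("you")  # a "run": every letter needs the space toggle
fad = keystrokes("fad")  # left-hand letters need no toggle
```

Under this mapping every letter of "you" lives on the right hand, which is why it can be typed as a single run with space held down.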

The half keyboard is really more like a "1/4 keyboard."  It only gives you the alphabetical keys and a couple punctuation marks.  No number keys with their !@#$ counterparts.  No F keys.  No arrow keys.  On a mac, you can have cmd or control but not both.

To allow these keys to be typed, there is a "numeric toggle" key that switches to keypad mode, and two other modes that you access by hitting "shift shift" and "shift shift shift."  Almost any line of code you might want to type is going to run into this.  Typing [0] for instance is shift shift s numerictoggle b numerictoggle shift shift a.  Even the symbol-averse Java will need parentheses for method calls, and yes, parens require mode switching too.  (As do braces.  Shudder!)

So I lost in the non-alphabetical and modifier access much more than I could see myself gaining on the pure alphabetical side.  

Finally, the modifier keys were on the right hand side of the keyboard where they were very difficult to combine with shift.  I tried to ameliorate the modifier key problems with the X-Keys pedal, mapping the pedals to cmd/ctrl/option, but that didn't really work either.  (The included ikeys software wouldn't work at all.  At least ControllerMate worked in non-X applications, but since Wing is the only IDE that does locals completion well, using a non-X IDE temporarily was a non-starter.  Locals completion is nice with two hands, but absolutely essential with one.)  Note that this is more of an OS X issue than a problem with these pedals; apparently mapping pedals (x-keys or kinesis) to modifier keys works fine on windows.

So, the half-keyboard is not useful for programmers.  If it (a) were wireless and (b) had a non-skid backing -- it slid all over the place because the back side was just smooth plastic -- I could see it being useful for heavy smartphone users.  But it fails there too.  Good luck with this one, Matias.

Postscript: I considered trying the Frogpad as well as the half keyboard, but with users reporting that they got "up to 20 wpm after 2 weeks," it didn't sound worth the trouble.  So if I ever had to spend another three weeks one handed I am not sure what is left to try.  Probably I would try to use ControllerMate (os x) or xmodmap (linux) to make a "half keyboard" in software that didn't suck so much, as suggested by one of the commenters in my first post.

Saturday, May 31, 2008

One-handed typing?

I separated my right shoulder so that arm is going to be out of commission for a while.  (I am right-handed.)  I'm managing about 25 wpm with one hand, or about 1/4 my normal speed.  This is frustrating.  The Handkey Twiddler has been out of production for a while.  The BAT is not OS X compatible. Anyone tried the Half Qwerty keyboard?  Are there other good options for under, say, $300?  (I found several very niche products for significantly more.)

I do plan to try voice recognition for email and IM but I can't see that working very well for code.

Monday, May 19, 2008

Jython Notes

I've been getting back into the Jython codebase this last week. The last time I submitted a Jython patch was in the beginning of 2004, so it's been a while. Things have changed... Jython is finally requiring Java 5 for the next release, which means the usual improvements, but especially good use of annotations. Here are some notes from my puttering around (mostly dragging Jython's set module up to compatibility with CPython 2.5's):
  • Expect Eclipse to be slightly confused. (Lots of "errors.") This is normal. Use ant to build.
  • ant regrtest is handy. Run it before you start making changes so you know what's already broken in trunk. (At least between releases, jython does not appear to be religious about "no tests shall fail." But as a new developer you should make "no additional tests should fail" your motto.)
  • Subjective impression: Jython re performance is a bit slow. Jython uses its own re implementation predating the Java regular expressions in jdk 1.4. But, the JRuby guys reported that the jdk implementation doesn't perform very well, so Jython hasn't been in a hurry to switch. The JRuby solution was to port the oniguruma re engine from C to Java. But, Ruby's strings are byte-based and mutable where Jython's are not, so using the JRuby engine isn't just a matter of dropping it in. Also, these string differences may be a source of the poor performance the ruby people saw, so independent testing is in order here.
  • All of the Derived classes (PySetDerived, PyLongDerived, etc.) just exist to let python code subclass builtin types. Those derived classes are generated by a .py script in src/templates
  • If you add a Java class that needs to be exposed to python using the @Expose annotations, you need to add the class name to CoreExposed.includes, or Jython will default to picking attributes via reflection and it usually guesses wrong.
  • Given a PyObject, you can (usually) easily instantiate another PyObject of the same class with pyobject.getType().__call__(). The only times this won't work is when your type's __new__ does something tricky, like how PyFrozenSet or PyTuple return a singleton for an empty frozenset or tuple.
Thanks to all the people in #jython who helped me out, especially Philip Jenvey!

Friday, May 16, 2008

Quick tip for debugging with Jython

Currently, Jython ships with the pdb debugger module from Python 2.3. Unfortunately the 2.3 pdb is primitive even by command-line debugger standards. (For instance, if the program you are debugging throws an exception, it will take pdb down with it. Seriously. Did anyone actually use this thing?)

Fortunately all you have to do to get a much better experience is grab pdb.py, bdb.py, and cmd.py (for good measure) from a 2.5 CPython installation and run against that instead.

I've only tested this with Jython trunk but I think it should Just Work with the 2.2 release, too.

Friday, May 09, 2008

IDE update

Last night the Utah Python User Group held an editor/IDE smackdown. I'm not going to write an exhaustive summary, but here are some highlights:
  • ViM's OmniComplete is actually pretty decent. Calltip support in the GUI is also good. (GUI? ViM? Yeah, weird.)
  • Emacs completion, from Rope, is also good. Emacs's refusal to make any concession to GUIs though keeps things clunky. Not that it isn't great that Everything Works over plain ssh; that's fine, but going through classic Emacs buffers for docstrings or completion means everything takes more keystrokes than it should while being less useful than having that information Always On.
  • Rope also gives Emacs refactoring support that works surprisingly well.
  • PyDev still sees a big win from the Eclipse platform. Specifically, even though Subclipse and Subversive are a bit weak compared to the gold standard (that would be TortoiseSVN), they are much better than what you get with Komodo or Wing. Now that I am on OS X (no Tortoise) this is a bigger issue for me than it used to be.
  • PyDev Extensions has refactoring support now, too.
  • Komodo has limited support for completion inside django templates. Which is impressive, since the commands allowed in django templates aren't really Python, which is to say that you can't just use the same completion support that you use for normal Python code.
  • Mako template support with completion, anyone?
  • The latest versions of Komodo and Wing both integrate unittest support. Wing also supports doctest out of the box. Meaning, you click a button, your tests run, you get a pretty summary with click-to-go-to-the-source-of-the-error support. This might get me to finally upgrade to Wing 3. It's not that "python test.py" is so hard, so much as I do it so often that even a little more convenience adds up.
I was surprised how well ViM and Emacs do with Python now. ViM's modern inline interface for code completion and Emacs's refactoring support are particularly nice. The IDEs still win on the I part (Integration), in particular debugging and (for Eclipse at least) svn support.

Update: Ryan McGuire blogged about his Emacs presentation in more detail.

Update 2: John Anderson blogged about setting up ViM.

Friday, April 11, 2008

How to piss off your customers in two easy steps

  1. Don't communicate with them
  2. Treat them like they owe you something

Google is off to a good (bad?) start with both of these in its management of the App Engine release.

Of the 120+ issues logged by beta testers, a few have been closed as wontfix or duplicate; most have no response at all from the App Engine team. I can't think of any other company that I've filed an issue with that took that long to get back to me. The good ones get back within hours.

The one exception I have seen is for the urllib issue, where gu...@python.org, presumably Guido, wrote

Providing a urllib replacement implemented on top of urlfetch shouldn't be particularly hard. If someone is willing to produce one, I'd be happy to review it and, if it passes muster, try to get it added.

Paraphrased: "maybe if you do our work for us we'll consider it."

WTF!

This isn't OSS, where "if you want something, do it yourself" is at least a semi-valid response. App Engine developers are all currently beta testing a product that Google hopes to eventually charge for. We're doing Google a favor. (Context: the replacement Guido wants is a piece of code that will only ever be useful on App Engine, and is something Google should have done in the first place instead of making urlfetch a public API.)

Maybe I'm over-sensitive, but this really rubs me the wrong way.

I hope Google can (a) put enough engineers on this that they can actually respond to issues, and maybe start closing some, and (b) remember that when you're selling a product, "why don't you fix it if it bothers you" is a poor response.

Thursday, April 10, 2008

The business case for Google App Engine

App Engine sure has caused a stir. Some of the competition is already scared, with reason.

But who is App Engine's real competition?

In a lot of ways, App Engine is in a class by itself. It competes on the high end with Amazon Web Services. But it also competes on the low end with every shared host out there. And thanks to the integration of Google authentication and the application directory you could also make a case that in an orthogonal way it competes with Facebook's application API.

At the low end, App Engine is a big deal for Python developers and anyone else who is allergic to PHP. Historically, you've really had to look hard for low end hosting that offered anything else. And as everyone who has given products away to colleges knows, Free is a fantastic hook to get developers to try out your platform. Once it's open for all, App Engine is going to become the preferred option for developers with the itch to write a toy or proof of concept and show it off to the world.

Less obviously (to developers, anyway), App Engine is also a big deal for businesses that aren't quite big enough to hire a sysadmin, or who are big enough but still prefer not to deal with that complexity. (You thought hiring skilled developers is hard? If anything, hiring skilled sysadmins is harder.)

I suspect there are a substantial number of companies in the uncomfortable situation of really needing more performance than shared hosting offers, but not wanting the complexity of taking the next step, to dedicated servers with dedicated sysadmins.

Of course, given App Engine's constraints, porting such applications to it is only going to be an option in a few cases. The question is, are managers of new projects farsighted enough to see this problem coming and realize that App Engine insures against it?

At the high end, AWS is the only real competition to App Engine, but as most observers have pointed out, they are different beasts. AWS offers far more flexibility, at the cost of far more hours from your ops department. (Although App Engine's datastore is a lot more sophisticated than AWS's SimpleDB, so the capabilities of AWS aren't a strict superset of App Engine's.) Contrary to the Joyent assertion linked earlier, it isn't necessarily stupid to trade flexibility for convenience. App Engine just works to an unprecedented degree in the field of high-end scalability.

As with anything this disruptive, there's been a certain amount of hysteria. Even people who should know better have repeated the idea that "nobody will want to acquire a product built on App Engine because you're locked in." This is stupid. Depending on a proprietary platform hasn't stopped products built on Oracle from being acquired, or products using AWS, or even products built on a proprietary UNIX. (Yes, those still exist.) Nobody will care if you build on App Engine, except maybe Microsoft and Yahoo. And even they can be pragmatic; Hotmail ran on BSD when Microsoft acquired them.

Lock-in is a real issue, but not because App Engine will keep you from being acquired, and not because Google will screw you once they have you in their clutches -- that would scare off new customers and thus be bad business. Lock-in is an issue because evolving requirements might make App Engine's confines less of a good fit than it started out. If you have to start adding servers at AWS or RackSpace to handle things you can't within App Engine, App Engine loses most of its value.

Wednesday, April 09, 2008

Language popularity, App Engine - style

Just for fun, here's the number of stars (interested people) for the different language-support feature requests for Google App Engine:

  • Perl: 85
  • Java: 69
  • Ruby: 67
  • PHP: 23
  • C#: 11
  • jvm, not just java: 7
  • Common Lisp: 5

Update: Perl is stuffing the ballot box :)

Google App Engine: Return of the Unofficial Python Job Board Feed

Over three years ago (!), I wrote a screen scraper to turn the Python Job Board into an RSS feed. It didn't make it across one of several server moves since then, but now I've ported it to Google's App Engine: the new unofficial python job board feed.
I'll be making a separate post on the Google App Engine business model and when it makes sense to consider the App Engine for a product. Here I'm going to talk about my technical impressions.
First, here's the source. Nothing fancy. The only App Engine-specific API used is urlfetch.
Unfortunately, even something this simple bumps up against some pretty rough edges in App Engine. It's going to be a while before this is ready for production use.
The big one is scheduled background tasks. (If you think this is important, star the issue rather than posting a "me too" comment.) Related is a task queue that would let those scheduled tasks be split into bite-size pieces. Google needs that in order to allow scheduled tasks (a) without worrying about runaway processes while (b) still letting them accomplish an arbitrary amount of work.
If there were a scheduled task api, my feed generator could poll the python jobs site hourly or so, and store the results in the Datastore, instead of having a 1:1 ratio of feed requests to remote fetches.
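The difference between a 1:1 ratio and an hourly poll can be sketched as a time-based cache: refetch only when the stored copy is older than some threshold. This is an illustration, not the feed's actual code. It keeps the result in memory, whereas a real App Engine app would need the Datastore, and CachedFetcher, fetch and max_age_seconds are made-up names rather than any App Engine API.

```python
import time


class CachedFetcher(object):
    """Serve a cached result, refetching only when it is older than max_age_seconds."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch          # callable that does the slow remote fetch
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```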
While you can certainly create a cron job to fetch a certain url of your app periodically, and have that url run your "scheduled task," things get tricky quickly if your task needs to perform more work than it can accomplish in the small per-page time allocation it gets. Fortunately, I expect a scheduled task api from App Engine sooner rather than later -- Google wants to be your one stop shop, and for a large set of applications (every web app I have ever seen has had some scheduled task component) to have to rely on an external server to ping the app with this sort of workaround defeats that purpose completely.
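The "more work than fits in one request" problem can be sketched as a loop that stops when its time budget runs out and reports where to resume. The names run_slice and budget_seconds are hypothetical, and the sketch says nothing about how the resume position would be stored and re-triggered on App Engine, which is exactly the missing piece.

```python
import time


def run_slice(items, handle, start=0, budget_seconds=1.0, clock=time.time):
    """Process items[start:] until the time budget is spent.

    Returns the index to resume from, or None when every item is done.
    """
    deadline = clock() + budget_seconds
    index = start
    while index < len(items):
        if clock() >= deadline:
            return index             # out of time: caller must schedule another slice
        handle(items[index])
        index += 1
    return None
```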
Another rough edge is in the fetch api. Backend fetches like mine need a callback api so that a slow remote server doesn't cause the fetch to fail forever from being auto-cancelled prematurely. Of course, this won't be useful until scheduled tasks are available. I'm thinking ahead. :)
Finally, be aware that fatal errors are not logged by default. If you want to log fatal errors, you need to do it yourself. The main() function is a good place for this if you are rolling your own simple script like I am here.
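A minimal sketch of doing that logging yourself, using only the standard logging module. Here real_main is a hypothetical stand-in for whatever your script's actual entry point is.

```python
import logging


def run_logged(real_main):
    """Call real_main(), logging any uncaught exception before re-raising it."""
    try:
        return real_main()
    except Exception:
        # logging.exception records the message plus the full traceback.
        logging.exception("fatal error in main()")
        raise
```

Re-raising after logging keeps the normal error response intact; the only change is that the traceback also lands in the log.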

Sunday, April 06, 2008

My half-baked thoughts on Python web frameworks

I have been lucky enough to fill our recent open positions with people who know Python as well as Java. Half of the (6 person) company is now in that category and prefers Python, and 2 of the others have played with Python and liked it at least well enough to not object. So the boss has conceded that it makes sense to go the Python route for our next project.

We're going to be doing a web, "next gen" version of our existing client-server project, which is mostly simple CRUD but does have 1000+ tables in its current incarnation. So we really need something that can autogenerate 90+% of the CRUD or we will go insane.

The trouble is, I still don't really like any of the Python web options 100%. (I like the web options in other languages less, but I'm a perfectionist.)

Django is well documented, its admin app is something everyone else envies, and newforms looks decent, but the ORM blows and I'm not fond of the template engine either. (Pre-emptive pedantry: yes, I know I can "import sqlalchemy." Please stop saying that like it means something; I'm not interested in defining models twice -- once for real work with SA, and once for interop with the rest of django.) Apparently django-sqlalchemy got far enough in PyCon sprints that it's kinda usable so working on that would be an option. Of course even then there is no guarantee the django core would accept it into mainline, and maintaining it as a "vendor branch" would probably suck. If django used a dscm like Mercurial I might be willing to do that, but svn is just too painful so that is a real risk.

I don't see a way to generate a page containing just a CRUD interface for table X with the django admin app. The admin app really is a monolithic application, not something you can easily re-use pieces of.

Regexps suck for url mapping.

Pylons is not well documented and after keeping an eye on this for something like 18 months I don't think this is a problem that will be solved, for whatever reasons. On the other hand, SA + mako is a very sane default, and both of those are well documented so it's really only core Pylons that suffers from doc crapitude, and core Pylons is fairly small. IRC responsiveness mitigates this further.

Pylons still doesn't have a good CRUD (or even high-level manual form generation) solution, which has bugged me for even longer than the docs. I can't fathom how people can tolerate writing this kind of boilerplate in 2008. Formalchemy gets about 30% of the way there. DBMechanic requires TG2 atm, although apparently hacking it to run on Pylons may not be too much effort; I would guess around 20% of the effort to get the django-sa project really usable.

TG2 is of course very bleeding edge and although I like genshi's syntax in theory, in practice XML templates irritate the hell out of me. (Very verbose, xinclude sucks compared to "inheritance," and incorporating rich dynamic content -- i.e., user-generated, like forum posts, that needs to include html tags -- is a PITA. Not to mention that having to write "a &gt; b" when you mean "a > b" bugs me all out of proportion to the actual inconvenience it inflicts on me.) Still, better than the django templates.

I'm skeptical that TG2 is a big enough value add to want to add it (in its unfinished state) as a dependency vs rolling our own on Pylons. But DBMechanic does look like it could be exactly what I want in a CRUD generator.

web.py seems like more of a tech demo than a real product. I don't see any signs of a CRUD or form generator. reddit, probably the largest web.py site at least in terms of page views, moved to Pylons.

Zope 3 is alone in being really production ready without running from svn. Grok does do a good job of smashing zcml and z3c.form looks okay but lives up to the Zope reputation of complexity. (Field managers, widget managers -- are these the same things? -- widget modes, ...) AFAIK relational dbs are still second-class citizens in zope, and with all due respect to zodb it is no postgresql. OTOH there is z3c.sqlalchemy which gives me hope. Finally: you have to manually restart zope (per the Grok tutorial) after changing your .py files? Seriously?

Bottom line, Zope might actually be a decent option if we had a Zope expert on staff but we do not and I am not willing to tackle the learning curve alone.

Nevow: form handling is in flux. The new hotness is "pollenation forms," but that is svn-only and the api "will probably change."

Zope and Nevow both have their own xml-based templates that predate genshi but are similar to it. Something like Nevow's Stan is obviously useful for programmatic template generation but it's not yet clear if that's going to be something we need. Probably only if we have to write our own form generator. If so, I suspect ripping a standalone Stan out of Nevow would be straightforward.

(Spyce of course never really got any traction to speak of. It's time for me to let it go quietly into the night and leverage someone else's framework.)

Conclusion: I think porting DBMechanic to Pylons is our best option. DBMechanic seems designed to be more flexible than the django admin app. Django would be my second choice.

Corrections? Thoughts?

Wednesday, April 02, 2008

Real Python IDEs

After reading a blog post titled "The Abysmal State of Python IDEs" (which I won't link to because it's minformative, but it's easy to google by title), I wondered how the author managed to pick such a lousy group of IDEs to try. He tried "ActiveState" (does he mean PythonWin?), DrPython, SPE, and ScrIDE, only one of which is in the top 10 google hits for Python IDE.

The google top 10 include Eric, Wing IDE, Radio Userland, SPE, PyDev, and Komodo. The Yahoo and MSN top 10s are similar. Except for Radio Userland, this is a much better group to start with, and one that in fact does include what I think are the only 3 Python IDEs worth trying.

So how does a newbie end up picking such a lousy group of IDEs to try? The only likely possibility seems to be that he went to the top google hit, the python.org wiki page. Or possibly he went off of the top MSN hit, the c2 wiki Python IDE page. Both are (rather, were) heaping wads of products that mostly weren't IDEs at all, or were IDEs for other languages that happened to include Python syntax coloring.

Syntax coloring and maybe a Run button don't qualify you as a Python IDE in 2008, guys. (Sorry, IDLE.) Integrated means you need to integrate something nontrivial, preferably a debugger, although gui builders can also count.

So I organized the python.org IDE page by feature set and moved the non-IDEs to the Editors page, even if a pedant would note that they were IDEs, just not really for Python. That's not what 99.9% of people are looking for when they go to a Python IDE page, so let's be useful rather than pedantic. I also elided the non-IDEs from the c2 page.

Saturday, March 15, 2008

Best new blog I discovered at PyCon [so far]

I was talking to Adam Gomaa on Thursday when Ben Bangert stopped by us and told him he had an interesting blog. "If Ben says you have a good blog, I'll have to check it out," I told Adam. "That's not what I said," Ben corrected me. "I said interesting." But it is good, and I'm glad I found it.

And regarding Adam's post on declarative layers for SQLAlchemy, check out the new-in-SA 0.4.4 declarative plugin. It's almost exactly what Adam was looking for -- a little more verbose, in keeping with the "explicit is better than implicit" Python philosophy that SA shares, but creating your own superclass that creates a PK named "id" by default is just a few lines of code if that's what you prefer.

PyCon, Saturday and Sunday

Saturday I'll be at the SQLAlchemy and State of PyPy talks. Then the board game BOF in the evening. In between, probably mostly the "hallway track."

Sunday I plan to attend "What Zope did wrong" and "Core Python Containers." (I'd also like to see the Wingware presentation in the 11:35 slot, but since I can only pick one, I guess I can just cross my fingers that this year's video recordings actually get published somewhere.)

Feel free to stop me and introduce yourself.

(If we met at a previous PyCon, I've changed my hair around a lot from year to year. The photo on this blog represents what I look like now. I should probably stick with this for a couple years so people can recognize me.)

Thursday, March 13, 2008

Introverting

I'm taking a break during the evening tutorial in my hotel room on the second floor where I can enjoy the pycon wireless signal (which seems to be working quite well now). My name is Jonathan, and I am an introvert [too]. I suspect a lot of PyCon attendees can empathize.

After the last tutorial session ends at 9:30, I'm planning to head down with my copy of Munchkin and see if anyone wants to start the board game social a day early.

Slides from Introduction to SQLAlchemy tutorial

My slides from this morning are up: http://utahpython.org/jellis/sa-intro.pdf. About 1/3 of the class did not have SA installed yet, and the network was down. Fortunately, Mike and Jason brought 5 flash drives and by the time we got to the first exercise everyone was up and running.

This was my third time doing a three-hour SQLAlchemy tutorial. Differences from last time include

  • updated for the 0.4 series
  • removed almost all the SQL-layer material
  • added a section on the new relation filtering api
  • improved the parts of the Fundamentals sections that were poorly explained
  • added a short section on the new-in-0.4 transaction management

There wasn't a wall clock in the tutorial room, so despite making an effort to be aware of time I went 10 minutes over. Sorry, guys. :)

Jason Kirtland will be posting the slides from the Advanced SQLAlchemy tutorial soon.

Friday, March 07, 2008

Pylons: first impressions

A couple co-workers and I spent some time with Pylons yesterday, enough to get to where we started to feel productive, but not much more than that. I think there's value in a newbie's first impressions, so here are mine. I'm sure at least some of these are wrong.
  1. Poor documentation of core Pylons (Mako and SQLAlchemy are fine -- thanks, Mike). I had to use the source several times. I'm still not really sure how Routes works, although I was mostly able to make it do what I wanted. The first tutorial overcomplicated things, showing how to configure things to handle semi-obscure requirements, without explaining those requirements or simpler alternatives.
  2. Helpful community. I got most of the answers I needed pretty quickly in the #pylons freenode IRC channel.
  3. Not much black magic: if you know Python you won't be struggling with weird Pylons-only concepts. It's all modules, classes, and dicts put together in an intuitive way (at least to my way of thinking).
  4. SA (SQLAlchemy) is an amazing pleasure to use. (Okay, not just in Pylons, but I had to say it.) I have a slightly unusual schema -- the details are outside my scope here -- and SA's autoload handled it perfectly.
  5. I wrote more CRUD boilerplate than I would have liked. There is no real alternative to Django's "admin app." DBMechanic looks like it's getting close, but it's TG-and-Genshi only for now. FormAlchemy is a partial solution (I did use it) but only does html generation; you'll still write boilerplate in your controllers.
  6. Genshi appeals to me in theory but in practice its XML nature makes it feel clunky. XInclude as an alternative to template inheritance? 3 lines of xmlns per template? Mako has its own verbosity problems, e.g., having to do a def to pass a title to the parent template, but these aren't inherent to Mako's approach the way they are to Genshi's (we're XML, dammit), and Mike seems mildly interested in improving this specific example for the next release.
  7. The pyfacebook tutorial is long on throwing wads of code at you and short on explaining what's actually going on. What does facebook.check_session() do? What does the facebook_middleware do? Why? Most facebook api tutorials have this same problem. Obviously I haven't written a better one, so call me a hypocrite, but tutorial authors, please explain the why and not just the what.

Saturday, February 09, 2008

SQLAlchemy at Pycon 08

SQLAlchemy will be well-represented this year with two tutorials and a talk.

I'll be the primary instructor for the Introduction to SQLAlchemy tutorial. I just updated the pycon page with the outline of what we'll cover. The slides will be pretty similar to last time, only with more time spent on a high-level intro to ORM (object-relational mapping) for people who have little exposure to that. And of course last year 0.4 was not out.

The SQLAlchemy documentation is thorough but a little intimidating. IMNSHO, the introduction tutorial is a great way to pick up the basics and get some practice, after which everything starts to make a lot more sense.

Mike Bayer, the author of SA, will be the primary instructor for the Advanced SQLAlchemy tutorial. Jason Kirtland, one of the most prolific SA hackers besides Mike himself, will also be teaching.

At the conference itself, Mike will be presenting Sqlalchemy 0.4 and beyond. To save you digging it out of the talks page, here's the summary:

At last year's Pycon, we introduced SQLAlchemy, the Database Toolkit for Python. This year, SQLAlchemy has gained new developers, a lot more users, and has now produced SQLAlchemy 0.4. The latest series of SQLAlchemy is significantly improved from the previous, in that APIs have been greatly pared down and refined, performance has been stepped up 30-40%, and ongoing architectural and developmental improvements have made room for lots of great new features with more to come. This talk intends to describe what's new in the 0.4 series, both for current users as well as for folks who may have only had experience with our earlier versions.

Monday, January 14, 2008

Why IE rejects your cookies for no apparent reason

Seriously, WTF.

I'll summarize for those of you who are allergic to MSN knowledge base articles, although this one is fairly to-the-point:

If you implement a FRAMESET whose FRAMEs point to other Web sites on the networks of your partners or inside your network, but you use different top-level domain names... IE silently rejects cookies sent from third party sites.

This bit me today while adding facebook support to my text-based game -- I'm going the IFRAME route for fb support rather than rewrite the whole app in FBML thankyouverymuch, and yes, apparently IFRAME counts too for IE retard-mode.

What makes me cry a little inside is not the two hours spent deep in old and crufty login and cookie-setting legacy code wondering what the flaming hell was going on. No, what makes me cry is that I got screwed by a setting that will never block the bad guys, because labeling yourself a good guy is entirely voluntary. It's like someone at MS read the evil bit RFC and took it seriously.

The mind boggles.

In the meantime, if you know where your web framework's cookie code lives, do everyone a favor and patch it now to add that P3P header given in the knowledge base by default. And an option to disable it if you're obsessive-compulsive that way.
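
For frameworks that speak WSGI, a rough sketch of such a patch is a small middleware that adds the header unless the application already set one, with a flag to turn it off. The class name is hypothetical, and the compact-policy string below is a placeholder assumption: substitute the value from the knowledge base article or one that matches your site's real privacy policy.

```python
# Placeholder compact policy (an assumption, not an authoritative value):
# substitute the string from the knowledge base article or, better, one that
# matches your site's actual privacy policy.
DEFAULT_P3P = 'CP="CAO PSA OUR"'


class P3PMiddleware(object):
    """WSGI middleware that adds a P3P header unless the app already set one."""

    def __init__(self, app, policy=DEFAULT_P3P, enabled=True):
        self.app = app
        self.policy = policy
        self.enabled = enabled       # the "option to disable it"

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            has_p3p = any(name.lower() == "p3p" for name, _ in headers)
            if self.enabled and not has_p3p:
                headers = list(headers) + [("P3P", self.policy)]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an existing WSGI callable as P3PMiddleware(app) is all the wiring this needs; passing enabled=False leaves responses untouched.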