Friday, August 29, 2008

App Engine conclusions

Having been eyeball deep in App Engine for a while, evaluating it for a project at work and putting together a presentation for the Utah Open Source Conference, I've reluctantly concluded that I don't like it.  I want to like it, since it's a great poster child for Python.  And there are some bright spots, like the dirt-simple integration with Google accounts.  But it's so very, very primitive in so many ways.  It's not just the missing features, or the "you can use any web framework you like, as long as it's Django" attitude; the bigger problem is that a lot of the existing API is itself so primitive.

The DataStore in particular feels like a giant step backwards from using a traditional database with a sophisticated ORM.  Sure, it can scale if you use it right, but do you really know what that entails?

Take the example of simple counting of objects.  There's a count() method, but in practice, it's so slow you can't use it.  Denormalize with a .count property?  Yeah, that doesn't scale either: what you really need is a separate, sharded Counter class.  And yes, sharding is very, very manual.  (See slides 18-23 in the link there, and the associated video starting about 19:00.)  
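
To give a feel for what "very manual" means, here is a minimal sketch of the sharded-counter idea in plain Python. It is not the code from the slides: a dict stands in for the datastore, and NUM_SHARDS, increment, and get_count are names made up for illustration. On App Engine each shard would be its own entity, updated inside a transaction.

```python
import random

NUM_SHARDS = 20  # hypothetical value; more shards means less write contention

# Stand-in for the datastore: maps a shard key to its running count.
# (Assumption: on App Engine each entry would be a separate entity.)
shards = {}

def increment(name):
    """Add one to a randomly chosen shard of the named counter."""
    index = random.randint(0, NUM_SHARDS - 1)
    key = "%s-%d" % (name, index)
    shards[key] = shards.get(key, 0) + 1

def get_count(name):
    """Sum every shard; this is the read-side price of spreading out writes."""
    return sum(shards.get("%s-%d" % (name, i), 0) for i in range(NUM_SHARDS))

for _ in range(1000):
    increment("todo_items")
total = get_count("todo_items")
```

Writes are spread over many small records so no single one becomes a hot spot, and every read has to add them all back up. You write and maintain all of that yourself, for a number a relational database hands you with one aggregate call.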

You can't perform joins in GQL.  Or subselects.  Or call functions, aggregate or otherwise.  EVERYthing you are interested in needs to be pre-computed.  (Or computed by hand client-side, which is so slow it's barely an option at all.)  I can extrapolate from this to my experience in production schemas and it's not pretty.

Of course, you also lose any ability to write declarative, set-based code, which is demonstrably less error-prone than the imperative alternative.  Take a simple example from my demo app.  Marking a group of todo items finished is four statements:

items = TodoItem.get_by_id(
  [int(id) for id in request.POST.getlist('item_id')])
for item in items:
  item.finished = datetime.now()  # assumes "from datetime import datetime"
  item.put()

Compare this with SQL:

# assumes a psycopg2-style driver, which expands a tuple parameter into an IN list
cursor.execute("update todo_items set finished = CURRENT_TIMESTAMP where id in %s",
               (tuple(int(id) for id in request.POST.getlist('item_id')),))

Scalability is great, but taking a big hit to back-end productivity is too high a price for all but a few applications.  GAE is still young, so maybe Google will improve things, but their attitude so far seems to be "we know how to scale so shut up and do it the hard way."  I hope I am wrong.
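
For anyone who wants to run the SQL side of that comparison, here is a rough sketch using sqlite3 from the standard library. It is not what the demo app used; the table layout and the mark_finished helper are assumptions made for illustration. sqlite3 cannot bind a whole list to a single parameter, so the sketch builds one placeholder per id.

```python
import sqlite3

def mark_finished(conn, item_ids):
    """Set finished on every listed row with a single UPDATE statement."""
    ids = [int(i) for i in item_ids]
    if not ids:
        return 0
    # One "?" per id, since sqlite3 has no list-valued parameters.
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.execute(
        "update todo_items set finished = CURRENT_TIMESTAMP "
        "where id in (%s)" % placeholders,
        ids)
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE changed

# Hypothetical schema and sample rows, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table todo_items (id integer primary key, title text, finished timestamp)")
conn.executemany(
    "insert into todo_items (id, title) values (?, ?)",
    [(1, "write slides"), (2, "upload code"), (3, "test projector")])

updated = mark_finished(conn, ["1", "3"])
unfinished = [row[0] for row in conn.execute(
    "select id from todo_items where finished is null order by id")]
```

Even with the placeholder bookkeeping, the database still does the whole update as one set-based operation, with no per-row loop in application code.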

App Engine slides, code

My App Engine 101 slides and code are up now.

Bad news: my MacBook Pro did not work with the projector, period.

Good news: I have seen it do this before (in a room with several Mac experts -- it was not user error) and brought a backup laptop.

Bad news: I forgot to include the Django beta1 framework in my code upload, so I told people to just download it. But beta2 was out, and didn't work with the version of App Engine Helper I had. (It looks like r58 fixes this.) Manual poking about the Django download site ensued until I got a new zip uploaded.

Good news: the conference organizers liked it anyway and asked me to present a second time later in the day. Everything just worked the second time around.

Monday, August 25, 2008

Google App Engine at the Utah Open Source Conference

App Engine is probably the biggest thing to happen to Python this year, so of course I volunteered to give a presentation on it at the Utah Open Source Conference. (I'm scheduled for Friday, Aug 29, at 10:00 AM.) Last year's conference was a big success, so I'm looking forward to an even better experience this year.

Here's the abstract I submitted, before they blew away my paragraph breaks:

Google launched the App Engine service earlier this year to immense interest from the web development community. App Engine allows running applications on Google infrastructure, including BigTable, Google's non-relational, massively scalable database.

App Engine is appealing both at the low end, where small shops don't want to have to deal with hardware procurement and systems administration, and at the high end, where the kind of "instant scaling" App Engine promises to deal with bursty traffic is the holy grail of infrastructure planning. This tutorial will cover the basics of App Engine development, including development and deployment of a simple application.

Please sign up for an App Engine account and download the SDK ahead of time so we can jump right in to the code. Basic Python knowledge will be assumed.

After I submitted the proposal, I found out that all presentations are going to be 60 minutes long. That is not much time if we're going to do hands-on work, but you retain so much more by doing than you do merely from watching that I don't consider it optional. So seriously, come with the SDK installed. Those who do not can look over the shoulders of those who do.

If you don't know Python and you're a last minute kind of person, you might want to attend Matt Harrison's talk the day before, "90% of the Python you need to know." Matt has presented several times at the Utah Python User Group as well as PyCon.

Bonus tip: if you can't make it to the UTOSC, the two best talks on App Engine are Rapid Development with Python, Django, and Google App Engine and Building Scalable Web Applications with Google App Engine. My presentation will cover similar material to the first of these.

Friday, August 15, 2008

A reminder

Now that I've been doing Python full time again for a while it's easy to forget how magical it can be.

Last night I got an IM from a friend of a friend asking for (a) a recommendation for a Python book and (b) advice on writing a screen scraper. I pointed him to Dive Into Python and BeautifulSoup. Just now he IMed me again, "Hey, thanks for the tip. I ended up writing a screen scraper that I hadn't completed in 2 days in Groovy in about 20 minutes last night in Python with BeautifulSoup. So thanks, you got another python convert."

I love my job.