Having been eyeball deep in App Engine for a while, evaluating it for a project at work and putting together a presentation for the Utah Open Source Conference, I've reluctantly concluded that I don't like it. I want to like it, since it's a great poster child for Python. And there are some bright spots, like the dirt-simple integration with Google accounts. But it's so very, very primitive in so many ways. It's not just the missing features, or the "you can use any web framework you like, as long as it's Django" attitude; it's primarily that a lot of the existing API is just so very primitive.
The DataStore in particular feels like a giant step backwards from using a traditional database with a sophisticated ORM. Sure, it can scale if you use it right, but do you really know what that entails?
Take the example of simple counting of objects. There's a count() method, but in practice, it's so slow you can't use it. Denormalize with a .count property? Yeah, that doesn't scale either: what you really need is a separate, sharded Counter class. And yes, sharding is very, very manual. (See slides 18-23 in the link there, and the associated video starting about 19:00.)
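To make the sharded counter idea concrete, here is a minimal in-memory sketch. It does not use the App Engine API at all: the `ShardedCounter` class, its method names, and the dictionary standing in for datastore entities are all assumptions made for illustration. The point is only the shape of the technique: writes go to a randomly chosen shard, and reads sum every shard.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a sharded datastore counter (hypothetical)."""

    def __init__(self, name, num_shards=20):
        self.name = name
        self.num_shards = num_shards
        # Each key stands in for one small counter entity in the datastore.
        self.shards = {}

    def increment(self, amount=1):
        # Pick a shard at random so concurrent writers rarely hit the same entity.
        index = random.randrange(self.num_shards)
        key = "%s-%d" % (self.name, index)
        self.shards[key] = self.shards.get(key, 0) + amount

    def total(self):
        # Reading the count means fetching every shard and summing them.
        return sum(self.shards.values())


counter = ShardedCounter("todo_items", num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())
```

In a real datastore each shard would be its own entity updated in its own transaction, which is exactly the manual bookkeeping complained about above.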
You can't perform joins in GQL. Or subselects. Or call functions, aggregate or otherwise. EVERYthing you are interested in needs to be pre-computed. (Or computed by hand client-side, which is so slow it's barely an option at all.) I can extrapolate from this to my experience in production schemas and it's not pretty.
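As a rough illustration of what "computed by hand client-side" looks like, the sketch below does the equivalent of a join plus a grouped count in plain Python. The entities are hypothetical and shown as dicts rather than datastore models; the field names (`list_id`, `finished`) are assumptions, not taken from the demo app.

```python
from collections import defaultdict

# Hypothetical entities, shown as plain dicts instead of datastore models.
lists = [{"id": 1, "name": "Home"}, {"id": 2, "name": "Work"}]
items = [
    {"list_id": 1, "finished": True},
    {"list_id": 1, "finished": False},
    {"list_id": 2, "finished": False},
]


def open_items_per_list(lists, items):
    # Both the "join" and the grouped count happen in application code,
    # after every candidate entity has already been fetched.
    counts = defaultdict(int)
    for item in items:
        if not item["finished"]:
            counts[item["list_id"]] += 1
    return {lst["name"]: counts[lst["id"]] for lst in lists}


result = open_items_per_list(lists, items)
print(result)
```

In SQL this would be a single join with a `group by`; here every row has to travel to the application first, which is where the slowness comes from.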
Of course, you also lose any ability to write declarative, set-based code, which is demonstrably less error-prone than the imperative alternative. Take a simple example from my demo app. Marking a group of todo items finished is four statements:
items = TodoItem.get_by_id(
    [int(id) for id in request.POST.getlist('item_id')])
for item in items:
    item.finished = datetime.now()
    item.put()
Compare this with SQL:
cursor.execute(
    "update todo_items set finished = CURRENT_TIMESTAMP where id in %s",
    # one bind parameter: a tuple of ids (psycopg2 adapts a tuple to an IN list)
    (tuple(int(id) for id in request.POST.getlist('item_id')),))

Scalability is great, but taking a big hit to back-end productivity is too high a price for all but a few applications. GAE is still young, so maybe Google will improve things, but their attitude so far seems to be "we know how to scale so shut up and do it the hard way." I hope I am wrong.
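For anyone who wants to try the set-based update from the SQL comparison locally, here is a small self-contained sketch using the standard library's `sqlite3` module. It is an adaptation, not the code from the demo app: sqlite3 uses `?` as its bind marker and has no list adaptation, so the sketch generates one placeholder per id. The `mark_finished` helper and the table layout are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table todo_items (id integer primary key, finished timestamp)")
conn.executemany("insert into todo_items (id) values (?)", [(1,), (2,), (3,)])


def mark_finished(conn, raw_ids):
    """Mark every listed item finished with a single UPDATE statement."""
    ids = [int(i) for i in raw_ids]
    if not ids:
        return 0
    # Only the placeholders are interpolated into the SQL text;
    # the id values themselves are passed as bind parameters.
    placeholders = ", ".join("?" for _ in ids)
    sql = ("update todo_items set finished = CURRENT_TIMESTAMP "
           "where id in (%s)" % placeholders)
    return conn.execute(sql, ids).rowcount


updated = mark_finished(conn, ["1", "3"])
remaining = conn.execute(
    "select id from todo_items where finished is null").fetchall()
print(updated, remaining)
```

The database decides how to find and touch the rows; the application only states which rows and what change.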
Comments
(The SQL example does not use string interpolation; most DB-API drivers use %s as the bind variable marker, too. Potentially confusing, but there's a couple syntactical clues to this even if you didn't know.)
It is not just about code you don't write, it is about code you need to later read and comprehend.
session.query(TodoItem).\
    filter(TodoItem.id.in_(request.POST.getlist('item_id'))).\
    update({'finished': func.current_timestamp()}, synchronize_session=False)
SQL still wins for me.
Hopefully Google wakes up and puts less of an emphasis on Django and more on getting as many modules as possible to work in the standard library. There is still hope for appengine.
This would open up the Python community to some real possibilities.
But I do not want to be harsh. It would appear that app engine at this stage is a proof of concept if nothing else to see what the initial response is to it.
Amen!
Something like this (I'm assuming we already have the items at hand):
items = TodoItem.get_by_id(
    [int(id) for id in request.POST.getlist('item_id')])
b = batchUpdater(items)
b.finished = datetime.now()
b.put()
Would that be better?
After all, the SQL statement you issue is syntactic sugar for updating this data in the DB. The DB will (eventually, at the lowest level) go through the ids one by one and update them. The syntax just makes it cleaner for you.
If you had a similar syntax here, it wouldn't be such a problem, plus, such a wrapper can better manipulate the back-end in a batch operation fashion.
Microsoft has LINQ in the latest version of .NET which makes the syntax (in most cases) easier when writing code but in compilation will generate the same ugly code one would do without LINQ. AFAIK, LINQ is used to query data, not update it, but you get the idea.
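A minimal sketch of what such a wrapper might look like, for illustration only. No `batchUpdater` exists in App Engine as far as this thread establishes; the `BatchUpdater` class below is hypothetical, and `FakeTodoItem` is a stand-in so the example runs without the datastore.

```python
from datetime import datetime


class BatchUpdater:
    """Hypothetical wrapper: collect attribute changes, apply them to all items."""

    def __init__(self, items):
        # Bypass our own __setattr__ so internal state is stored normally.
        object.__setattr__(self, "_items", list(items))
        object.__setattr__(self, "_pending", {})

    def __setattr__(self, name, value):
        # Record the change instead of setting it on the wrapper itself.
        self._pending[name] = value

    def put(self):
        # Apply every recorded change to every item, then save each one.
        for item in self._items:
            for name, value in self._pending.items():
                setattr(item, name, value)
            item.put()


class FakeTodoItem:
    """Stand-in for a datastore model; put() only records that it was called."""

    def __init__(self):
        self.finished = None
        self.saved = False

    def put(self):
        self.saved = True


items = [FakeTodoItem(), FakeTodoItem()]
b = BatchUpdater(items)
b.finished = datetime.now()
b.put()
```

On App Engine itself, I believe `db.put()` also accepts a list of model instances, so a real version of `put()` could likely send one batch call instead of one per item; treat that as something to verify against the datastore documentation.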
I've inherited the mess that is "let's just put everything in a database and worry about scaling later" from initial implementors, and it is /always/ a mistake to architect that way. Well, unless your plan is to fail. Because 100% guaranteed, if you create something that becomes popular, the single massive centralized database will cause you endless pain and torment until you partition it, at which point *YOU LOSE SUBSELECTS, JOINS, AGGREGATE FUNCTIONS et al ANYWAY*.
GQL is the right approach. Learning to architect systems that don't rely so much on those SQL-inspired crutches is the future. And the future is lovely.
(And it's worth pointing out that even if you're writing an app that will eventually need to scale to millions of users, if you can't get the first version out, useful, and serving thousands of users before your competition, you still lose.)
But regarding the SQL and Datastore example: actually reading both of the snippets requires more or less the same amount of effort. The Datastore snippet consists of 25 words, whereas the SQL one consists of 23 words.
I know I'm a little late to the game here.. but here goes..
Google has inspired me to pick up Python again and I've been browsing the web for info on some of the web frameworks available. I'm mainly interested in Django/Pylons/web2py. I was leaning more towards Pylons but I also like how easily web2py integrates with GAE, and how easily you can pull your app back out of GAE.
Have you used web2py at all? If so, what are your thoughts about it, and would you recommend it over Pylons/Django?
Off of GAE I still prefer Pylons because of the relatively good SQLAlchemy integration (see my project FormAlchemy).
I think designing an app so you can pull it out of GAE is not worth what you give up. Choose your platform and then play to its strengths instead of limiting yourself to a lowest common denominator.
(I really like Slicehost as an option for the more traditional route -- cheap starter plan, painless expansion of your VM when you need it, and good management tools. Disclaimer: I work for Rackspace, which recently bought Slicehost.)