Over three years ago (!), I wrote a screen scraper to turn the Python Job Board into an RSS feed. It didn't make it across one of several server moves since then, but now I've ported it to Google's App Engine: the new unofficial python job board feed.
I'll be making a separate post on the Google App Engine business model and when it makes sense to consider the App Engine for a product. Here I'm going to talk about my technical impressions.
First, here's the source. Nothing fancy. The only App Engine-specific API used is urlfetch.
Unfortunately, even something this simple bumps up against some pretty rough edges in App Engine. It's going to be a while before this is ready for production use.
The big one is scheduled background tasks. (If you think this is important, star the issue rather than posting a "me too" comment.) Related is a task queue that would allow those scheduled tasks to easily be split into bite-size pieces, which is important for Google to allow scheduled tasks (a) without worrying about runaway processes while (b) still accomplishing an arbitrary amount of work.
If there were a scheduled task api, my feed generator could poll the python jobs site hourly or so, and store the results in the Datastore, instead of having a 1:1 ratio of feed requests to remote fetches.
While you can certainly create a cron job to fetch a certain url of your app periodically, and have that url run your "scheduled task," things get tricky quickly if your task needs to perform more work than it can accomplish in the small per-page time allocation it gets. Fortunately, I expect a scheduled task api from App Engine sooner rather than later -- Google wants to be your one stop shop, and for a large set of applications (every web app I have ever seen has had some scheduled task component) to have to rely on an external server to ping the app with this sort of workaround defeats that purpose completely.
Another rough edge is in the fetch api. Backend fetches like mine need a callback api so that a slow remote server doesn't cause the fetch to fail forever from being auto-cancelled prematurely. Of course, this won't be useful until scheduled tasks are available. I'm thinking ahead. :)
Finally, be aware that fatal errors are not logged by default. If you want to log fatal errors, you need to do it yourself. the main() function is a good place for this if you are rolling your own simple script like I am here.