Skip to main content

Google App Engine: Return of the Unofficial Python Job Board Feed

Over three years ago (!), I wrote a screen scraper to turn the Python Job Board into an RSS feed. It didn't make it across one of several server moves since then, but now I've ported it to Google's App Engine: the new unofficial python job board feed.
I'll be making a separate post on the Google App Engine business model and when it makes sense to consider the App Engine for a product. Here I'm going to talk about my technical impressions.
First, here's the source. Nothing fancy. The only App Engine-specific API used is urlfetch.
Unfortunately, even something this simple bumps up against some pretty rough edges in App Engine. It's going to be a while before this is ready for production use.
The big one is scheduled background tasks. (If you think this is important, star the issue rather than posting a "me too" comment.) Related is a task queue that would allow those scheduled tasks to easily be split into bite-size pieces, which is important for Google to allow scheduled tasks (a) without worrying about runaway processes while (b) still accomplishing an arbitrary amount of work.
If there were a scheduled task api, my feed generator could poll the python jobs site hourly or so, and store the results in the Datastore, instead of having a 1:1 ratio of feed requests to remote fetches.
While you can certainly create a cron job to fetch a certain url of your app periodically, and have that url run your "scheduled task," things get tricky quickly if your task needs to perform more work than it can accomplish in the small per-page time allocation it gets. Fortunately, I expect a scheduled task api from App Engine sooner rather than later -- Google wants to be your one stop shop, and for a large set of applications (every web app I have ever seen has had some scheduled task component) to have to rely on an external server to ping the app with this sort of workaround defeats that purpose completely.
Another rough edge is in the fetch api. Backend fetches like mine need a callback api so that a slow remote server doesn't cause the fetch to fail forever from being auto-cancelled prematurely. Of course, this won't be useful until scheduled tasks are available. I'm thinking ahead. :)
Finally, be aware that fatal errors are not logged by default. If you want to log fatal errors, you need to do it yourself. the main() function is a good place for this if you are rolling your own simple script like I am here.







Comments

jek said…
This is great! Thank you!
Ksenia said…
If there were a scheduled task api, my feed generator could poll the python jobs site hourly or so, and store the results in the Datastore, instead of having a 1:1 ratio of feed requests to remote fetches.

You could probably store the results in the Datastore, along with the timestamp, and update it during request every hour or so ("if the data older than 1 hour, than fetch again"). That would make some requests taking more time, but at most once in an hour.
Jonathan Ellis said…
I considered doing that, but the complexity isn't worth it here.

The complexity is, what if two requests come in at the same time? You need some kind of locking to prevent duplicate entries; query-before-insert is still race-condition prone. Locking isn't provided by the GAE either, so you'd have to hack something using Datastore. (I'm assuming that the transactional semantics are strong enough that this is even possible; I haven't checked.)

So even for a simple case like this one, it's not as easy to work around the lack of scheduled tasks as you'd think. It really is a critical part of any non-toy application environment.
Kevin H said…
This is a really useful feed. Thanks for making it.

It would be helpful, though, if the location information of the positions wasn't lost in the translation. Any possibility of adding this?
Jonathan Ellis said…
Well, I posted the source. Patches accepted. :)

Popular posts from this blog

PyCon Python IDE review

I presented an IDE review at PyCon last Friday. It was basically a re-review of what I thought were the 3 most promising IDEs from the Utah Python User Group IDE review , to which I added SPE, which was by far the most popular of the ones we left out that time. The versions reviewed are: PyDev 1.0.2 SPE 0.8.2.a Komodo 3.5.2 Wing IDE 2.1 beta 1 I'd intended to base my presentation around a comparison of writing a smallish program in each of the IDEs, but the more I tried to make this not suck, the more I realized it was a losing proposition. Instead, I decided to try to focus on the features in each that most set them apart from the others (both positive and negative); this seemed more likely be useful. (I did a new feature matrix for this review, which is included after my comments. The slides I used are also up, at http://utahpython.org/jellis/pycon-ides.pdf , but aren't very useful absent video of the presentation itself. Hence this post.) PyDev PyDev has g...

Why PHP sucks

(July 8 2005) Apparently I got linked by some PHP sites, and while there were a few well-reasoned comments here I mostly just got people who only knew PHP reacting like I told them their firstborn was ugly. These people tended to give variants on one or more themes: All environments have warts, so PHP is no worse than anything else in this respect I can work around PHP's problems, ergo they are not really problems You aren't experienced enough in PHP to judge it yet As to the first, it is true that PHP is not alone in having warts. However, the lack of qualitative difference does not mean that the quantitative difference is insignificant. Similarly, problems can be worked around, but languages/environments designed by people with more foresight and, to put it bluntly, clue, simply don't make the kind of really boneheaded architecture mistakes that you can't help but run into on a daily baisis in PHP. Finally, as I noted in my original introduction, with PHP, ...

A review of 6 Python IDEs

(March 2006: you may also be interested the updated review I did for PyCon -- http://spyced.blogspot.com/2006/02/pycon-python-ide-review.html .) For September's meeting, the Utah Python User Group hosted an IDE shootout. 5 presenters reviewed 6 IDEs: PyDev 0.9.8.1 Eric3 3.7.1 Boa Constructor 0.4.4 BlackAdder 1.1 Komodo 3.1 Wing IDE 2.0.3 (The windows version was tested for all but Eric3, which was tested on Linux. Eric3 is based on Qt, which basically means you can't run it on Windows unless you've shelled out $$$ for a commerical Qt license, since there is no GPL version of Qt for Windows. Yes, there's Qt Free , but that's not exactly production-ready software.) Perhaps the most notable IDEs not included are SPE and DrPython. Alas, nobody had time to review these, but if you're looking for a free IDE perhaps you should include these in your search, because PyDev was the only one of the 3 free ones that we'd consider using. And if you aren...