Spyced

Showing posts from 2008

CouchDB: not drinking the kool-aid

This is my attempt to clear up some misconceptions about CouchDB and point out some technical details that a lot of people seem to have overlooked. For the record, I like Damien Katz's blog, he seems like a great programmer, and Erlang looks cool. Please don't hurt me. First, and most important: CouchDB is not a distributed database. BigTable is a distributed database. Cassandra and dynomite are distributed databases. (And open source, and based on a better design than BigTable. More on this in another post.) It's true that with CouchDB you can "shard" data out to different instances just like you can with MySQL or PostgreSQL. That's not what people think when they see "distributed database." It's also true that CouchDB has good replication, but even multi-master replication isn't the same as a distributed database: you're still limited to the write throughput of the slowest machine. Here are some reasons you should think tw...

RackLabs

Today marks one month that I've been working for Rackspace's RackLabs with the Mosso group in San Antonio, Texas. (Anyone want to start a Python group? The closest one is in Austin.) It's kind of a gentle introduction to big company culture for me; at around 2,000 employees, Rackspace is easily ten times as large as any other company I've worked for, and 100 times as large as most. Mosso is a lot smaller and RackLabs itself is smaller still, but I still had to go to five days (!) of corporate orientation. Other than that, though, we're pretty much left alone by our corporate parent. To start with, I'm working on Mosso's Cloud Files, which is basically an S3 competitor. Cloud Files is similar to the work I did at Mozy, but there are a lot of technical differences. Some are driven by Cloud Files being more of a general purpose storage engine than the one I wrote for Mozy; others stem from the Cloud Files authors being Twisted fans. Strange coincidence:...

Frustrated with git

I'm a little over a week into a git immersion program. Let me just say that git's reputation of being a little arcane (okay, more than a little) and having a steep learning curve is 100% deserved. One thing that would mitigate things is if git would give you feedback when you tell it to do nonsense. But it doesn't. Here's me trying to get machine B to always merge the debug branch from machine A when I pull:

    git config branch.debug.remote origin
    git config branch.master.remote origin
    git config branch.master.remote origin/debug

All of these commands completed silently. None accomplished what I wanted. In the end I renamed master to old and debug to master to avoid having to fight it. Then I blew away my working copy and re-cloned because those config statements had created a new problem that I didn't know how to undo. I'm sure the git virtuosos out there will know what was wrong. That's not the point. The point is that the tool gave me n...

FormAlchemy 1.1: admin app, composite key support

FormAlchemy 1.1 is out, so you no longer need to run trunk to get the admin app goodness -- now with i18n support. We also added support for all composite primary keys, and most composite foreign keys. (The distinction is that rendering an object depends on the PK, but loading relations depends on FKs.) Gael also added the fsblob extension, which allows storing blobs on the filesystem and the path in the database. (FormAlchemy can handle blob-in-the-db out of the box.) (I previously blogged about basic FormAlchemy and the admin app, which are still good introductions.) FormAlchemy has pretty good documentation. The most important page is form generation; instructions to configure the admin app are here.

An unusual approach to log parsing

I saw an interesting article about logging today on reddit, and it struck a nerve with me, specifically how most text logs are not designed for easy parsing. (I don't agree with the second point, though -- sometimes logging is tracing, or perhaps more accurately, sometimes tracing is logging.) We had a lot of log and trace data at Mozy, several GB per day. A traditional parsing approach would have been tedious and prone to regressions when the messages generated by the server changed. So Paul Cannon, a frustrated lisp programmer, designed a system where the log API looked something like this:

    self.log('command_reply(%r, %r)' % (command, arg))

Then the log processor would define the vocabulary (command_reply, etc.) and, instead of parsing the log messages, eval them! This is an approach that wouldn't have occurred to me, nor would I have thought of using partial function application to simplify passing state (from the log processor and/or previous log entrie...
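
To make the idea concrete, here is a minimal sketch of an eval-based log processor. It is not Mozy's code: the `process` and `log_line` helpers and the shape of the `state` dict are assumptions made up for illustration, and only the `command_reply(...)` line format comes from the post. The handler receives the processor's state through `functools.partial`, so the logged expression only has to carry the logged arguments.

```python
from functools import partial


def command_reply(state, command, arg):
    """Vocabulary handler (hypothetical): record one reply in the shared state."""
    state.setdefault(command, []).append(arg)


def log_line(command, arg):
    """Format a log entry as a Python call expression, as in the post."""
    return 'command_reply(%r, %r)' % (command, arg)


def process(lines):
    """Eval each log line against a namespace that holds only the vocabulary."""
    state = {}
    vocabulary = {
        # Hiding builtins narrows what a line can call, but eval should
        # still only ever be run on logs you generated yourself.
        "__builtins__": {},
        # partial() pre-binds the processor state as the first argument.
        "command_reply": partial(command_reply, state),
    }
    for line in lines:
        eval(line, vocabulary)
    return state


if __name__ == "__main__":
    entries = [log_line("get", "a.txt"), log_line("put", "b.txt")]
    print(process(entries))
```

The appeal of this design is that changing a log message means changing one function signature on each side, rather than chasing a regular expression that silently stops matching.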

A small admin app for Pylons

I said that it would be possible to build a django-style admin interface for Pylons using FormAlchemy. (That is, generate a UI for basic CRUD operations for all your models, with no further configuration necessary.) I have a proof of concept in FA svn; it's missing some obvious features like internationalization so there is no official release yet. But the basics are there, so in the meantime, if you'd like to kick the tires, just install FA from svn and give it a try. Here are some screenshots from a pylons app incorporating models from the FA test suite. (The admin controller is fully customizable using standard FA (and Pylons) techniques, but these are what you'd see out-of-the-box.) [Screenshots: Index; Order page; Creating a new Order; Deleting an Order; The User page; Editing a User instance] Documentation on using and customizing the pylons admin app is here.

FormAlchemy 1.0

A little background: a few months ago, I went looking for a web framework that was good at automating CRUD (create/retrieve/update/delete) against an existing database schema. I tried django but its database introspection abilities are beyond feeble, and django-sqlalchemy was not mature enough. I tried dbmechanic but its dozen-plus dependencies, most of which were alpha-quality, gave me pause; so did its basic architecture on top of toscawidgets, which I think is The Wrong Way to build web apps. (I understand that the former problem has since been reduced; the latter has not.) So, I went back to option #3, FormAlchemy. I knew SQLAlchemy could reflect very hairy schemas indeed, and what it could not reflect, it could certainly represent with a little manual help. And FormAlchemy was a decent start to automating CRUD with SA models. I added the ability to represent relations, automatic syncing of form input back to SA objects, Grid support, and a test suite. Then Gael came alo...

Available

Feature50 is winding down now that CEO Ben Galbraith has accepted a job offer elsewhere. So, I'm interested in exploring my options, specifically, opportunities to build out the technology for a start-up working in concert with a strong business CEO. I've done this twice now.

Technical ability

I am a senior developer specializing in back-end technologies. At Mozy, where I was employee #2, I wrote a distributed file repository that stores petabytes of data, an amount comparable to Amazon's S3. I have 8 years of experience with PostgreSQL. I know how to design for scale, and how to find and remove bottlenecks. I am not afraid of diving into a new code base; I took over as maintainer of the Spyce web framework and the FormAlchemy toolkit, and I have contributed features or patches to SQLAlchemy, Pylons, and Jython, among others.

Soft skills

I enjoy building and working with a team. At Feature50 I am responsible for technical interviews, and personally recruited fiv...

Blog Day recommendations

As with many things in the blogging echo chamber, blog day takes itself a little too seriously. But it's impossible not to love an excuse to talk about some of my favorite blogs that don't seem to have as much exposure as I think they deserve: Theo Schlossnagle, CEO of OmniTI, a scalability and performance consulting company. His best posts deal with scalability at the ops level. His book is good, too. Greg Linden, ex-Amazon engineer, ex-Findory founder, current MS Live Labs employee. He likes to post analyses of interesting CS talks and papers, particularly in the area of collective intelligence. Greg stays very on-topic so the most recent posts are about as representative as any. Chris Siebenmann writes about life as a professional sysadmin. He also sometimes blogs about python. Josh Berkus, PostgreSQL core team member, mostly blogs about current events in the database world, but every once in a while he writes a must-read post about database design. Google th...

App Engine conclusions

Having been eyeball deep in App Engine for a while, evaluating it for a project at work and putting together a presentation for the utah open source conference, I've reluctantly concluded that I don't like it. I want to like it, since it's a great poster child for Python. And there are some bright spots, like the dirt-simple integration with google accounts. But it's so very very primitive in so many ways. Not just the missing features, or the "you can use any web framework you like, as long as it's django" attitude, but primarily a lot of the existing API is just so very primitive. The DataStore in particular feels like a giant step backwards from using a traditional database with a sophisticated ORM. Sure, it can scale if you use it right, but do you really know what that entails? Take the example of simple counting of objects. There's a count() method, but in practice, it's so slow you can't use it. Denormalize with a .count pr...

App Engine slides, code

My App Engine 101 slides and code are up now. Bad news: my macbook pro did not work with the projector, period. Good news: I have seen it do this before (in a room with several mac experts -- it was not user error) and brought a backup laptop. Bad news: I forgot to include the django beta1 framework in my code upload, so I told people to just download it. But beta2 was out, and didn't work with the version of App Engine Helper I had. (It looks like r58 fixes this.) Manual poking about the django download site ensued until I got a new zip uploaded. Good news: the conference organizers liked it anyway and asked me to present a second time later in the day. Everything just worked the second time around.

Google App Engine at the Utah Open Source Conference

App Engine is probably the biggest thing to happen to Python this year, so of course I volunteered to give a presentation on it at the Utah Open Source Conference. (I'm scheduled for Friday, Aug 29, at 10:00 AM.) Last year's conference was a big success, so I'm looking forward to an even better experience this year. Here's the abstract I submitted, before they blew away my paragraph breaks: Google launched the App Engine service earlier this year to immense interest from the web development community. App Engine allows running applications on Google infrastructure, including BigTable, Google's non-relational, massively scalable database. App Engine is appealing both at the low end, where small shops don't want to have to deal with hardware procurement and systems administration, and at the high end, where the kind of "instant scaling" App Engine promises to deal with bursty traffic is the holy grail of infrastructure planning. This tutorial will ...

A reminder

Now that I've been doing Python full time again for a while it's easy to forget how magical it can be. Last night I got an IM from a friend of a friend asking for (a) a recommendation for a Python book and (b) advice on writing a screen scraper. I pointed him to Dive Into Python and BeautifulSoup. Just now he IMed me again, "Hey, thanks for the tip. I ended up writing a screen scraper that I hadn't completed in 2 days in Groovy in about 20 minutes last night in Python with BeautifulSoup. So thanks, you got another python convert." I love my job.

SQLAlchemy-Migrate for dummies

I gave sqlalchemy-migrate a try today. I like it, and I'm going to keep using it. The one downside is that it's a bit hard to find "the least you need to know" in the documentation, especially if you lean old-school like me and prefer to write your upgrade scripts in raw sql. So here's my stab at it.

Create a "repository" for upgrade scripts:

    migrate create path/to/upgradescripts "comment"

Create your manage script. If you have development/production dbs with different connection urls, create two scripts with the same repository but different urls:

    migrate manage dbmanage.py --repository=path/to/upgradescripts --url=db-connection-url

For each database, create the Migrate metadata (a migrate_version table):

    ./dbmanage.py version_control

Create an upgrade script. This will create a script [next version number]-[database type]-upgrade.sql in the "versions" subdirectory of your "repository." That's a...
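
For readers wondering what a version table buys you, here is a rough sketch of the general idea using only the standard library's sqlite3. This is not sqlalchemy-migrate's implementation: the single-column `migrate_version` layout is a simplification, and the `version_control`, `current_version`, and `upgrade` functions are hypothetical names chosen to echo the commands above. The database records which numbered script it has reached, and an upgrade applies only the scripts above that number.

```python
import sqlite3


def version_control(conn):
    """Create the bookkeeping table and start the database at version 0."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrate_version (version INTEGER NOT NULL)"
    )
    rows = conn.execute("SELECT COUNT(*) FROM migrate_version").fetchone()[0]
    if rows == 0:
        conn.execute("INSERT INTO migrate_version (version) VALUES (0)")
    conn.commit()


def current_version(conn):
    """Return the version number recorded in the database."""
    return conn.execute("SELECT version FROM migrate_version").fetchone()[0]


def upgrade(conn, scripts):
    """Apply, in order, every script numbered above the recorded version.

    `scripts` maps a version number to the raw SQL text of that upgrade.
    """
    for number in sorted(scripts):
        if number > current_version(conn):
            conn.executescript(scripts[number])
            conn.execute("UPDATE migrate_version SET version = ?", (number,))
            conn.commit()
    return current_version(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    version_control(conn)
    scripts = {
        1: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
        2: "ALTER TABLE users ADD COLUMN email TEXT;",
    }
    print(upgrade(conn, scripts))
```

Because the version lives in the database itself, running the same upgrade against development and production brings each one forward from wherever it happens to be.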

How to tell when you're successful

You're successful when someone tries to get a cheap clone of your site done on a cheap-labor code monkey site. I'm flattered, I think. (Although I'd be more flattered if it were a good code monkey site.)

Brief review of the Matias Half Keyboard et al

I ended up buying four pieces of equipment to help deal with being temporarily one-handed: the Matias half keyboard, the X-keys foot pedal (cheaper than the Kinesis pedals, which got lukewarm reviews on Amazon), the Keyspan PR-US2 Presentation Remote, and the Pacific Outdoors 17-LC100 Folding Recliner. The good: I'm very pleased with the recliner and modestly happy with the remote. I got the recliner to take naps in; the brace on my arm didn't really accommodate lying down. This $80 recliner compares well with zero gravity recliners costing over 10x as much. (I've used two of the expensive variety; a BackSaver and one whose brand I don't recall.) The only downside is you can either sit up, or recline fully; there is supposedly a way to adjust the recline angle, but it doesn't really work. Expensive zero gravity recliners can all reliably lock at any angle you like. The remote mostly worked as a mouse substitute that I could use with my immobilized right han...

One-handed typing?

I separated my right shoulder so that arm is going to be out of commission for a while. (I am right-handed.) I'm managing about 25 wpm with one hand, or about 1/4 my normal speed. This is frustrating. The Handkey Twiddler has been out of production for a while. The BAT is not OS X compatible. Anyone tried the Half Qwerty keyboard? Are there other good options for under, say, $300? (I found several very niche products for significantly more.) I do plan to try voice recognition for email and IM but I can't see that working very well for code.

Jython Notes

I've been getting back into the Jython codebase this last week. The last time I submitted a Jython patch was in the beginning of 2004, so it's been a while. Things have changed... Jython is finally requiring Java 5 for the next release, which means the usual improvements, but especially good use of annotations. Here are some notes from my puttering around (mostly dragging Jython's set module up to compatibility with CPython 2.5's): Expect Eclipse to be slightly confused. (Lots of "errors.") This is normal. Use ant to build. ant regrtest is handy. Run it before you start making changes so you know what's already broken in trunk. (At least between releases, jython does not appear to be religious about "no tests shall fail." But as a new developer you should make "no additional tests should fail" your motto.) Subjective impression: Jython re performance is a bit slow. Jython uses its own re implementation predating the Java re...

Quick tip for debugging with Jython

Currently, Jython ships with the pdb debugger module from Python 2.3. Unfortunately the 2.3 pdb is primitive even by command-line debugger standards. (For instance, if the program you are debugging throws an exception, it will take pdb down with it. Seriously. Did anyone actually use this thing?) Fortunately all you have to do to get a much better experience is grab pdb.py, bdb.py, and cmd.py (for good measure) from a 2.5 CPython installation and run against that instead. I've only tested this with Jython trunk but I think it should Just Work with the 2.2 release, too.

IDE update

Last night the Utah Python User Group held an editor/IDE smackdown. I'm not going to write an exhaustive summary, but here are some highlights: ViM's OmniComplete is actually pretty decent. Calltip support in the GUI is also good. (GUI? ViM? Yeah, weird.) Emacs completion, from Rope, is also good. Emacs's refusal to make any concession to GUIs though keeps things clunky. Not that it isn't great that Everything Works over plain ssh; that's fine, but going through classic Emacs buffers for docstrings or completion means everything takes more keystrokes than it should while being less useful than having that information Always On. Rope also gives Emacs refactoring support that works surprisingly well. PyDev still sees a big win from the Eclipse platform. Specifically, even though Subclipse and Subversive are a bit weak compared to the gold standard (that would be TortoiseSVN), they are much better than what you get with Komodo or Wing. Now that I am on OS X (...

How to piss off your customers in two easy steps

1. Don't communicate with them
2. Treat them like they owe you something

Google is off to a good (bad?) start with both of these in its management of the App Engine release. Of the 120+ issues logged by beta testers, a few have been closed as wontfix or duplicate; most have no response at all from the App Engine team. I can't think of any other company that I've filed an issue with that took that long to get back to me. The good ones get back within hours. The one exception I have seen is for the urllib issue, where gu...@python.org, presumably Guido, wrote: "Providing a urllib replacement implemented on top of urlfetch shouldn't be particularly hard. If someone is willing to produce one, I'd be happy to review it and, if it passes muster, try to get it added." Paraphrased: "maybe if you do our work for us we'll consider it." WTF! This isn't OSS, where "if you want something, do it yourself" is at least a semi-valid response. Ap...

The business case for Google App Engine

App Engine sure has caused a stir. Some of the competition is already scared, with reason. But who is App Engine's real competition? In a lot of ways, App Engine is in a class by itself. It competes on the high end with Amazon Web Services. But it also competes on the low end with every shared host out there. And thanks to the integration of Google authentication and the application directory you could also make a case that in an orthogonal way it competes with Facebook's application API. At the low end, App Engine is a big deal for Python developers and anyone else who is allergic to PHP. Historically, you've really had to look hard for low end hosting that offered anything else. And as everyone who has given products away to colleges knows, Free is a fantastic hook to get developers to try out your platform. Once it's open for all, App Engine is going to become the preferred option for developers with the itch to write a toy or proof of concept and show...

Language popularity, App Engine - style

Just for fun, here's the number of stars (interested people) for the different language-support feature requests for Google App Engine:

- Perl: 85
- Java: 69
- Ruby: 67
- PHP: 23
- C#: 11
- jvm, not just java: 7
- Common Lisp: 5

Update: Perl is stuffing the ballot box :)

Google App Engine: Return of the Unofficial Python Job Board Feed

Over three years ago (!), I wrote a screen scraper to turn the Python Job Board into an RSS feed. It didn't make it across one of several server moves since then, but now I've ported it to Google's App Engine: the new unofficial python job board feed. I'll be making a separate post on the Google App Engine business model and when it makes sense to consider the App Engine for a product. Here I'm going to talk about my technical impressions. First, here's the source. Nothing fancy. The only App Engine-specific API used is urlfetch. Unfortunately, even something this simple bumps up against some pretty rough edges in App Engine. It's going to be a while before this is ready for production use. The big one is scheduled background tasks. (If you think this is important, star the issue rather than posting a "me too" comment.) Related is a task queue that would allow those scheduled tasks to easily be split into bite-size pieces, which i...
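
As a rough illustration of the second half of a scraper-to-feed pipeline, the sketch below turns already-scraped entries into RSS 2.0 with the standard library. It is not the linked source: the `build_feed` function, the `title`/`url`/`summary` keys, and the example.com URLs are all assumptions for the example, and the fetching and HTML parsing steps are left out entirely.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, jobs):
    """Serialize scraped job entries as a minimal RSS 2.0 document.

    `jobs` is a list of dicts with hypothetical keys: title, url, summary.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for job in jobs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = job["title"]
        ET.SubElement(item, "link").text = job["url"]
        ET.SubElement(item, "description").text = job["summary"]
    # ElementTree escapes characters such as & and < in text for us.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {
            "title": "Python developer",
            "url": "http://example.com/jobs/1",
            "summary": "Back-end work in Python.",
        }
    ]
    print(build_feed("Example job feed", "http://example.com/jobs", sample))
```

Building the document as a tree rather than by string concatenation means a job title containing an ampersand cannot produce an invalid feed.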

My half-baked thoughts on Python web frameworks

I have been lucky to be able to fill our recent open positions with people who know Python as well as Java so now we are up to half the (6 person) company in that category and preferring Python, and 2 of the others have played with Python and liked it at least well enough to not object. So the boss has conceded that it makes sense to go the Python route for our next project. We're going to be doing a web, "next gen" version of our existing client-server project, which is mostly simple CRUD but does have 1000+ tables in its current incarnation. So we really need something that can autogenerate 90+% of the CRUD or we will go insane. The trouble is, I still don't really like any of the Python web options 100%. (I like the web options in other languages less, but I'm a perfectionist.) Django is well documented, its admin app is something everyone else envies, and newforms looks decent, but the ORM blows and I'm not fond of the template engine either. (...

Real Python IDEs

After reading a blog post titled "The Abysmal State of Python IDEs" (which I won't link to because it's minformative, but it's easy to google by title), I wondered how the author managed to pick such a lousy group of IDEs to try. He tried "ActiveState" (does he mean PythonWin?), DrPython, SPE, and ScrIDE, only one of which is in the top 10 google hits for Python IDE. The google top 10 include Eric, Wing IDE, Radio Userland, SPE, PyDev, and Komodo. The Yahoo and MSN top 10s are similar. Except for Radio Userland, this is a much better group to start with, and one that in fact does include what I think are the only 3 Python IDEs worth trying. So how does a newbie end up picking such a lousy group of IDEs to try? The only likely possibility seems to be that he went to the top google hit, the python.org wiki page. Or possibly he went off of the top MSN hit, the c2 wiki Python IDE page. Both are (rather, were) heaping wads of products that mostly we...

Best new blog I discovered at PyCon [so far]

I was talking to Adam Gomaa on Thursday when Ben Bangert stopped by us and told him he had an interesting blog. "If Ben says you have a good blog, I'll have to check it out," I told Adam. "That's not what I said," Ben corrected me. "I said interesting." But it is good, and I'm glad I found it. And regarding Adam's post on declarative layers for SQLAlchemy, check out the new-in-SA 0.4.4 declarative plugin. It's almost exactly what Adam was looking for -- a little more verbose, in keeping with the "explicit is better than implicit" Python philosophy that SA shares, but creating your own superclass that creates a PK named "id" by default is just a few lines of code if that's what you prefer.

PyCon, Saturday and Sunday

Saturday I'll be at the SQLAlchemy and State of PyPy talks. Then the board game BOF in the evening. In between, probably mostly the "hallway track." Sunday I plan to attend "What Zope did wrong" and "Core Python Containers." (I'd also like to see the Wingware presentation in the 11:35 slot, but since I can only pick one, I guess I can just cross my fingers that this year's video recordings actually get published somewhere.) Feel free to stop me and introduce yourself. (If we met at a previous PyCon, I've changed my hair around a lot from year to year. The photo on this blog represents what I look like now. I should probably stick with this for a couple years so people can recognize me.)

Introverting

I'm taking a break during the evening tutorial in my hotel room on the second floor where I can enjoy the pycon wireless signal (which seems to be working quite well now). My name is Jonathan, and I am an introvert [too]. I suspect a lot of PyCon attendees can empathize. After the last tutorial session ends at 9:30, I'm planning to head down with my copy of Munchkin and see if anyone wants to start the board game social a day early.

Slides from Introduction to SQLAlchemy tutorial

My slides from this morning are up: http://utahpython.org/jellis/sa-intro.pdf. About 1/3 of the class did not have SA installed yet, and the network was down. Fortunately, Mike and Jason brought 5 flash drives and by the time we got to the first exercise everyone was up and running. This was my third time doing a three-hour SQLAlchemy tutorial. Differences from last time include:

- updated for the 0.4 series
- removed almost all the SQL-layer material
- added a section on the new relation filtering api
- improved the parts of the Fundamentals sections that were poorly explained
- added a short section on the new-in-0.4 transaction management

There wasn't a wall clock in the tutorial room, so despite making an effort to be aware of time I went 10 minutes over. Sorry, guys. :) Jason Kirtland will be posting the slides from the Advanced SQLAlchemy tutorial soon.

Pylons: first impressions

A couple co-workers and I spent some time with Pylons yesterday, enough to get to where we started to feel productive, but not much more than that. I think there's value in a newbie's first impressions, so here are mine. I'm sure at least some of these are wrong. Poor documentation of core Pylons (Mako and SQLAlchemy are fine -- thanks, Mike). I had to use the source several times. I'm still not really sure how Routes works, although I was mostly able to make it do what I wanted. The first tutorial overcomplicated things, showing how to configure things to handle semi-obscure requirements, without explaining those requirements or simpler alternatives. Helpful community. I got most of the answers I needed pretty quickly in the #pylons freenode IRC channel. Not much black magic: if you know Python you won't be struggling with weird Pylons-only concepts. It's all modules, classes, and dicts put together in an intuitive way (at least to my way of thinking). ...

SQLAlchemy at Pycon 08

SQLAlchemy will be well-represented this year with two tutorials and a talk. I'll be the primary instructor for the Introduction to SQLAlchemy tutorial. I just updated the pycon page with the outline of what we'll cover. The slides will be pretty similar to last time, only with more time spent on a high-level intro to ORM (object-relational mapping) for people who have little exposure to that. And of course last year 0.4 was not out. The SQLAlchemy documentation is thorough but a little intimidating. IMNSHO, the introduction tutorial is a great way to pick up the basics and get some practice, after which everything starts to make a lot more sense. Mike Bayer, the author of SA, will be the primary instructor for the Advanced SQLAlchemy tutorial. Jason Kirtland, one of the most prolific SA hackers besides Mike himself, will also be teaching. At the conference itself, Mike will be presenting Sqlalchemy 0.4 and beyond. To save you digging it out of the talks page, he...

Why IE rejects your cookies for no apparent reason

Seriously, WTF . I'll summarize for those of you who are allergic to MSN knowledge base articles, although this one is fairly to-the-point: If you implement a FRAMESET whose FRAMEs point to other Web sites on the networks of your partners or inside your network, but you use different top-level domain names... IE silently rejects cookies sent from third party sites. This bit me today while adding facebook support to my text-based game -- I'm going the IFRAME route for fb support rather than rewrite the whole app in FBML thankyouverymuch, and yes, apparently IFRAME counts too for IE retard-mode. What makes me cry a little inside is not the two hours spent deep in old and crufty login and cookie-setting legacy code wondering what the flaming hell was going on. No, what makes me cry is that I got screwed by a setting that will never block the bad guys, because labeling yourself a good guy is entirely voluntary . It's like someone at MS read the evil bit RFC and took...
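
The workaround generally cited for this behavior is to send a P3P compact policy header with the framed responses, so that IE treats the cookie as acceptable. The sketch below shows one way that might look as WSGI middleware; `p3p_middleware` is a hypothetical helper, and the `CP="CAO PSA OUR"` tokens are only a placeholder, since the policy you declare is supposed to describe what your site actually does.

```python
def p3p_middleware(app, policy='CP="CAO PSA OUR"'):
    """Wrap a WSGI app so that every response carries a P3P header.

    The default policy string is a placeholder for illustration only.
    """

    def wrapped(environ, start_response):
        def start_with_p3p(status, headers, exc_info=None):
            # Append the header without mutating the app's own list.
            headers = list(headers) + [("P3P", policy)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_p3p)

    return wrapped


def demo_app(environ, start_response):
    """Tiny example app that sets a cookie, as a framed page might."""
    headers = [("Content-Type", "text/plain"), ("Set-Cookie", "session=abc")]
    start_response("200 OK", headers)
    return [b"hello"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve the wrapped app locally; port 8000 is an arbitrary choice.
    make_server("localhost", 8000, p3p_middleware(demo_app)).serve_forever()
```

Doing this in middleware keeps the header out of the old login and cookie-setting code, which only has to go on setting cookies as before.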