

Showing posts from 2007

Scala: first impressions

I'm reading the prerelease of the Scala book, since I'm working for a heavily Java-invested organization now and programming in Java feels like running a race with cement shoes. Politically, Jython doesn't seem like an option; Scala might be an easier sell. Here are some of my impressions going through the book.

Scala is a decent scripting language

[updated this section thanks to comments from anonymous, Jörn, and Eric.] Here's how you can loop through each line in a file:

Python:

    import sys
    for line in open(sys.argv[1]):
        print line

Scala:

    import scala.io.Source
    Source.fromFile(args(0)).getLines.foreach(println)

The scala interpreter also has a REPL, which is nice for experimenting. This article has more examples of basic scripting in scala.

Getters and setters

One thing I despise in Java is the huge number of wasted lines of code dedicated to getFoo and setFoo methods. Sure, your IDE can autogenerate thes

Troubleshooting the ps3 wireless network connection, including error 80130128

My father got a ps3 for Christmas, but ran into some problems getting it on his wireless network. The first one was "connection error 80130128" after configuring it to use DHCP. I couldn't google anything useful about this; just a few other hapless victims asking if anyone had any ideas. Fortunately Dad had his laptop there too and noticed Windows complaining that two machines on the network were both using the same IP. So, over the phone, I walked him through setting up the ps3 with a static address:

1. On his laptop, run -> cmd, then ipconfig.
2. Read the "gateway" IP. Put that into his browser to go to his router's admin page.
3. Find the DHCP settings for his router to see what range of IPs it hands out; pick one outside that range.
4. Set up the ps3 with that IP, the router IP as primary DNS, and an OpenDNS server as secondary.

This made the connection test happy. But when he tried to go to the playstation store, it gave a DNS error. If he repeated the connection te
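The "pick one outside that range" step can be sketched with Python's standard ipaddress module. The subnet, DHCP pool, and router address below are made-up example values, not Dad's actual settings, and pick_static_ip is just a name I chose for illustration.

```python
import ipaddress

def pick_static_ip(network, dhcp_start, dhcp_end, reserved=()):
    """Return the first usable host address that DHCP will not hand out.

    All arguments are hypothetical example inputs read off a router's
    admin page; `reserved` holds addresses already in use (e.g. the router).
    """
    net = ipaddress.ip_network(network)
    start = ipaddress.ip_address(dhcp_start)
    end = ipaddress.ip_address(dhcp_end)
    taken = {ipaddress.ip_address(a) for a in reserved}
    for host in net.hosts():
        if start <= host <= end:
            continue  # inside the DHCP pool, so it could collide later
        if host in taken:
            continue  # already assigned by hand to something else
        return host
    raise ValueError("no free address outside the DHCP pool")

if __name__ == "__main__":
    print(pick_static_ip("192.168.1.0/24", "192.168.1.100", "192.168.1.199",
                         reserved=["192.168.1.1"]))
```

Anything the function returns is still only a candidate; another device with a hand-assigned address could be sitting on it, so a quick ping first doesn't hurt.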

Reed-Solomon libraries

If you want to run a multi-petabyte storage system then you don't want to do it with RAID 5 or RAID 6; with modern disks' ~3% per year failure rate, that's 300 a year when you have 10000 disks and the odds start to get pretty good (relatively speaking) that you'll face permanent data loss at some point when you lose a third disk from an array while two are rebuilding. And of course monitoring and replacing disks in lots of small arrays is manpower-intensive, which to investors translates as "expensive." You probably don't want to go with triplication, either; disks are cheap, but not so cheap that you want to triple your hardware costs unnecessarily. While storing multiple copies of frequently used data is good, all your data probably isn't "frequently used." What is the solution? As it turns out, RAID is actually a special case of Reed-Solomon encoding, which lets you specify any degree of redundancy you want. You can be safer th
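For intuition about the RAID connection: a RAID 5 parity block is just the XOR of the data blocks in a stripe, which is about the simplest erasure code there is. The sketch below uses toy block contents and is not Reed-Solomon itself (that generalizes the idea to more parity blocks using finite-field arithmetic); it also redoes the failure arithmetic from the paragraph above. The function names are mine.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def expected_failures(disks, annual_failure_rate):
    """Expected number of disk failures per year."""
    return disks * annual_failure_rate

data = [b"disk", b"fail", b"demo"]   # toy stripe: three data blocks
parity = xor_blocks(data)            # the single RAID 5-style parity block

# Lose the second block, then rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

print(expected_failures(10000, 0.03))  # roughly 300 failures a year
```

With one parity block you survive one lost block per stripe; the appeal of Reed-Solomon is being able to choose how many losses you want to survive.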

Utah Data Recovery

About three years ago (so pre-Mozy and definitely pre-Mac Mozy) my brother had his powerbook hard disk die. As in, not just mostly dead -- it would not power up. It had a lot of stuff on it that he didn't want to lose, but he felt like the usual suspects who charge $1k to $2k for data recovery were a rip off. So he hung onto the disk in case a cheaper option came along. Then just recently when I saw some people on a local linux group mailing list recommend utah data rescue I suggested to my brother that he give it a try. UTDR starts at "only" $300. UTDR did indeed recover the data, although they charged $100 extra for this one. Mac fee? Tricky hw problem? I don't know. But it was still a lot cheaper than the other companies I googled for fixing a physically dead drive. (As opposed to a corrupt partition table or something where the hardware itself was okay.) At least, the ones that actually give you a price up front rather than hiding behind "reques

Semi-automatic software installation on HP-UX, with dependencies

I had to install subversion on a couple HP-UX boxes. Fortunately, there's an HP-UX software archive out there with precompiled versions of lots of software. Unfortunately, dependency resolution is like the bad old days of 1997: entirely manual. And there are fifteen or so dependencies for subversion. So, I wrote a script to parse the dependencies and download the packages automatically. It requires Python -- which you can install from the archive with just the Python package and the Db package -- and BeautifulSoup, which you can google for. Usage is

    hpuxinstaller <archive package url> <package name>
    [e.g., hpuxinstaller subversion]
    [wait for packages to download]
    gunzip *.gz
    [paste in conveniently given swinstall commands]

Here is the script:

    #!/usr/local/bin/python
    import urlparse, urllib2, sys, os
    from subprocess import Popen, PIPE
    from BeautifulSoup import BeautifulSoup

    required = {

Congratulations, Mozy

I left backup service provider Mozy about three months ago, and yesterday they were acquired by EMC as rumored by TechCrunch earlier. The cool thing about startups is they pretty much have to hire people who are totally not qualified to do awesome things and let them try. There's no way Amazon would have hired me to write S3, but that's what I did for Mozy. Mozy was the third startup I've been a part of, and the first to amount to anything. I was employee #3 and saw it grow from sharing a single rented office to 50 employees in two years. With people who didn't think it was strange to wear a tie to work. Trippy. Unfortunately I'm not there to witness the final stage of being assimilated by the Borg firsthand, but I hear that's not really any more fun than it sounds so perhaps it's just as well. Nice work, guys.

Wing IDE 3, Wing IDE 101 released

Wing IDE version 3 has been released. The list of new features is a little underwhelming. Multi-threaded debugging and the unit testing tool (only supporting unittest -- does anyone still use that old module?) are nice but I don't see myself paying to upgrade from 2.1 yet. Now if they could get the GUI to keep up with my typing in Windows, I'd pay for that... I guess this is a sign that Python IDEs are nearing maturity; Komodo 4 didn't have any earth-shaking new features either, at least as far as Python was concerned. (Personally I think someone should start supporting django/genshi/mako templates already. Maybe in 3.1, guys?) Following ActiveState's lead, Wingware has also released a completely free version, Wing IDE 101. The main difference is that where the most essential feature Komodo Edit leaves out as an incentive to upgrade is debugging, Wing IDE 101 includes the debugger but omits code completion. Wingware also continues to offer the low-c

That wasn't the pigeonhole I expected

I went to the BYU CS alumni dinner tonight. At one point they briefly put everyone's name and position on a projector, one at a time. (At five seconds apiece it wasn't as tedious as it sounds.) When it was my turn, it announced "Jonathan Ellis, System Administrator." What the hell? It turns out that when I RSVP'd I said I was a "python kung-fu master & sysadmin of last resort." (In the sense that, if you really can't find a better sysadmin, I know enough to be dangerous.) Don't bother trying to be clever around bureaucrats.

Utah Open Source Conference 2007

The first Utah Open Source Conference finished today. I heard that they had close to 300 attendees -- not bad at all for a freshman effort. I reprised presentations that I've given before, on SQLAlchemy and distributed source control. My slides are on the presentations page (although if you've seen my slides from either before, there's not much new there -- I got lucky, SA 0.4 isn't stable yet so I stuck with 0.3.10). I had to work Friday so I missed a lot of presentations, but of the ones I saw my favorite was on Ganglia, which I hadn't heard of before but which looks quite useful for anyone who runs a bunch of servers and takes uptime and QoS seriously. (This was actually Brad Nicholes's third presentation of the conference -- he must have been busy!) Afterwards I went to the board games BoF and played Mag Blast. Fun little game.

What it means to "know Python"

Since Adam Barr replied to my post on his book, I'd like to elaborate a little on what I said. Adam wrote,

    [F]or me, "knowing" Python means you understand how slices work, the difference between a list and a tuple, the syntax for defining a dictionary, that indenting thing you do for blocks, and all that. It's not about knowing that there is a sort() function.

In Python, reinventing sort and split is like a C programmer starting a project by writing his own malloc. It just isn't something you see very often. Similarly, I just don't think you can credibly argue that a C programmer who doesn't know how to use malloc really knows C. At some level, libraries do matter. On the other hand, I wouldn't claim that you must know all eleventy jillion methods that the Java library exposes in one way or another to say you know Java. What is the middle ground here? I think the answer is something along the lines of, "you have to get enough practi

Merging two subversion repositories

Update: an anonymous commenter pointed out that yes, there is a (much!) better way to do this with svnadmin load --parent-dir, which is covered in the docs under "repository migration." All I can say in my defense is that it wasn't something google thought pertinent. So, for google's benefit: how to merge subversion repositories. Thanks for the pointer, anonymous!

I needed to merge team A's svn repository into team B's. I wanted to preserve the history of team A's commits as much as reasonably possible. I would have thought that someone had written a tool to do this, but I couldn't find one, so I wrote this. (Of course, now that I'm posting this, I fully expect someone to point me to a better, pre-existing implementation that I missed.) The approach is to take a working copy of repository B, add a directory for A's code, and for each revision in A's repository, apply that to the working copy and commit it. This would be easy if
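For reference, the svnadmin load --parent-dir route from the update looks roughly like this when driven from Python. The repository paths and the parent directory name are placeholders, the helper names are mine, and the sketch assumes local filesystem access to both repositories with repo_b given as an absolute path (so the file:// URL comes out right). As far as I know the parent directory has to exist in the target repository before loading, hence the svn mkdir step.

```python
import subprocess

def merge_commands(repo_a, repo_b, parent_dir, dump_file="repo_a.dump"):
    """Build the argv lists for merging repo_a into repo_b under parent_dir.

    Paths and names are hypothetical; the commands themselves are
    svnadmin dump, svn mkdir, and svnadmin load --parent-dir.
    """
    return {
        "dump": ["svnadmin", "dump", repo_a],
        "mkdir": ["svn", "mkdir", "-m", "Add directory for merged repository",
                  "file://" + repo_b + "/" + parent_dir],
        "load": ["svnadmin", "load", "--parent-dir", parent_dir, repo_b],
        "dump_file": dump_file,
    }

def merge_repositories(repo_a, repo_b, parent_dir):
    """Run the three steps; raises CalledProcessError if any of them fails."""
    cmds = merge_commands(repo_a, repo_b, parent_dir)
    # 1. Dump repository A's full history to a file.
    with open(cmds["dump_file"], "wb") as out:
        subprocess.run(cmds["dump"], stdout=out, check=True)
    # 2. Create the directory in repository B that will hold A's tree.
    subprocess.run(cmds["mkdir"], check=True)
    # 3. Replay A's revisions into B underneath that directory.
    with open(cmds["dump_file"], "rb") as dump:
        subprocess.run(cmds["load"], stdin=dump, check=True)
```

Try it on a copy of repository B first; the loaded revisions are appended after B's existing history, so A's revision numbers will not be preserved.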

A brief reaction to "Find the Bug"

I picked up a copy of Adam Barr's Find the Bug, which is a cool concept for a book. (5 languages, 50 programs, 50 bugs; see if you can spot them.) I found the bug in the first program, in C, then skipped to the Python chapter. The first two programs were not too bad, as pedagogical exercises go (although iterating through substrings instead of a.startswith(b) in the 2nd was painful). The third, though, was "Alphabetize words," 25 sloc to perform the equivalent of

    def alphabetize(buffer):
        L = buffer.split(' ')
        L.sort()
        return L

... doing everything about the hardest way possible. Now, it's pretty hard to introduce a non-obvious bug into my version of this function, so it wouldn't be appropriate for Mr. Barr's book when written this way. But the right thing to do is to make the task more difficult, not dumb Python down to the level of C! It's very very painful to read Python written like that. (Actually it's painful to read a

Final version of OSCON SQLAlchemy slides

Also the code snippets. This is what I'll be using in my tutorial tomorrow. Update: I forgot to "svn up" on my web server. So now the final version is up.

PEP rss feed is live

After I complained that could use a PEP rss feed, David Goodger invited me to volunteer to write one. So I did. (With Martin v. Löwis doing the integration with the site build script. Thanks Martin!) The feed is live at .

Opera 9.2 is a pretty good browser

I've been trying Opera 9.2 for a week, and I'm pleased enough with it that it's going to continue to be my main browser. The main selling points for me are

- MDI weirdness is mostly hidden now; I hated earlier Opera UIs
- 20-30% less memory use; even after poking about in the guts of about:config to force FF's memory cache to the same 10MB that I gave Opera (which exposes this option right in the UI), Opera consistently uses less memory for the same workload. (Without adding this option to FF, it would max out around 400MB instead of 150MB.)
- feels snappier; opera seems quicker to start rendering something useful on slow-loading sites like, although total render time is about the same. It's also instantaneous to open a new tab, which consistently takes around 1s on FF after I've been using it a while. I open and close tabs frequently.
- UI takes up less space: I know it's possible to re-skin FF, but I'd have to google it to find out how. Opera m

A workaround for the sys.excepthook bug

About two years ago I reported the bug sys.excepthook doesn't work in threads. Then just recently someone asked in #utahpython if I had a workaround. Here it is (also added as a comment to the bug report) -- all we do is monkeypatch to run the excepthook manually if there is an uncaught exception:

    def install_thread_excepthook():
        """
        Workaround for sys.excepthook thread bug (
        Call once from __main__ before creating any threads.
        If using psyco, call psyco.cannotcompile(
        since this replaces a new-style class method.
        """
        import sys, threading
        run_old =
        def run(*args, **kwargs):
            try:
                run_old(*args, **kwargs)
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                sys.excepthook(*sys.exc_info())
        threading
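Since the excerpt above lost a few pieces, here is a self-contained sketch of the same monkeypatching idea for current Python 3. It assumes the method being wrapped is threading.Thread.run, which the excerpt does not show, so treat it as an illustration of the technique rather than the exact code from the bug report.

```python
import sys
import threading

def install_thread_excepthook():
    """Route uncaught exceptions in threads through sys.excepthook.

    Call once from the main module before creating any threads.
    Assumption: wrapping threading.Thread.run is the intended patch point.
    """
    run_old = threading.Thread.run

    def run(*args, **kwargs):
        try:
            run_old(*args, **kwargs)
        except (KeyboardInterrupt, SystemExit):
            raise  # let interpreter-exit signals through untouched
        except:
            # Hand anything else to the process-wide hook, as the
            # main thread would.
            sys.excepthook(*sys.exc_info())

    threading.Thread.run = run
```

Two caveats. Patching the class attribute only covers threads that use the stock run (the target= style); a Thread subclass that overrides run bypasses it. And Python 3.8 added threading.excepthook, which is probably the better tool when it's available.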

How DOS 1.0 cost me an hour of scratching my head

A couple months ago, I migrated my text rpg Carnage Blender to a new server, with Ubuntu 6.06 on the new box. For an unknown reason, ftstrpnm on the new box wouldn't generate the pngs I used in my captchas. It was easier to just check in the images from the old machine into my svn repository than debug this, so I did. The downside was that my working copy on my Windows laptop stopped being able to update from the repository. It would get to "words/con.png," and error out. Google, for once, didn't turn up anything useful. Today I got motivated. I tried all kinds of ways to get this to work. A new checkout had the same problem on Windows, but on Linux worked fine. The svn command line client for Windows didn't work any better than Tortoise -- instead of "Error: Can't open file '...words\.svn\text-base\con.png.svn-base': Access is denied", it barfed con.png to stdout, and died. This was a clue, but I didn't realize that until

It's time for python development to open up a little

I found out from Brett Cannon's blog that an abstract base class (ABC) PEP has been accepted. I don't like this PEP. It's a very big (and more important, inelegant) change to Python's style. But my real complaint is that as big as this change is, and as much as I try to stay current with Python (subscribing to 30+ blogs) I didn't have a chance to get involved in the discussion until after the PEP was already approved. Python is big enough now that there should be some mechanism for feedback from the community before the priesthood of python-dev writes something in stone. Currently, if you want to know about PEPs before they are approved, you have to subscribe to both python-dev and python-3000 (which isn't linked from either the mailing lists page or the dev page, btw). I really don't care about the vast majority of these lists' traffic but PEPs, at least some of them, are important. If the python-dev summaries ever got updated this might be

Best Python book for beginners

It's really surprisingly difficult for someone who has been programming for a long time to write about programming at a level appropriate for real beginners. The first time I taught a class full of beginners at Neumont, I tried to take things as slow as possible. Then I spent the next week covering the material from the first day even slower. So when the UGIC asked me to recommend a book to get for the participants in the Introduction to Python, I looked at all the ones I could find, but they all either assumed too much existing knowledge or covered material that would just confuse a beginner. Often both. But then Michael Bernstein pointed me to Python for Dummies. If you're looking to teach beginners, or you're a beginner yourself, Python for Dummies is by far the best option. There are a few sections that are strikingly inappropriate for a book at its level (new-style classes!?) but it's still much, much better than any of the other books on the market in

Introduction to Python slides

Here are the slides from my introduction to python at the UGIC conference today. This presentation was meant for people with little to no programming experience. So I deliberately kept it pretty basic, and in fact in 90 minutes we only covered up to about slide 20 in the pdf. I also added an exercise before moving on to slide 10. ("Read 3 integers into a list, and print the sum.") There were 17? people there (which was the room's capacity), so it was very nice to have Kevin Bell also answering questions individually during the exercises.

Mercurial presentation slides

Thursday I presented on distributed source control and Mercurial to the utah python ug. Here are my slides. Then on Friday, Mozilla announced that they're moving from CVS to Mercurial, joining OpenSolaris and Xen and others on hg. It's exciting to see what is still a small and elegant tool gain traction like this, even though in some ways hg (and dscm in general really) is still in the early adopter stage.

Mozy code deathmatch

My employer, the creator of Mozy, is running a programming contest this Saturday. 9 languages are allowed. The first 2 rounds are online; the finals are in American Fork (Utah), but if you make it that far you're guaranteed to win some money. (We did this last year too; this year the prize money is doubled to $20k. Not to mention how we are super-experienced contest organizers now!)

One thing I don't hate about Python

Sure, some things about Python bug me. But that's not what this is about. I wanted to react to Jacob Kaplan-Moss's gripes instead of promulgating my own. Specifically, his problem with Python's interfaces, or lack thereof. I think I can keep this brief: interfaces are a hack that Java uses because Gosling et al thought multiple inheritance was too confusing and/or dangerous. (I believe I've read something recently where Gosling said that this was one decision he might do differently if he were re-designing Java now with the benefit of hindsight, but I can't find the source. Anyone remember seeing that?) Python has MI. It doesn't need interfaces. I'm a little baffled that someone on the django core team would cite this as a problem with Python. Jacob's precise objection is, "I shouldn’t need to care about the difference between something that pretends to be a list and something that really is a list." That's just it! You don't

Introduction to Python at UGIC conference

I'll be giving a (very!) introductory Python workshop at the Utah Geographic Information Council conference in April. After my 90 minutes, Kevin Bell -- also of the utah python user group -- will present on specific GIS applications. (Apparently Python is particularly big in GIS these days because one of the big vendors, ESRI, takes Python pretty seriously.)

PyCon SQLAlchemy tutorial slides

My SQLAlchemy tutorial went pretty well for the most part. It was a fast pace but most people kept up pretty well. If I did it again I would add more of an intro to ORM in general for people who had never used one, but over half the attendees had used SO or django's or tried SA already. I would also paste more code from my slides into the samples download to save people typing during the exercises (I had some, but I would do more next time). I think most people liked it; the main exception was one fellow who was in way way over his head and visibly pissed about it. (I used a list comprehension at one point and he had no idea what it was.) The slides are here. (The .py files referred to in the slides have also been moved to the jellis/ subdirectory.)

Spyce at PyCon

I'll be representing Spyce as a late addition to the Web Frameworks panel. I'm also planning a lightning talk on Ajax in Spyce 2.2 (which will be released as soon as I finish getting the docs in shape) and an open-space Introduction to Spyce. See you there!

SQLAlchemy slides

I presented on SQLAlchemy at the Utah python user group last Thursday; slides are linked here. In retrospect, for a shorter presentation like this I should probably spend more time talking about the ORM features, and less about the SQL layer. Although the SQL layer is useful on its own, and essential for doing advanced mapping, I don't think it has the sex appeal that the ORM has. (Although I do think the first part, about why ORMs should allow you to take advantage of your database's strengths rather than being limited to a MySQL 3 feature set, was useful.)

Komodo 4 released; new free version

ActiveState has released Komodo IDE 4. Perhaps more interesting, if you're not already a Komodo user, is the release of Komodo Edit, which is very similar to the old Komodo IDE Personal edition, only instead of costing around $30, Komodo Edit is free. The mental difference between "free" and "$30" is much more than the relatively small amount of money; it will be interesting to see what happens in the IDE space now. After a brief evaluation I would say Edit is perhaps the strongest contender for "best free python IDE." The only serious alternative is PyDev, which on its Eclipse foundation provides features like svn integration that Edit doesn't. PyDev also includes a debugger, another feature ActiveState would like to see you upgrade to the full IDE for. But Komodo is stronger in other areas such as call tips and, well, not being based on Eclipse. I also think its code completion is better, although this impression is preliminary. It

Caution: upgrading to new version of blogger may increase spam

I was pretty happy with the old version of blogger, but I upgraded today so I can use the new API against my own blog. So far I have 4 spam comments (captcha is still on) versus about that number for the entire life of my blog under the old blogger. Bleh. Could just be a coincidence. I hope so. (Update Feb 26: A month later, I've had just one more spam comment. So it probably really was just coincidence.)

Abstract of "Advanced PostgreSQL, part 1"

In December, Fujitsu made available a video of Gavin Sherry speaking on Advanced PostgreSQL. (Where's part 2, guys?) Here are some of the topics Gavin addresses, and the approximate point at which they can be found in the video.

[start] wal_buffers: "at least 64"; when it's ok to turn fsync off [not very often]; how hard disk rpm limits write-based transaction rate, even with WAL
00:12: wal_sync_method = fdatasync is worth checking out on Linux
00:13: FSM [free space map], MVCC, and vacuum; how to determine appropriate FSM size; why this is important to avoid VACUUM FULL
00:22: vacuum_cost_delay
00:26: background writer
00:30: history of buffer replacement strategies
00:37: scenarios where bgwriter is not useful
00:41: how random_page_cost affects planner's use of indexes
00:47: effective_cache_size
00:49: logging; how to configure syslog to not hose your performance
00:52: linux file system configuration
00:58: solaris fs

Why SQLAlchemy impresses me

One of the reasons ORM tools have a spotty reputation is that it's really, really easy to write a dumb ORM that works fine for simple queries but performs like molasses once you start throwing real data at it. Let me give an example of a situation where, to my knowledge, only SQLAlchemy of the Python (or Ruby) ORMs is really able to handle things elegantly, without gross hacks like "piggy backing." Often you'll see a one-to-many relationship where you're not always interested in all of the -many side. For instance, you might have a users table, each associated with many orders. In SA you'd first define the Table objects, then create a mapper that's responsible for doing The Right Thing when you write "user.orders." (I'm skipping connecting to the database for the sake of brevity, but that's pretty simple. I'm also avoiding specifying columns for the Tables by assuming they're in the database already and telling SA to a

MySQL backend performance

Vadim Tkachenko posted an interesting benchmark of the MyISAM vs InnoDB vs Falcon storage engines. (Falcon is the new backend that MySQL started developing after Oracle bought InnoDB.) For me the interesting part is not the part with the alpha code -- Falcon is competitive for some queries but gets absolutely crushed on others -- but how InnoDB is around 30% faster than MyISAM. And these are pure selects, supposedly where MyISAM is best. Of course this is a small benchmark and YMMV, but this is encouraging to me because it suggests that if I ever have to use MySQL, I can use a backend with transactions, real foreign key support, etc., without sucking too badly performance-wise. (It also suggests that people who responded to the post on postgresql crushing mysql in a different benchmark by saying, "well, if they wanted speed they should have used MyISAM," might want to reconsider their advice.)

Fun with three-valued logic

I thought I was pretty used to SQL's three-valued logic by now, but this still caused me a minute of scratching my head:

    # select count(*) from _t;
     count
    -------
      1306
    (1 row)

    # select count(*) from _t2;
     count
    -------
     19497
    (1 row)

Both _t and _t2 are temporary tables of a single column I created with SELECT DISTINCT.

    # select count(*) from _t where userhash in (select userhash from _t2);
     count
    -------
       982
    (1 row)

    # select count(*) from _t where userhash not in (select userhash from _t2);
     count
    -------
         0
    (1 row)

Hmm, 982 + 0 != 1306... Turns out there was a null in _t2; X in {set containing null} evaluates to null, not false, whenever X matches none of the non-null members, and negating null still gives null. (The rule of thumb is, any operation on null is still null.)

.................

I'm giving a tutorial on Advanced Databases with SQLAlchemy at PyCon in February. Feel free to let me know if there is anything you'd like me to cover specifically.
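The NOT IN surprise above is easy to reproduce in miniature. This sketch uses SQLite through Python's sqlite3 module rather than the PostgreSQL session shown, with three made-up userhash values, and adds a NOT EXISTS query as one common way to get the count you probably wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _t  (userhash TEXT);
    CREATE TABLE _t2 (userhash TEXT);
    INSERT INTO _t  VALUES ('a'), ('b'), ('c');
    INSERT INTO _t2 VALUES ('a'), (NULL);   -- the troublemaking null
""")

def count(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

in_count = count(
    "SELECT count(*) FROM _t WHERE userhash IN (SELECT userhash FROM _t2)")
not_in_count = count(
    "SELECT count(*) FROM _t WHERE userhash NOT IN (SELECT userhash FROM _t2)")
# NOT EXISTS compares row by row, so the null in _t2 simply never matches.
not_exists_count = count("""
    SELECT count(*) FROM _t
    WHERE NOT EXISTS (SELECT 1 FROM _t2 WHERE _t2.userhash = _t.userhash)
""")

print(in_count, not_in_count, not_exists_count)  # prints: 1 0 2
```

Filtering the nulls out of the subquery (WHERE userhash IS NOT NULL) gives the same answer as NOT EXISTS here.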

Good advice for Tortoise SVN users

My ThinkPad R52's screen died a couple days ago. I decided that this time I was going to be a man and install Linux on my new machine: all our servers run Debian, and "apt-get install" is just so convenient vs manual package installation on Windows. And it looks like qemu is a good enough "poor man's vmware" that I could still test stuff in IE when necessary. Alas, it was not to be. My new laptop is an HP dv9005, and although ubuntu's livecd mode ran fine, when it actually installed itself to the HDD and loaded X it did strange and colorful things to the LCD. Things that didn't resemble an actual desktop. When I told it to start in recovery mode instead it didn't even finish booting. That was all the time I had to screw around, so I reinstalled Windows to start getting work done again. Which brings me (finally!) to this advice on tortoisesvn: it really puts teh snappy back in the tortoise. Thanks anonymous progblogger!