Thursday, December 14, 2006

Walt Mossberg: "I prefer Mozy"

I don't blog about my day job at Mozy much, but I can't pass this one up: Walt Mossberg reviewed Mozy vs Carbonite in today's Wall Street Journal, and he concluded "of the two products, I prefer Mozy."

"The Walt effect" makes the Digg effect look like small potatoes.

Komodo 4 will not include GUI builder out-of-the-box

As of Komodo 4.0 beta 2, Komodo no longer includes ActiveState's Tk-based GUI builder (with support for non-Tcl Tk bindings like Python's Tkinter). They've released it as open source to the SpecTcl project, from which it was apparently forked long ago. Re-integration with Komodo 4 as a plugin is planned eventually, but it doesn't look likely before the 4.0 release.

I wonder how much of the impetus behind this is the increased amount of web-based development done today, and how much is due to Tk's increasingly dated look and the increased popularity of more modern toolkits such as wxWidgets.

Wednesday, December 06, 2006

Wow, the gzip module kinda sucks

I needed to scan some pretty massive gzipped text files, so my first try was the obvious "for line in gzip.open(...)." This worked but seemed way slower than expected. So I wrote "pyzcat" as a test and ran it against a file with 100k lines:

#!/usr/bin/python

import sys, gzip

for fname in sys.argv[1:]:
  for line in gzip.open(fname):
      print line,

Results:

$ time zcat testzcat.gz > /dev/null
real    0m0.329s

$ time ./pyzcat testzcat.gz > /dev/null
real    0m3.792s

More than 10x slower -- ouch! Well, if zcat is so much better, let's try using zcat to do the reads for us:

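#!/usr/bin/python

import sys
from subprocess import Popen, PIPE

def gziplines(fname):
  # let zcat do the decompression and read its stdout line by line
  f = Popen(['zcat', fname], stdout=PIPE)
  for line in f.stdout:
      yield line
  f.wait()  # reap the zcat process once its output is exhausted

for fname in sys.argv[1:]:
  for line in gziplines(fname):
      print line,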

Results:

$ time ./pyzcat2 testzcat.gz |wc
real    0m0.750s

So, reading from a zcat subprocess is 5x faster than using the gzip module. cGzipFile anyone?
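For anyone repeating the comparison on a current Python 3, here is a minimal sketch of the two approaches side by side. The function names are mine, and it shells out to "gzip -dc" rather than zcat (an assumption on my part, since some platforms ship a zcat that only understands .Z files). The timings above were taken in 2006 and the gzip module has changed since then, so measure before assuming the same gap.

```python
import gzip
import shutil
import subprocess


def gzip_module_lines(fname):
    """Yield raw lines (bytes) using the standard gzip module."""
    with gzip.open(fname, "rb") as f:
        for line in f:
            yield line


def zcat_lines(fname, decompressor="gzip"):
    """Yield raw lines (bytes) read from an external `gzip -dc` process.

    `decompressor` is a hypothetical parameter for this sketch; it must
    name a program on PATH that accepts `-dc <file>`.
    """
    if shutil.which(decompressor) is None:
        raise RuntimeError("%s not found on PATH" % decompressor)
    cmd = [decompressor, "-dc", fname]
    # The context manager closes the pipe and waits for the child on exit.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            yield line
    if proc.returncode != 0:
        raise RuntimeError("%s exited with status %d" % (decompressor, proc.returncode))
```

Both generators yield bytes, so decode as needed. Wrapping either one in time.perf_counter() calls is enough to rerun the benchmark on your own files.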

Monday, December 04, 2006

pysqlite design decisions

There's an interesting thread over on the pysqlite mailing list about pysqlite, shortcomings of the DBAPI 2, APSW (an sqlite interface that does NOT attempt to conform to the DBAPI), and a working sqlite interface in 200 lines of python using ctypes. Worth checking out if you're interested in this sort of thing. (Alas, pipermail does a suckariffic job of threading the conversation, so browsing by date is probably your best bet if you haven't already subscribed to the list from a gmail account.)

Friday, December 01, 2006

SQLAlchemy at PyCon 2007

Mark Ramm will be giving a talk on SQLAlchemy. I'll be giving a talk on SqlSoup, the SQLAlchemy extension I wrote, as well as a tutorial on Advanced Databases with SQLAlchemy.

For my tutorial, I'll be targeting people who understand database fundamentals but want to learn about more advanced features like triggers and how an ORM like SQLAlchemy lets you take advantage of those. (Many ORM tools force you to give up the more powerful database features and pretend instead that your database is a dumb object store, which IMO defeats one of the main purposes of using a modern database.)

If you need to brush up on fundamentals first, Steve Holden is running a more basic tutorial on databases with Python earlier in the day. Here's his outline; his slides from PyCon 06 on the same subject are also online.

If there's something you'd like to see covered in my talk or tutorial, comments are welcome by email (jonathan at utahpython dot org) or right here.

Benchmark: PostgreSQL beats the stuffing out of MySQL

This is interesting, because the conventional wisdom of idiots on slashdot continues to be "use postgresql if you need advanced features, but use mysql if all you care about is speed," despite all the head-to-head benchmarks I've seen by third parties showing PostgreSQL to be faster under load. (MySQL's own benchmarks, of course, tend to favor their own product. Go figure, huh.)

Here's the latest, showing PostgreSQL about 50% faster than MySQL across a variety of hardware. But where MySQL really takes a pounding is when you add multiple cores / CPUs: MySQL adds 37% performance going from 1 to 4 cores; PostgreSQL adds 226%. Ouch! (This would also explain why MySQL sucks so hard on the Niagara chip on the requests-per-second graph -- Sun sacrificed GHz to get more cores in.)

As even low-end servers start to go multicore this is going to be increasingly important.
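To put those two percentages on the same footing, here's a quick back-of-the-envelope sketch that turns "adds N% going from 1 to 4 cores" into a speedup factor and a fraction of ideal linear scaling. The helper name is mine, and the only inputs are the 37% and 226% figures quoted above.

```python
def scaling(percent_gain, cores):
    """Return (speedup factor, fraction of ideal linear scaling)."""
    speedup = 1 + percent_gain / 100.0   # "adds 37%" means 1.37x the 1-core rate
    efficiency = speedup / cores         # ideal scaling would be `cores` times
    return speedup, efficiency


for name, gain in [("MySQL", 37), ("PostgreSQL", 226)]:
    speedup, efficiency = scaling(gain, 4)
    print("%-10s %.2fx on 4 cores, %.0f%% of ideal" % (name, speedup, efficiency * 100))
```

By this measure MySQL gets about a third of what perfect 4-way scaling would give, and PostgreSQL gets about four fifths.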

Update: PostgreSQL core member Josh Berkus says:

[This] is a validation of the last four years of PostgreSQL performance engineering. It's not done yet ... if the Tweakers.net test had included 16+ core machines you'd have seen PostgreSQL topping out ... but our hackers have done quite well.