Showing posts from December, 2006

Walt Mossberg: "I prefer Mozy"

I don't blog about my day job at Mozy much, but I can't pass this one up: Walt Mossberg reviewed Mozy vs Carbonite in today's Wall Street Journal, and he concluded "of the two products, I prefer Mozy." "The Walt effect" makes the Digg effect look like small potatoes.

Komodo 4 will not include GUI builder out-of-the-box

As of Komodo 4.0 beta 2, Komodo no longer includes ActiveState's Tk-based GUI builder (with support for non-Tcl Tk bindings like Python's Tkinter). They've released it as open source to the SpecTcl project, from which it was apparently forked long ago. Re-integration with Komodo 4 as a plugin is planned eventually, but it doesn't look likely before the 4.0 release. I wonder how much of the impetus behind this is the increased amount of web-based development done today, and how much is due to Tk's increasingly dated look and the growing popularity of more modern toolkits such as wxWidgets.

Wow, the gzip module kinda sucks

I needed to scan some pretty massive gzipped text files, so my first try was the obvious "for line in gzip.open(...)". This worked but seemed way slower than expected. So I wrote "pyzcat" as a test and ran it against a file with 100k lines:

    #!/usr/bin/python
    import sys, gzip

    for fname in sys.argv[1:]:
        for line in gzip.open(fname):
            print line,

Results:

    $ time zcat testzcat.gz > /dev/null
    real    0m0.329s

    $ time ./pyzcat testzcat.gz > /dev/null
    real    0m3.792s

10x slower -- ouch! Well, if zcat is so much better, let's try using zcat to do the reads for us:

    def gziplines(fname):
        from subprocess import Popen, PIPE
        f = Popen(['zcat', fname], stdout=PIPE)
        for line in f.stdout:
            yield line

    for fname in sys.argv[1:]:
        for line in gziplines(fname):
            print line,

Results:

    $ time ./pyzcat2 testzcat.gz | wc
    real    0m0.750s

So, reading from a zcat subprocess is 5x faster than using the gzip module. cGzipFile anyone?
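If you want zcat-like speed without shelling out to a subprocess, most of the gzip module's overhead is per-readline bookkeeping rather than decompression itself. Here's a minimal sketch of a workaround: decompress in large chunks with zlib and split lines in Python. The gziplines_chunked helper is hypothetical, not from the post above:

    import zlib

    def gziplines_chunked(fname, chunksize=64 * 1024):
        # Illustrative sketch: decompress in big chunks and split lines
        # ourselves, avoiding GzipFile's per-readline overhead.
        # wbits=16+MAX_WBITS tells zlib to expect a gzip header/trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        f = open(fname, 'rb')
        buf = ''
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            buf += d.decompress(chunk)
            lines = buf.split('\n')
            buf = lines.pop()  # keep any trailing partial line
            for line in lines:
                yield line + '\n'
        f.close()
        buf += d.flush()
        if buf:
            yield buf

Usage is the same as gziplines above: "for line in gziplines_chunked(fname): print line,". Note this sketch only handles a single gzip member, while GzipFile also handles concatenated ones.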

SQLAlchemy at PyCon 2007

Mark Ramm will be giving a talk on SQLAlchemy. I'll be giving a talk on SqlSoup, the SQLAlchemy extension I wrote, as well as a tutorial on Advanced Databases with SQLAlchemy. For my tutorial, I'll be targeting people who understand database fundamentals but want to learn about more advanced features like triggers, and how an ORM like SQLAlchemy lets you take advantage of those. (Many ORM tools force you to give up the more powerful database features and pretend instead that your database is a dumb object store, which IMO defeats one of the main purposes of using a modern database.) If you need to brush up on fundamentals first, Steve Holden is running a more basic tutorial on databases with Python earlier in the day. Here's his outline; his slides from PyCon 06 on the same subject are also online. If there's something you'd like to see covered in my talk or tutorial, comments are welcome by email (jonathan at utahpython dot org) or right here.
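For a taste of what the SqlSoup talk covers, here's a minimal sketch. The connection string and the users table are made up for illustration, and the exact query methods vary between SQLAlchemy releases, so treat this as flavor rather than gospel:

    from sqlalchemy.ext.sqlsoup import SqlSoup

    # No classes or table definitions needed: SqlSoup introspects
    # the schema at runtime. (Connection string and 'users' table
    # are hypothetical.)
    db = SqlSoup('postgres://scott:tiger@localhost/test')

    # db.users acts like a mapped class bound to the existing table
    bob = db.users.insert(name='Bob', email='bob@example.com')
    db.flush()

    # reads are query-style (method names depend on your version)
    users = db.users.all()

The point is that you get ORM convenience against an existing schema, including one maintained with triggers and other server-side features, without writing any mapping boilerplate.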

Benchmark: PostgreSQL beats the stuffing out of MySQL

This is interesting, because the conventional wisdom of idiots on Slashdot continues to be "use PostgreSQL if you need advanced features, but use MySQL if all you care about is speed," despite all the head-to-head benchmarks I've seen by third parties showing PostgreSQL to be faster under load. (MySQL's own benchmarks, of course, tend to favor their own product. Go figure, huh.) Here's the latest, showing PostgreSQL about 50% faster than MySQL across a variety of hardware. But where MySQL really takes a pounding is when you add multiple cores/CPUs: MySQL gains 37% performance going from 1 to 4 cores; PostgreSQL gains 226%. Ouch! (This would also explain why MySQL sucks so hard on the Niagara chip in the requests-per-second graph -- Sun sacrificed GHz to get more cores in.) As even low-end servers start to go multicore, this is going to be increasingly important. Update: PostgreSQL core member Josh Berkus says: "[This] is a validation of the last four years ..."
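If you want to sanity-check the multicore scaling claim on your own hardware, PostgreSQL ships with pgbench, which makes a single-client vs. multi-client comparison easy. A quick illustration (the database name is arbitrary, and this is not the methodology of the linked benchmark):

    $ pgbench -i -s 10 benchdb       # initialize test tables at scale factor 10
    $ pgbench -c 1 -t 20000 benchdb  # 1 client: note the tps figure
    $ pgbench -c 4 -t 5000 benchdb   # 4 concurrent clients: compare tps

On a multicore box, the ratio between those two tps numbers is exactly the kind of scaling the benchmark above is measuring.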