Wednesday, November 30, 2005

A point for ORM developers to remember

If you are writing an ORM tool, please keep this one point in mind:

I already know SQL.

Your new query syntax isn't better; it's just different. Which means one more thing for me to learn. Which means I probably won't bother, and will instead use an ORM tool that respects my time.

Consider: if your target audience is like me, and already knows SQL, this should go without saying. Resist the temptation to be "clever" and overcomplicate things.

If your target audience does not know SQL, then either

  • they intend to learn SQL, because they're using a relational database and expect that non-Python code or ad-hoc queries will be necessary at some point, or
  • they have no intention of learning SQL, and all their data-access code will now and forever be in Python, in which case they would be better off using Durus or ZODB or another OODB. Be honest with these people and everyone will be happier.

Django does some cool stuff, but heaven help me if I ever have to use an abomination like their ORM in its present form. It's not pretty. Look, here's a rule of thumb to tell when your code sucks: if it reminds the reader of Perl, it sucks. There. The secret of not sucking is yours. Repent and suck no more!

At the risk of beating a dead horse, didn't anyone look at those code samples and think, "wow, our ORM code is way the hell uglier than the vanilla SQL"?

But even if you discover the World's Prettiest And Most Functional Syntax, resist the temptation to make me use it. Remember: I already know SQL. (Corollary: don't bother giving SQL functionality a facelift, as in "startswith='WHO'" instead of "LIKE 'WHO%'.")
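To make the corollary concrete, here is a minimal sketch using Python's standard sqlite3 module. The polls table and its rows are made up for the example (they are not from Django or any real schema); the only point is that the WHERE clause is plain SQL, with nothing new to learn.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE polls (question TEXT)")
conn.executemany(
    "INSERT INTO polls (question) VALUES (?)",
    [("WHO is on first?",), ("WHAT is on second?",), ("WHO's there?",)],
)

# The SQL I already know: a prefix match is just LIKE with a trailing %.
rows = conn.execute(
    "SELECT question FROM polls WHERE question LIKE 'WHO%' ORDER BY question"
).fetchall()
questions = [row[0] for row in rows]
```

Anyone who has written a LIKE clause can read that query without consulting ORM documentation.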

I only single out Django here because they blogged about how cool their ORM syntax for ORs was, which made me have a look, which prompted this rant. Sorry, Django fans!

(Java is clunky and ugly, but the Java SimpleORM tool is better thought out than anything I have seen in Python, and does not make this mistake. Read the author's whitepaper.)

Tuesday, November 29, 2005

PATA really, really sucks

Trying to figure out why disk access on my test machine sometimes takes way, way too long (1000+ ms), I wrote some test code. Each thread ran a function that looks like this:

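import tempfile
import time

write = []  # elapsed time of each iteration, in seconds

def writer():
    while True:
        start = time.time()
        # creating the temp file is part of what gets timed
        f = tempfile.TemporaryFile()
        # bytes literal: TemporaryFile opens in binary mode by default
        f.write(b'a' * 4000)
        end = time.time()
        write.append(end - start)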

Compare the times for max(write) on a machine with a SATA disk and on one with parallel ATA, where the given number of threads are run for a 10 second period:

threads   pata      sata
1         6ms       6ms
2         400ms     11ms
4         1300ms    24ms

Ouch.
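For anyone who wants to try this on their own disks, here is a rough sketch of a harness around that writer. It is a reconstruction under assumptions, not the exact script behind the numbers above: run_benchmark and its duration parameter are names invented here, and the loop stops at a deadline instead of running forever so the threads can be joined.

```python
import tempfile
import threading
import time


def run_benchmark(num_threads, duration=10.0):
    """Run num_threads writer threads for about `duration` seconds.

    Returns the list of per-write elapsed times in seconds.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    times = []  # list.append is safe to call from several threads in CPython
    deadline = time.time() + duration

    def writer():
        while time.time() < deadline:
            start = time.time()
            with tempfile.TemporaryFile() as f:
                f.write(b'a' * 4000)
                # stop the clock before the file is closed, as in the original loop
                elapsed = time.time() - start
            times.append(elapsed)

    threads = [threading.Thread(target=writer) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return times
```

Calling max(run_benchmark(n)) * 1000 for n in 1, 2 and 4 would give the worst-case write time in milliseconds for each thread count, which is the figure the table reports.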

I admit I'm not a hardware nerd. Quite possibly I'm missing something, because even PATA shouldn't be THAT bad. Right? hdparm -i says for the PATA disk:

/dev/hda:

 Model=ST380011A, FwRev=8.01, SerialNo=4JV59KZT
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

 * signifies the current active mode