Thursday, June 30, 2005

Of all the things I've lost...

I installed a sys.excepthook in my project at work that uses the logging module to record any uncaught exceptions. It didn't work.

I did what any lazy programmer would do: ask someone more experienced. "Do you have to set sys.excepthook in each thread?" I asked, more-or-less. "I don't think so," he replied.

Hmm. Maybe the developer I'd taken over from was setting excepthook later on in the initialization or somewhere else. ... Nope.

Finally I wrote this:

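import sys, threading

# Hook that should receive (type, value, traceback) for any uncaught exception
def log_exception(*args):
    print('got exception %s' % (args,))
sys.excepthook = log_exception

# Raises ZeroDivisionError in a subthread; the hook above never fires
def foo():
    a = 1 / 0
threading.Thread(target=foo).start()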

Playing with this a bit demonstrates that sys.excepthook doesn't work in subthreads, at all. The documentation doesn't mention anything of the sort. Smells like a bug to me. (I did file one.)
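One way around this, as a minimal sketch rather than a definitive fix: wrap each thread's target so that anything it raises is handed to sys.excepthook by hand. The helper name `hooked` and the `caught` list are made up for illustration; neither is part of the standard library.

```python
import sys
import threading

caught = []  # hypothetical record of what the hook saw, for illustration only

def log_exception(exc_type, exc_value, tb):
    caught.append((exc_type, exc_value))
    print('got exception %s: %s' % (exc_type.__name__, exc_value))

sys.excepthook = log_exception

def hooked(target):
    """Wrap a thread target so its uncaught exceptions reach sys.excepthook."""
    def wrapper(*args, **kwargs):
        try:
            return target(*args, **kwargs)
        except Exception:
            # Forward the exception explicitly, since the thread won't do it for us
            sys.excepthook(*sys.exc_info())
    return wrapper

def foo():
    a = 1 / 0

t = threading.Thread(target=hooked(foo))
t.start()
t.join()
```

The cost is that every thread has to be started through the wrapper, which is easy to forget in a large code base.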

I belatedly googled "sys.excepthook threads." (I must have been a bit slow this morning not to do this first.) I was more than a little surprised to see my name in the first result that came back. (And a little disappointed that eight months later there are still no replies. :)

I've been a father for a couple years now, so I've gotten used to my memory being a little fuzzy now and then. But usually it's just, well, a bit fuzzy. I can tell that there are details I don't remember immediately, and they usually swim back if I think about it hard enough. But I don't remember posting that at all, nor can I recall what I was working on to prompt it. Funny. But also a little scary.

The tale of a wiki diff implementation

I run a web game that I started in late 2000, back in the Dark Ages before there was a decent Python web toolkit. It runs on a then-current version of the OpenACS TCL-based toolkit. (At around 20kloc, porting it to a more modern system wouldn't be worth the effort now.)

Recently, a player suggested that I add a wiki, since my own documentation is chronically out of date and player-run sites tend to suffer bitrot as well. (How many games are you still playing that you started 5 years ago?) So, I backported a modern OpenACS wiki module -- no trivial task; a LOT has changed in OpenACS, and not all for the better -- and was set. Except the module I backported didn't have diff functionality, probably because the various TCL options mostly suck.

Enter TclPython, a TCL module by Jean-Luc Fontaine (who is obviously a far better C hacker than I) that embeds a Python interpreter. Sweet! My life just got a lot easier:

package require tclpython
set py [python::interp new]

# $a and $b are the text of the two revisions -- inject them into the python interpreter
# (caveat: this breaks if either revision contains backslashes or triple quotes)
$py exec "
from difflib import HtmlDiff
a = \"\"\"$a\"\"\".split('\\n')
b = \"\"\"$b\"\"\".split('\\n')
hd = HtmlDiff(wrapcolumn=80)
"
ns_return 200 text/html [$py eval "hd.make_file(b, a, context=True)"]

The finished result used make_table and included some css to make things pretty, but that's all there is to it fundamentally. Thanks, Mr. Fontaine!
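For the curious, here is roughly what the make_table variant looks like on the Python side. This is a sketch, not the code from my site: the two revisions and the CSS values are invented for illustration, though diff_add, diff_chg and diff_sub are the class names HtmlDiff emits.

```python
from difflib import HtmlDiff

# Two hypothetical wiki revisions, already split into lines
old_rev = "Welcome to the wiki.\nThis page is a stub.".split('\n')
new_rev = ("Welcome to the wiki.\nThis page is a stub.\n"
           "Now with more content.").split('\n')

hd = HtmlDiff(wrapcolumn=80)

# make_table returns only the <table> element (make_file returns a whole page),
# so the surrounding page has to supply its own styling.
table = hd.make_table(old_rev, new_rev,
                      fromdesc='old revision', todesc='new revision',
                      context=True, numlines=2)

# Illustrative styling only
css = """
table.diff { font-family: monospace; border: 1px solid #ccc; }
.diff_add { background-color: #aaffaa; }
.diff_chg { background-color: #ffff77; }
.diff_sub { background-color: #ffaaaa; }
"""

page = ("<html><head><style>%s</style></head><body>%s</body></html>"
        % (css, table))
```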

Wednesday, June 15, 2005

SF.net examples fixed, again

This time I apparently broke it by copying the development spyceconf.py, which among other things references the authentication tag that doesn't exist in 2.0, into the sourceforge site. Oops. (The 2.0.2 release itself wasn't affected.)

Monday, June 13, 2005

I figured out why Python's threading library bugs me

Reading Aahz's 2001 OSCon presentation, I ran into a slide that crystallized it [paraphrased]:

  • Perl: There's more than one way to do it
  • Python: There should be one (preferably only one) obvious way to do it
  • Python's threading library is philosophically Perl-ish

That pretty much says it all. Well, that and the main classes are (still) virtually undocumented.

Update: I'm referring to the synchronization classes in this module, not the Thread class, which is straightforward enough.
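To make the "more than one way" complaint concrete, here is a sketch of a single task (the main thread waits for a worker to say it is done) written three times with three different synchronization classes. The worker functions and variable names are invented for the example.

```python
import threading

results = []

# Way 1: an Event
finished = threading.Event()

def worker_event():
    results.append('event')
    finished.set()

t1 = threading.Thread(target=worker_event)
t1.start()
finished.wait()
t1.join()

# Way 2: a Condition guarding a flag
cond = threading.Condition()
state = {'done': False}

def worker_condition():
    with cond:
        results.append('condition')
        state['done'] = True
        cond.notify()

t2 = threading.Thread(target=worker_condition)
t2.start()
with cond:
    # Loop in case the worker finished before we started waiting
    while not state['done']:
        cond.wait()
t2.join()

# Way 3: a Semaphore that starts at zero
sem = threading.Semaphore(0)

def worker_semaphore():
    results.append('semaphore')
    sem.release()

t3 = threading.Thread(target=worker_semaphore)
t3.start()
sem.acquire()
t3.join()
```

All three work, and nothing in the module tells you which one is the obvious choice.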

Saturday, June 11, 2005

Why PHP sucks

Update (July 8, 2005):

Apparently I got linked by some PHP sites, and while there were a few well-reasoned comments here I mostly just got people who only knew PHP reacting like I told them their firstborn was ugly. These people tended to give variants on one or more themes:

  • All environments have warts, so PHP is no worse than anything else in this respect
  • I can work around PHP's problems, ergo they are not really problems
  • You aren't experienced enough in PHP to judge it yet

As to the first, it is true that PHP is not alone in having warts. However, the lack of qualitative difference does not mean that the quantitative difference is insignificant.

Similarly, problems can be worked around, but languages/environments designed by people with more foresight and, to put it bluntly, clue, simply don't make the kind of really boneheaded architecture mistakes that you can't help but run into on a daily basis in PHP.

Finally, as I noted in my original introduction, with PHP, familiarity breeds contempt. You don't need years of experience with PHP before an urge to get the hell out and into a more productive environment becomes almost overwhelming -- provided that you have enough experience to realize what that nagging lack of productivity is telling you when you go to consult the documentation for the fifth time in one morning.

Basically these all boil down to, "I don't have enough experience to recognize PHP's flaws because I haven't used anything better." Many years ago, I had this same attitude about Turbo Pascal: it was the only language I knew at the time, so anyone who pointed out its flaws was in for a heated argument.

There's nothing wrong with being inexperienced, as long as you have an open mind. If you do, try Spyce. Try RoR, if you must. Better toolkits are out there, for those who aren't satisfied with mediocrity.

I've done a fair bit of web development. I've written HTML-generating code in C CGI scripts, Cold Fusion, TCL (with OpenACS), ASP.NET, and, of course, Python with Spyce.

Two technologies I've steered clear of are any J2EE stack and PHP. I've seen enough of each to know that I'd be immensely frustrated with either. Briefly, although they are quite different, neither is elegant, and elegance counts.

Recently, though, I had to spend a few days extending a small amount of PHP code. (I'm just glad for two of those qualifiers: "few" and "small.") The more I used it, the less impressed I was, which is why I don't think a longer experience would make this post any more favorable to PHP.

This is far from an exhaustive list. It may have some reasoning in common with other voices of reason, but I'm writing this primarily from a Python developer's perspective. So, I won't waste time bashing PHP for being dynamic, which some Java zealots do; nor will I fault it for mixing code and markup, which also has its uses (and Spyce recognizes this). PHP's problems are much, much deeper.


First, let's try to be a bit more specific. Does PHP the language suck, or does PHP the environment suck?

They both suck.

In fact, they suck for the same reason: PHP-the-language and PHP-the-environment both grew by accretion of random features, not by any purposeful design for orthogonality. So you have idiot "features" like magic quotes ("Assuming it to be on, or off, affects portability. Use get_magic_quotes_gpc() to check for this, and code accordingly.") and register globals (same disclaimer applies, only more so).

Trying to write "portable" PHP code is such a disaster that it's no wonder almost nobody tries; you can't even code for a least-common-denominator version because (a) so much is subject to change on the whim of the site's config file and (b) even if it weren't, the PHP designers change the defaults almost as often, even within minor version releases. (E.g., the register_globals default change in 4.2.)

The PHP community realizes this to a degree, even if the fanboys won't admit it. PHP5 uptake is almost as glacial as MySQL4 was/is because it breaks so much code. One of the biggest ways is, "all your copy-on-assign code, isn't anymore."

That bears explaining if you haven't coded in PHP4: any time you make an assignment in PHP4, what other languages would call a "deep copy" is performed. So you might naively expect this code to add a new key/value pair to the associative array at $a[0]:

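<?php
// One-element array holding an associative array
$a = array(array('foo' => 'bar'));
foreach ($a as $item) {
   // $item is a copy, so this write never reaches $a[0]
   $item['new key'] = 'new value';
}
print_r($a);
?>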

This, of course, changes $a not at all, because the assignment to $item is a deep copy. (It's also slow as hell if the items you're deep-copying are substantial.) The workarounds for this are truly ugly.
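For comparison, here is a sketch of the same loop in Python, first mimicking the copy-on-assign behavior described above with an explicit copy.deepcopy, then written the ordinary way. The deepcopy call is only an analogy for the semantics, not a claim about how PHP implements them.

```python
import copy

# PHP4-like behavior: each loop item is an independent copy
a = [{'foo': 'bar'}]
for item in a:
    item = copy.deepcopy(item)
    item['new key'] = 'new value'   # modifies the copy only
unchanged = copy.deepcopy(a)

# Ordinary Python: the loop variable refers to the original dict
b = [{'foo': 'bar'}]
for item in b:
    item['new key'] = 'new value'   # modifies b[0] itself
```

After this runs, `a` still holds only the 'foo' key, while `b[0]` has gained 'new key'.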

This changes in PHP5, at least for objects, which are now assigned by reference instead of being copied. That is a good thing, unless you're trying to upgrade an existing body of code. That's not a small "unless": if you weren't using PHP4, there's really no excuse to start out in PHP5 instead of Spyce or CherryPy or Rails.

Even with a willingness to break backwards compatibility, a lot of broken behavior persists in PHP5. For instance, since I brought up PHP arrays: an array in PHP is really a map. When you're using it as a linear array, it's really mapping keys 0, 1, etc. to your values. (I don't want to know what it does when you're using it as a 2D array.) This might seem like a good idea, until you spend about two seconds thinking about it, at which point those of you who have had a basic introduction to data structures will be thinking, "What the hell? The performance will SUCK!" And you are entirely correct. The only other language I can think of that does this is AWK, and it was a bad idea there, too, but less of a problem since, well, when was the last time you wrote an AWK script longer than 10 lines?
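A rough analogy in Python, assuming a dict keyed by integers stands in for the map-as-array described above: the "indices" are just keys, so removing an element leaves a hole instead of shifting everything down, which is not what anyone expects from a linear array.

```python
# A "linear array" stored as a map from integer keys to values
map_array = {0: 'a', 1: 'b', 2: 'c'}
del map_array[1]
map_keys = sorted(map_array.keys())   # key 1 is simply gone

# A real list keeps its positions contiguous
real_array = ['a', 'b', 'c']
del real_array[1]
list_indices = list(range(len(real_array)))
```

Here `map_keys` ends up as [0, 2] while `list_indices` is [0, 1].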

PHP-the-language also shares with Perl the unfortunate tendency to guess what the programmer "really meant" instead of raising errors. (String where an integer makes more sense? No problem, we'll just throw in zero!) Unlike Perl, there's no "use strict" option to mitigate this.
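For contrast, a small sketch of what refusing to guess looks like in Python. The function name `add_one` is made up for the example.

```python
def add_one(value):
    return value + 1

# Passing a string where a number is expected is an error, not a silent zero
try:
    add_one("5")
    mixed_types = "no error"
except TypeError:
    mixed_types = "TypeError"

# Conversion has to be requested explicitly, and bad input is rejected too
try:
    int("five")
    bad_conversion = "no error"
except ValueError:
    bad_conversion = "ValueError"

explicit = add_one(int("5"))
```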

I could keep going, on how, for instance, the PHP environment doesn't give you any way to have a single, multithreaded long-running process, which is probably why you see so much PHP code that sticks everything into the session object. Or how PHP's string processing library adopted the C stdlib functions without improving on them. Or a dozen things, but a comprehensive enumeration of PHP design flaws would be a daunting task indeed.

In short, PHP sucks because PHP's authors are prone to confuse "pragmatism" (a fine design goal, if done well) with "adding random features without considering how they impact the language as a whole." Thus, its authors have found it necessary to correct obvious flaws in both minor and major releases, with the result that the recent PHP5 breaks with the past to an unprecedented degree while still leaving many fundamental flaws unaddressed. I don't know if this is because they didn't recognize those flaws, or more likely, because they were willing to impose "requires a lot of pain to upgrade" but not "requires a complete re-write."

Ian Bicking wrote that "Python could have been PHP" (in popularity for web development). If I were a PHP developer, I'd be pretty disenchanted with how the language has evolved; I don't think it's impossible that Python could have a second chance.

Tuesday, June 07, 2005

Anders Hejlsberg doesn't grok Python

While Debian was releasing Sarge and Steve Jobs was introducing MacX86 yesterday, Anders Hejlsberg spoke on C# at Microsoft Tech-Ed. James Avery writes:

After a couple questions from other people I was able to get in the other question I was dying to ask. What does Anders think about the resurgence in dynamic typing from languages like Python. Basically he said that he understands what benefits people are getting from dynamic typing, but he thinks they can get the benefits of dynamic typing without sacrificing strong typing. He talked about inferring type (what anonymous methods do now with delegates) and how that might be a way to get the coding speed and ease without sacrificing the strongly typed information.

Unfortunately, omitting type declarations is only a small part of Python-esque dynamism. One of the smallest, in fact. Far more important is the ability to modify objects and classes at runtime, which allows you to do things in Python that would require code generation (which is fragile, at best) or AOP language modifications to do in C# or Java.
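A minimal sketch of what that looks like in practice: adding call logging to an existing class after the fact, with no code generation and no change to the class's source. The `Account` class, the `logged` helper and the `calls` list are all invented for illustration.

```python
import functools

class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount
        return self.balance

calls = []  # hypothetical log of intercepted calls

def logged(method):
    """Wrap a method so that every call is recorded before it runs."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        calls.append((method.__name__, args))
        return method(self, *args, **kwargs)
    return wrapper

acct = Account(100)            # created before the class is modified

# Replace the method on the class at runtime; existing instances see it too
Account.withdraw = logged(Account.withdraw)

remaining = acct.withdraw(30)
```

This is the sort of cross-cutting change that AOP tooling exists to provide elsewhere.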

Equally important is the attitude this fosters in the Python community: "Python assumes we're all consenting adults." "Private" restrictions are discouraged (and can still be bypassed by others, with less work than ordinary reflection requires in other languages). As Sion Arrowsmith recently put it:

Years of writing and maintaining others' C++ and Java code (plus one year of maintaining Python code and rather more writing) has led me to believe that there is no justification for truly private variables... [D]enying derived classes full access to your inner workings just leads to clumsier (less readable and more bug-prone) implementations derivations... [I]t's based on the hubris that you are a better programmer than anyone who might want to extend your class and can forsee all circumstances in which it might be subclassed.

This attitude of "we know better than you what code you will want to write" is found in the Java world, but it's really pervasive at Microsoft. C# methods are "final" by default; no polymorphism for you unless the original author was generous enough to specify "virtual!" This can be done in Java, but at least it's not the default. Also, huge parts of the runtime (in both .NET and Java, but more so in .NET) are "sealed," preventing subclassing entirely.

You couldn't inflict that on your users in a language like Python. Even if you could "seal" a Python class, which will never happen since it's a horrible misfeature, a user could still create his own class with the right attributes and nobody would know the difference (because of duck typing, another important part of the we're-adults-here philosophy). Well, unless you littered your code with isinstance calls, but nobody would use code THAT poorly written... if you want to play in Microsoft's world, you don't have a choice.
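Here is a sketch of both halves of that argument. `shout` only cares that its argument has a read() method, so a home-made class works fine; `picky_shout` adds the kind of isinstance check complained about above and rejects the same object. All of the names are invented for the example.

```python
import io

def shout(stream):
    # Duck typing: anything with a read() method will do
    return stream.read().upper()

class FakeFile(object):
    """Not a file, not a subclass of anything file-like; it just has read()."""
    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text

from_real = shout(io.StringIO(u"hello"))
from_fake = shout(FakeFile("hello"))

def picky_shout(stream):
    # The isinstance check turns a perfectly usable object into an error
    if not isinstance(stream, io.IOBase):
        raise TypeError("expected a real file object")
    return stream.read().upper()

try:
    picky_shout(FakeFile("hello"))
    picky_result = "accepted"
except TypeError:
    picky_result = "rejected"
```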

Wednesday, June 01, 2005

Spyce 2.0.2 released

Second bugfix release. Get it here.

Changelog:

    session_dir uses config.tmp by default if no directory is specified
    fix for session_dir pickling on win32
    fix for fileCache pickling bug on win32 reported by Jarosław Zabiello
    fix for sessions + handlers problem reported by Jonathan Taylor
      * all module init() methods are now run before handlers are called.