Monday, April 25, 2005

how well do you know python, part 5

I was reminded of this one while working on the Spyce/Durus demo to make it a better example of using Durus in a real application.

What is wrong with the following code?

A.py:

import time, threading

finished = False

def foo():
    import sys
    sys.stderr.write('testing')
    global finished
    finished = True

threading.Thread(target=foo).start()
while not finished:
    time.sleep(1)

B.py:

import A

print 'finished'

Update: fixed A.py so there was only one problem.

12 comments:

Ryan Tomayko said...

The most obvious thing wrong is that finished isn't declared global in A.foo() so I think the thread should run and complete but the global finished variable will never be True.

If that's it, you've done a good job of hiding a really obvious problem by introducing threading into the mix. It took me a while to see it because I was expecting something to be wrong with the threading logic.

Which is interesting. I've always sworn threading introduces a whole slew of new problems and should be avoided if at all possible but I never considered that it might also have the effect of hiding well-known problems. :)

Nice.

Jonathan Ellis said...

Actually, that wasn't the problem I intended -- oops! Fixed in the post now. (That's a hint for you -- the problematic behavior that remains gives the same symptom as leaving off the global. :)

Adam Collard said...

I/O blocking... one writes to STDERR, the other to STDOUT (via print)? The correct way to do it would be to use a threading.Event object

Anonymous said...

Let me simplify what others are saying: "It uses threads."

Adam Collard said...

Hmmm. Well I'm wrong. At least what I thought would work doesn't.

Did I mention how much I dislike threading? ;)

mark said...

I presume the issue is that import sys blocks because of the GIL.

What actually *is* so bad about threading? And what do you use as a cross platform alternative?

Jonathan Ellis said...

Yes, the import blocks. It's actually a separate import lock though, not the GIL. (Unless you redefined GIL to stand for Global Import Lock, but that's not what most people mean. :)

Ian Bicking said...

I'm going to guess that lock has something to do with the threadsafety of imports (which are threadsafe), which means other threads can't access the module until it has finished loading... including perhaps the thread started from the module itself...?

Jonathan Ellis said...

Think of it as an RLock under the import hood: if thread One is doing an import, all other import statements will block until One is done before continuing.

So here, the deadlock is that the main thread holds the import lock (taken by "import A"), but since it's waiting for the worker thread to set finished, it never releases the lock, and the worker thread is stuck forever at "import sys".

Anonymous said...

The real lesson here is: don't start threads as a side effect of import.
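
One way to act on that lesson, as a rough sketch rather than a definitive fix: keep the thread start out of A.py's module body and put it in a function the caller invokes once the import has completed. The function name run() and the use of threading.Event in place of the polled finished flag are assumptions made for illustration; neither appears in the original post.

```python
# A.py, restructured (hypothetical): importing this module no longer
# starts a thread or waits on one.
import threading

# Assumption: an Event replaces the polled module-level "finished" flag.
finished = threading.Event()

def foo():
    # The function-level import is harmless here: by the time run() is
    # called, "import A" has completed and the importing thread is no
    # longer holding the import lock.
    import sys
    sys.stderr.write('testing\n')
    finished.set()

def run():
    # Hypothetical entry point: start the worker and wait for it.
    # Call this after the import, never from the module body.
    worker = threading.Thread(target=foo)
    worker.start()
    finished.wait()
    worker.join()

if __name__ == '__main__':
    run()
```

With this layout, B.py would do "import A", then call A.run(), then print 'finished'. The wait happens after the import has returned, so the worker's "import sys" has nothing to block on.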

Tim Lesher said...

Nice.

John Prevost said...

I’m currently wrestling with this same issue, and I’m not sure of the right way to handle it.

I have a system that automatically starts threads to do background computations. The idea is similar to futures (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/84317).

The trouble is that the various functions that are creating these futures might very well be used naïvely in a module of shared utility routines. It’s unreasonable for me to say to the users of the system “Oh, and by the way, you can’t do this.”

Anyway, is there any way to work around this? My own code does not import within functions, but a number of standard library calls *do*. (For example, os.execvp.)