How well do you know python, part 8

Here is a mostly-functioning class that facilitates writing producer/consumer algorithms. It is designed to support a single producing thread and multiple consuming threads, i.e., scenarios where consuming involves blocking operations such as communicating over a socket.

"Mostly," in this case, means that there are two bugs in this class that can be fixed in a total of four or five lines. One is arguably more subtle than the other, but they involve the same part of the code. Can you spot them?

Warning: this one is tougher than part 2, which also dealt with threading.

import Queue as queue
import threading

class PCQueue:
    def __init__(self, initialthreads):
        self.q = queue.Queue()
        self.running = True
        self.condition = threading.Condition()
        for i in range(initialthreads):
            self.add_consumer()
    def run_consumer(self):
        while True:
            self.condition.acquire()
            try:
                self.condition.wait()
                if not self.running:
                    return
            finally:
                self.condition.release()
            obj = self.q.get()
            self.consume(obj)
    def put(self, obj):
        self.condition.acquire()
        self.q.put(obj)
        self.condition.notify()
        self.condition.release()
    def consume(self, obj):
        raise NotImplementedError('must implement consume method')
    def add_consumer(self):
        threading.Thread(target=self.run_consumer).start()
    def stop(self):
        self.running = False
        self.condition.acquire()
        self.condition.notifyAll()
        self.condition.release()

Here's a short example using PCQueue.

class PCExample(PCQueue):
    def consume(self, url):
        import urllib
        n = len(urllib.urlopen(url).read())
        print '\t%s contains %d bytes' % (url, n)

q = PCExample(2) # 2 threads
while True:
    print 'url? (empty string to quit)'
    url = raw_input()
    if not url:
        break
    q.put(url)
q.stop()

Comments

the paul said…
Well, I haven't run this, so I don't know if there's something even more obvious that I'm missing, but...

(a) It looks like it's possible for the put() method to notify the threads at a moment when none of them are actually waiting. The item would get stuck in the queue, probably forever. Trying to call self.q.get_nowait() in run_consumer before the wait would probably fix this.

(b) The notifyAll() call in stop could also happen when some of the threads aren't waiting, so they would wait for another notification before checking self.running. Similarly to the last item, putting another check for self.running before the wait would solve this.

(c) The use of condition.notify() in put() isn't quite future-proof: the docs say future versions of notify() might sometimes wake up more than one thread, and the code doesn't allow for that. This is probably not an issue.

I have a feeling I didn't catch the subtle bug.
the paul said…
Er, on (a), I meant that some item would always be stuck in the queue, not that that specific item would remain stuck.
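Points (a) and (b) both hinge on the fact that a threading.Condition does not remember notifications: notify() wakes only threads that are already blocked in wait(). A minimal sketch, written for Python 3 rather than the Python 2 used in the post; the timeout exists only so the demo finishes, whereas run_consumer waits with no timeout and would block until some later notification:

```python
import threading

# A Condition keeps no memory of notifications: notify() wakes only threads
# already blocked in wait(). With no waiters, the call does nothing.
cond = threading.Condition()

with cond:
    cond.notify()  # nobody is waiting yet, so this notification is dropped

with cond:
    # A consumer that reaches wait() only now never sees the earlier notify().
    # wait() returns False when it gives up because the timeout expired.
    woke_up = cond.wait(timeout=0.1)

print(woke_up)  # False
```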
Anonymous said…
I don't know Python this well, but your challenges are interesting.

I tried the program and it seems to run well, but anyway:

In run_consumer there is:
self.condition.wait()
if not self.running:
    return
shouldn't the condition be released before the return?
Jonathan Ellis said…
Paul, those are the issues I had in mind.
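For reference, here is one way to apply both of Paul's fixes, sketched in Python 3 (queue, with blocks, and notify_all() replace the Python 2 spellings); it is not necessarily the author's own fix. Each consumer checks the queue and the running flag while holding the lock, both before waiting and after every wakeup, so neither a put() nor a stop() that happens while it is busy consuming can be missed. The threads list is a hypothetical addition so callers can join() the consumers.

```python
import queue
import threading


class PCQueue:
    """Python 3 sketch of PCQueue with both lost-wakeup bugs fixed."""

    def __init__(self, initialthreads):
        self.q = queue.Queue()
        self.running = True
        self.condition = threading.Condition()
        self.threads = []  # hypothetical addition: lets callers join() consumers
        for i in range(initialthreads):
            self.add_consumer()

    def run_consumer(self):
        while True:
            with self.condition:
                # Check the state before waiting and re-check after every
                # wakeup, so earlier notifications are never needed.
                while self.running and self.q.empty():
                    self.condition.wait()
                if not self.running:
                    return
                # Consumers check and remove items only while holding this
                # lock, so no other consumer can take the item in between.
                obj = self.q.get_nowait()
            self.consume(obj)  # outside the lock so consumers run concurrently

    def put(self, obj):
        with self.condition:
            self.q.put(obj)
            self.condition.notify()

    def consume(self, obj):
        raise NotImplementedError('must implement consume method')

    def add_consumer(self):
        t = threading.Thread(target=self.run_consumer)
        t.start()
        self.threads.append(t)

    def stop(self):
        with self.condition:
            self.running = False
            self.condition.notify_all()
```

As in the original, stop() does not drain the queue: consumers exit as soon as running is False, even if items remain. The predicate loop also covers point (c), since a consumer woken with nothing to take simply waits again.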

Damjan: that is why that code is in a try/finally block. The code in a finally block always gets executed after the try block completes, even if the try block terminated because of a break or return.
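This is easy to check directly. A minimal Python 3 sketch; the condition is built on a plain threading.Lock as an assumption for the demo, because the default RLock is reentrant and the owning thread could re-acquire it even if release() had been skipped, which would hide the effect:

```python
import threading

# Plain Lock instead of the default RLock, so a skipped release() would
# make the non-blocking acquire below fail.
cond = threading.Condition(threading.Lock())


def acquire_and_return():
    cond.acquire()
    try:
        # Stand-in for the early return in run_consumer.
        return 'returned from inside try'
    finally:
        # Runs on the way out of the return above, releasing the lock.
        cond.release()


result = acquire_and_return()
# The lock is free again, so a non-blocking acquire succeeds.
released = cond.acquire(blocking=False)
if released:
    cond.release()
print(result, released)
```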
Anonymous said…
jonathan: at first glance I read the finally: statement as an except:.

Then I added a print "Condition released" just after the finally: statement to make sure the condition is released.
the paul said…
This one was easier for me than the part 2 question, for which I had to decode Ian Bicking's answer. I suck!
