Wednesday, April 20, 2005

Introduction to Durus

The README that comes with Durus is missing a couple of pieces of information that are critical if you actually want to write a program that uses it. The only other documentation appears to be the 2005 Durus PyCon presentation, which gives an admirable description of the technical underpinnings but doesn't explain how to use it, either.

Specifically, as far as code samples go, the README gives you this:

Example using FileStorage to open a Connection to a file:

    from durus.file_storage import FileStorage
    from durus.connection import Connection
    connection = Connection(FileStorage("test.durus"))

And this:

    # Assume mymodule defines A as a subclass of Persistent.
    from mymodule import A 
    x = A()
    root = connection.get_root() # connection set as shown above.
    root["sample"] = x           # root is dict-like
    connection.commit()          # Now x is stored.

That's all you get.

Okay, in this situation it's clear how to retrieve x: look it up by the key we've hardcoded. But what if x is a list of Persistent objects, and I want to retrieve just one of them? Do I have to keep track of its index in x? Do I need to generate my own unique IDs? (Note that the linked code has an obvious race condition in a multithreaded environment.)

No, you don't. Durus assigns each Persistent object an attribute called _p_oid. (P object id. Not sure if P stands for Persistent, or Private, or something else entirely.) Here are the relevant bits of the Durus source:

    def __new__(klass, *args, **kwargs):
        # ... (excerpt; surrounding lines omitted)
        instance._p_oid = None # <-- this is the oid that get() cares about

    def _p_format_oid(self):
        return format_oid(self._p_oid)

_p_oid is a four-byte binary string, which is awkward to pass around as a GET or POST variable in a web application. That's where _p_format_oid comes in handy: it turns the oid into a string representation of the oid number.
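To make that conversion concrete, here's a rough sketch of what turning a binary oid into a printable number can look like. This is not Durus's own code: oid_to_text is a name I made up for illustration, and I'm assuming the bytes are read as a big-endian unsigned integer.

```python
import binascii

def oid_to_text(oid):
    """Render a binary oid as a decimal string.

    Hypothetical helper, not part of Durus. Assumes the oid bytes
    encode a big-endian unsigned integer.
    """
    # hexlify gives the bytes as hex digits; int(..., 16) reads them as one number
    return str(int(binascii.hexlify(oid), 16))

raw_oid = b'\x00\x00\x00\x7b'      # a made-up four-byte oid
text_oid = oid_to_text(raw_oid)    # safe to put in a URL or a form field
```

In real code you'd just call _p_format_oid() on the object; the point is only that the result is plain digits rather than raw bytes.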

Now that you're passing oids around, Durus gives you an easy way to retrieve the object each one identifies, the Connection's get method:

    def get(self, oid):
        """(oid:str|int|long) -> Persistent | None
        Return object for `oid`.

        The object may be a ghost.
        """

Note that if you do pass around the formatted id, you'll need to turn it back into a Python int (or long) before handing it to get. If you pass the string '123', get will assume it's already a valid (binary) oid and won't convert it for you.
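Since the formatted id usually arrives as untrusted text from a request, it's worth doing the int conversion in one place. Here's a minimal sketch; parse_oid_param is a hypothetical helper of my own, not something Durus provides.

```python
def parse_oid_param(raw):
    """Return the formatted oid in `raw` as an int, or None if it is unusable.

    Hypothetical helper, not part of Durus. `raw` is whatever the web
    framework handed us for the GET/POST variable.
    """
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # missing parameter or non-numeric junk
        return None
    # oids are non-negative numbers, so reject anything below zero
    return value if value >= 0 else None
```

Once you've checked that the result isn't None, you can pass it straight to the connection's get method.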

Now that the code diving is out of the way, I'm enjoying Durus a lot. Next post I'll give a short Spyce demo using Durus for persistence.


Anonymous said...

Nope, I must have my blinkers on because I don't see it.

After putting 'x' in root how do I get it back out of the dictionary in subsequent sessions? And how do I know, after insert, what its _p_oid is?

Jonathan Ellis said...

OK, here's a sample that puts it together a little more:

    from durus.file_storage import FileStorage
    from durus.connection import Connection
    from durus.persistent import Persistent
    conn = Connection(FileStorage("test.durus"))

    ## first session ##
    class A(Persistent): pass # in real code you'd flesh it out some of course
    x = A()
    x.someattr = 'foo'
    conn.get_root()['sample'] = x
    conn.commit() # store x; the oid is not assigned until the transaction is saved
    print(x._p_oid)
    print(x._p_format_oid())

    ## second session ##
    print(conn.get_root()['sample'])
    # or, say you have a Spyce page that passed the formatted id in a GET var
    print(conn.get(int(request['id'])))

Anonymous said...

I think it's easier to use the dictionary access than mucking around with oids. It will make your life easier, particularly if you encapsulate your durus-touching code in functions/methods that abstract away the durus-specific stuff.

One other note: in your "first session" you connect to your durus storage using FileStorage. If you want to use durus in a multi-threaded or multi-process setting (like most webservers), you'll want to use ClientStorage. And, right before you get anything, you'll want to call conn.abort() in order to synchronize your durus cache. Otherwise, you may well trigger an exception.

Of course, using ClientStorage means starting up a durus server. You can do it programmatically, using durus' own internal API, but I find it easier just to do os.system('durus -s --file test.durus')

---Peter Herndon

Jonathan Ellis said...

Interesting, an API to autostart the server is another thing the docs didn't mention -- I deliberately used FileStorage so I didn't have to spend code checking to see if the server were already running, etc.

Jonathan Ellis said...

I'm guessing StorageServer.serve is what you're referring to, and no, it's not really an improvement over os.system-ing it. I'll stick to FileStorage for examples.

Neil Schemenauer said...

Using _p_oid as application object identifiers is not really recommended. They are basically a DB implementation detail. If you want an ID, give your object one (e.g. an 'id' attribute). Normally you would store things in dicts, PersistentDicts or BTrees for easy retrieval.

Jonathan Ellis said...

Using Durus at all is an implementation detail; it's not like there is anything else out there that is API-compatible with it, except maybe ZODB. As long as I'm doing that, I might as well make use of what it provides, IMO.

David Binger said...

One thing to note: the _p_oids are not assigned to objects until they are saved with a transaction. This probably does not matter in an application, but I think it is good to know.

If you ever wanted to move part of a durus database into a different durus database, the _p_oids would change--just something else to be aware of.

Don't use a Durus Connection from multiple threads.

Durus-2.0 will be released later this week. The only change is that the new license will be gpl-compatible.

Jonathan Ellis said...

Thanks for pointing that out.

Mario Ruggier said...

About not using a Durus Connection from multiple threads, would this imply that:

a) either you'd need to create a Connection per thread (so you have as many memory caches as threads)?

b) or you'd need to proxy-wrap a Connection to make it thread-safe, for example as in this ThreadedProxy wrapper recipe?