Friday, April 14, 2006

PyGame at the Utah Python user group

I presented on PyGame at the Utah Python user group last night. (When I don't get someone else lined up to speak in advance, I end up doing it myself. You'd think that would be enough motivation not to procrastinate.)

I had a lot of fun preparing this. I'd never used PyGame before, but as a teenager I spent a lot of time in the same space. (Anyone remember YakIcons?) So the general concepts were familiar to me, and I was pleasantly surprised by how good a job PyGame did at making things easy for me.
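
To give a sense of how little boilerplate it takes, here's a minimal sketch (mine, not from the talk) of opening a window and running an event loop:

import pygame

# minimal PyGame skeleton: open a window and run an event loop
pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:   # window close button
            running = False
    screen.fill((0, 0, 0))              # clear to black; sprite drawing goes here
    pygame.display.flip()               # show the finished frame
    clock.tick(30)                      # cap the loop at 30 frames per second

pygame.quit()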

Here are my PyGame slides, and my PyQuest game skeleton is here.

(PyQuest is of course inspired by Crystal Quest -- the Mac game, not the Xbox 360 remake -- and the graphics and sound files are from SolarWolf, which I guess makes PyQuest LGPL. This caused Paul Cannon some serious mental trauma at the meeting, seeing and hearing solar-wolf-and-yet-not-solar-wolf.)

Monday, April 10, 2006

Introducing SqlSoup

[Update Oct 2006: here is another serving of SqlSoup. Update 2: SqlSoup documentation is now part of the SQLAlchemy wiki.]
Ian Bicking wrote in Towards PHP that a successful Python PHP-killer (as Spyce aspires to be) will need to include a simple data-access tool that Just Works.
I had two thoughts:
  1. He's absolutely right
  2. I could do this with SqlAlchemy in an afternoon
My afternoons are in short supply these days, and it took two of them, counting the documentation. But it's live now, as sqlalchemy.ext.sqlsoup. (The 0.1.5 release includes a docless version of sqlsoup; I recommend the subversion code until Mike rolls a new official release.)
SqlSoup is NOT an ORM per se, although naturally it plays nicely with SqlAlchemy. SqlSoup inspects your database and reflects its contents with no other work required; in particular, no model definitions are necessary.
Here's what SqlSoup looks like, given pyweboff-ish tables of users, books, and loans. (The SQL to set up this example is included in the test code; I won't repeat it verbatim here, but a rough sketch of the schema follows so you can follow along.)
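Something like this would do (my reconstruction from the query output below, not the canonical test SQL):

import sqlite3

# hypothetical schema, inferred from the columns in the output below;
# to actually follow along, point the SqlSoup connect string below
# at this file instead of ':memory:'
conn = sqlite3.connect('/tmp/soup.db')
conn.executescript("""
    CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT, password TEXT,
                        classname TEXT, admin INTEGER);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE loans (book_id INTEGER REFERENCES books(id),
                        user_name TEXT REFERENCES users(name),
                        loan_date DATE);
""")
conn.commit()

With tables in place, getting started is two lines: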
>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('sqlite:///:memory:')

>>> users = db.users.select()
>>> users.sort()  # sort the list in Python, just for predictable output
>>> users
[MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1),
MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]
Of course, letting the database do the sort is better (".c" is short for ".columns"):
>>> db.users.select(order_by=[db.users.c.name])
Field access is intuitive:
>>> users[0].email
u'basepair@example.edu'
You won't usually want to load all users at once, though. Let's add a WHERE clause, and switch the order_by to DESC while we're at it.
>>> from sqlalchemy import or_, and_, desc
>>> where = or_(db.users.c.name=='Bhargan Basepair', db.users.c.email=='student@example.edu')
>>> db.users.select(where, order_by=[desc(db.users.c.name)])
[MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0), MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1)]
You can also use the select...by methods for simple equality queries. These take keyword arguments as column names, so there's no clause expression to build:
>>> db.users.selectone_by(name='Bhargan Basepair')
MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1)
All the SqlAlchemy mapper select variants (select, select_by, selectone, selectone_by, selectfirst, selectfirst_by) are available. See the SqlAlchemy documentation (sql construction and data mapping) for details.
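Roughly, sketching against the same users table (the *_by variants take keyword arguments; the others take clause expressions):

>>> db.users.select(db.users.c.admin==1)        # clause expression; returns a list
>>> db.users.select_by(admin=1)                 # same query, keyword style
>>> db.users.selectfirst(db.users.c.admin==1)   # just the first match
>>> db.users.selectone_by(admin=1)              # exactly one match expected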
Modifying objects is also intuitive:
>>> user = _  # _ is the last result at the interactive prompt
>>> user.email = 'basepair+nospam@example.edu'
>>> db.flush()
(SqlSoup leverages the sophisticated SqlAlchemy unit-of-work code, so multiple updates to a single object will be turned into a single UPDATE statement when you flush.)
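For example, these two changes (illustrative values) go out as a single UPDATE:

>>> user.password = 'newpassword'
>>> user.classname = 'staff'
>>> db.flush()  # one UPDATE statement covers both changes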
To finish covering the basics, let's insert a new loan, then delete it:
>>> book_id = db.books.selectfirst(db.books.c.title=='Regional Variation in Moss').id
>>> db.loans.insert(book_id=book_id, user_name=user.name)
MappedLoans(book_id=2,user_name='Bhargan Basepair',loan_date=None)
>>> db.flush()

>>> loan = db.loans.selectone_by(book_id=2, user_name='Bhargan Basepair')
>>> db.delete(loan)
>>> db.flush()

Saturday, April 08, 2006

Database Replication

I spent some time yesterday researching (free) database replication options. Judging from the newsgroup posts I saw, there's a lot of confusion out there. The most common use case appears to be failover: you want to minimize downtime in the face of software or hardware failure by replicating your data across multiple machines. But the most commonly used options are completely inappropriate for this purpose.

As Josh Berkus explained, there are two "dimensions" to replication: synchronous vs. asynchronous, and master/slave vs. multimaster.

For a failover solution, if you want database B to take over from database A in case of failure, with no data loss, only synchronous solutions make sense. By definition, asynchronous replication means that database A can commit a transaction before those changes are also committed to database B. If A happens to fail between commit and replication, you've lost data. If that's not acceptable for you, then neither is async replication.
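
Here's a toy sketch of that ordering difference, with lists standing in for databases (obviously not real replication code):

# async: the commit is acknowledged before the replica has the change
def commit_async(primary, replica, txn):
    primary.append(txn)   # client sees 'committed' here...
    replica.append(txn)   # ...but if the primary dies first, txn is gone

# sync: the replica gets the change before the commit is acknowledged,
# so failing over to the replica loses nothing
def commit_sync(primary, replica, txn):
    replica.append(txn)   # replicate first
    primary.append(txn)   # only now does the client see 'committed'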

Be aware that the most popular replication solutions for both PostgreSQL and MySQL are asynchronous.

  • In part because of contributions from the likes of Fujitsu and Afilias (the .org and .info registry), Slony-I is the highest-profile replication solution for PostgreSQL. Slony-I provides only asynchronous replication.
  • MySQL replication is also asynchronous.

So what are the options for synchronous replication?

  • MySQL "clustering" appears to allow for synchronous replication, but requires use of the separate NDB storage engine, which has a long list of limitations vs MyISAM or InnoDB. (No foreign key support, no triggers, basically none of the features MySQL has been adding for the past few years. Oh, and you need enough RAM to hold your entire database twice over.)
  • PgCluster for PostgreSQL seems fairly mature, but the 1.3 (8.0-based) and 1.5 (8.1-based) versions still aren't out of beta. PgCluster also patches the PostgreSQL source directly, which makes me a little nervous.
  • Another option is something like pgpool, which multiplexes updates across multiple databases. The biggest limitation of this approach is that you're on your own for recovery, i.e., after A goes down and you switch to B alone, how do you get A back in sync? A fairly common approach is to combine pgpool with Slony-I async replication for recovery.

The bottom line is, high availability isn't as simple as adding whatever "replication" solution you first run across. You need to understand what the different kinds of replication are, and which are appropriate to your specific situation.