One of the reasons ORM tools have a spotted reputation is that it's really, really easy to write a dumb ORM that works fine for simple queries but performs like molasses once you start throwing real data at it.
Let me give an example of a situation where, to my knowledge, only SQLAlchemy of the Python (or Ruby) ORMs is really able to handle things elegantly, without gross hacks like "piggy backing."
Often you'll see a one-to-many relationship where you're not always interested in all of the -many side. For instance, you might have a users table, each associated with many orders. In SA you'd first define the Table objects, then create a mapper that's responsible for doing The Right Thing when you write "user.orders."
(I'm skipping connecting to the database for the sake of brevity, but that's pretty simple. I'm also avoiding specifying columns for the Tables by assuming they're in the database already and telling SA to autoload them. Besides keeping this code shorter, that's the way I prefer to work in real projects.)
users = Table('users', metadata, autoload=True)
orders = Table('orders', metadata, autoload=True)
class User(object): pass
class Order(object): pass
mapper(User, users,
properties={
'orders':relation(mapper(Order, orders), order_by=orders.c.id),
})
That "properties" dict says that you want your User class to provide an "orders" attribute, mapped to the orders table. If you are using a sane database, SQLAlchemy will automatically use the foreign keys it finds in the relation; you don't need to explicitly specify that it needs to join on "orders.user_id = user.id."
We can thus write
for user in session.query(User).select():
print user.orders
So far this is nothing special: most ORMs can do this much. Most can also specify whether to do eager loading for the orders -- where all the data is pulled out via joins in the first select() -- or lazy loading, where orders are loaded via a separate query each time the attribute is accessed. Either of these can be "the right way" for performance, depending on the use case.
The tricky part is, what if I want to generate a list of all users and the most recent order for each? The naive way is to write
class User:
@property
def max_order(self):
return self.orders[-1]
for user in session.query(User).select():
print user, user.max_order
This works, but it requires loading all the orders when we are really only interested in one. If we have a lot of orders, this can be painful.
One solution in SA is to create a new relation that knows how to load just the most recent order. Our new mapper will look like this:
mapper(User, users,
properties={
'orders':relation(mapper(Order, orders), order_by=orders.c.id),
'max_order':relation(mapper(Order, max_orders, non_primary=True), uselist=False, viewonly=True),
})
("non_primary" means the second mapper does not define persistence for Orders; you can only have one primary mapper at a time. "viewonly" means you can't assign to this relation directly.)
Now we have to define "max_orders." To do this, we'll leverage SQLAlchemy's ability to map not just Tables, but any Selectable:
max_orders_by_user = select([func.max(orders.c.order_id).label('order_id')],
group_by=[orders.c.user_id]).alias('max_orders_by_user')
max_orders = orders.select(orders.c.order_id==max_orders_by_user.c.order_id).alias('max_orders')
"max_orders_by_user" is a subselect whose rows are the max order_id for each user_id. Then we use that to define max_orders as the entire order row joined to that subselect on user_id.
We could define this as eager-by-default in the mapper, but in this scenario we only want it eager on a per-query basis. That looks like this:
q = session.query(User).options(eagerload('max_order'))
for user in q.select():
print user, user.max_order
For fun, here's the sql generated:
SELECT users.user_name AS users_user_name, users.user_id AS users_user_id,
anon_760c.order_id AS anon_760c_order_id, anon_760c.user_id AS anon_760c_user_id,
anon_760c.description AS anon_760c_description,
anon_760c.isopen AS anon_760c_isopen
FROM users LEFT OUTER JOIN (
SELECT orders.order_id AS order_id, orders.user_id AS user_id,
orders.description AS description, orders.isopen AS isopen
FROM orders, (
SELECT max(orders.order_id) AS order_id
FROM orders GROUP BY orders.user_id) AS max_orders_by_user
WHERE orders.order_id = max_orders_by_user.order_id) AS anon_760c
ON users.user_id = anon_760c.user_id
ORDER BY users.oid, anon_760c.oid
In SQLAlchemy, easy things are easy; hard things take some effort up-front, but once you have your relations defined, it's almost magical how it pulls complex queries together for you.
.................
I'm giving a tutorial on Advanced Databases with SQLAlchemy at PyCon in February. Feel free to let me know if there is anything you'd like me to cover specifically.