Wednesday, November 30, 2005

A point for ORM developers to remember

If you are writing an ORM tool, please keep this one point in mind:

I already know SQL.

Your new query syntax isn't better; it's just different. Which means one more thing for me to learn. Which means I probably won't bother, and will use instead an ORM tool that respects my time.

Consider: if your target audience is like me, and already knows SQL, this should go without saying. Resist the temptation to be "clever" and overcomplicate things.

If your target audience does not know SQL, then either

  • they intend to learn SQL, because they're using a relational database and expect that non-python code or ad-hoc queries will be necessary at some point, or
  • they have no intention of learning SQL, and all their data-access code will now and forever be in python, in which case they would be better off using Durus or ZODB or another OODB. Be honest with these people and everyone will be happier.

Django does some cool stuff, but heaven help me if I ever had to use an abomination like their ORM in its present form. It's not pretty. Look, here's a rule of thumb to tell when your code sucks: if it reminds the reader of perl, it sucks. There. The secret of not sucking is yours. Repent and suck no more!

At the risk of proceeding to beat a dead horse, didn't anyone look at those code samples and think, "wow, our ORM code is way the hell uglier than the vanilla SQL?"

But even if you discover the World's Prettiest And Most Functional Syntax, resist the temptation to make me use it. Remember: I already know SQL. (Corollary: don't bother giving SQL functionality a facelift, as in "startswith='WHO'" instead of "LIKE 'WHO%'.")

I only single out Django here because they blogged about how cool their ORM syntax for ORs was, which made me have a look, which prompted this rant. Sorry, Django fans!

(Java is clunky and ugly but the Java SimpleORM tool is better thought-out than anything I have seen in Python, and does not make this mistake. Read the author's whitepaper.)

26 comments:

xtian said...

Agreed - I think the ORM in Django is the only part that I felt consciously uncomfortable about when I was trying it out.

How do you feel about SQLObject querying?

Anonymous said...

I can appreciate your concern but on the other hand, if you feel more productive with SQL because you already know it, why on earth do you use an ORM product which of cousre will add an extra layer of complexity between your data and your application.

People swears only about ORM until they've tried them because IMO the very nature of ORM will always make them less efficient than having your own set of library to cook your SQL queries just for your needs.

I hate typing SQL queries, I feel really not at ease with them. But having used ORM for a while now, I wonder if it's even better.

As you say, why not using directly a real object database such as ZODB?

Anonymous said...

Django allows you to write SQL queries so I don't see the problem.

Anonymous said...

Maybe you could be interested in SQLAlchemy:

http://www.sqlalchemy.org/

I'm still experiment with it, but it seems the most powerful OO<->SQL mapper today available in Python (the rest of the so-called ORMs are more like simplicistic and extremely limited Class<->Table mappers).

Any thoughts?

Jacob said...

anonymous said:
> if you feel more productive with SQL
> because you already know it, why on earth
> do you use an ORM product which of cousre
> will add an extra layer of complexity
> between your data and your application.

I think most ORMs are meant to reduce (apparent) complexity. The question is, what price is paid for this. My goal would be to make the ORM dumb enough that the price is low. ORMs also help to organize code -- if you are going to create an Employee object anyway and give it a bunch of methods, using an ORM will probably reduce the number of lines of code you write.

PyDO also has a special query syntax, but it is a bit silly and raw sql is equally usable. The main reason I often use the special syntax is that then I am not embedding one syntax into another and as a result my editor will help me avoid typographical errors; but it is impossible to use it without thinking of the corresponding sql anyway. IMO, if you are using a SQL database, you should think SQL, period, whatever tools you are using.

Adrian Holovaty said...

> Corollary: don't bother giving SQL
> functionality a facelift, as in
> "startswith='WHO'" instead of
> "LIKE 'WHO%'."

I'm puzzled. Would you rather this be like="WHO%"?

Because if that's the case, you'd have a couple problems.

Different databases use different quoting schemes (e.g. the SQLite Python driver uses question marks, not percent signs), so you would either have to quote the percentage or convert all percentages to whatever quoting scheme the backend uses. And if you did that, you'd have to provide a way to use a "raw" percent sign in a query. Same goes for underscores.

Would inventing a quoting scheme and lowering one level of abstraction be worth it? Different strokes for different folks.

Jonathan Ellis said...

You're confused.

The % sign is a wildcard, not a substitution placeholder. All ANSI-compliant databases, including SQLLite, respect this. Even noncompliant databases like MySQL get this much right.

Re quoting per se, I have no idea what you're talking about. Perhaps your eyes tricked you -- I used doublequotes to denote quotations, and single quotes to denote string literals.

Tim Golden said...

I'm paid as a SQL Developer but I'm probably really 50% a Python developer. I pretty much feel the same way as you seem to: I want an ORM to make my life a bit easier than having to write repeated INSERTs/UPDATEs and faff around with parameter substitutions and so on. If it can make many-many relationships a bit slicker, so much the better.

But I can comfortably write large-scale multitable queries with fairly complex join conditions, and I've not yet found a Pythonesque alternative which doesn't seem clunky.

I'm currently assessing SQLAlchemy which seems to give you access to the several levels I need. I'm not sure yet.

Adrian Holovaty said...

Let me explain that better.

If you're going to let people use percent signs (as LIKE placeholders) and underscores (as LIKE placeholders), you're going to have to provide a way for people to use raw percent signs and underscores. And this behavior is *not* consistent. SQLite requires a "LIKE 'term%' ESCAPE '\'", where you define the escape character in an "ESCAPE" clause.

Anonymous said...

I could not agree with you more. I use sql directly in my application and have not used an orm since I'm very comfortable with SQL. The biggest two problems I faced:

1. When migrating from mysql to postgres. Postgres requires an 'AS' after the column name for instance:
mysql: select count(*) total from some_table
postgres: select count(*) AS total from som_table
I have hundred of queries in my system and this was a major pain in the ass. There were some other problems on how the data object is returned from the two drivers. but the AS was the biggest problem.

2. I miss not having something like catwalk to administor my tables. I was thinking of having a sqlobject only for the catwalk interface and keep my application as is.

Jonathan Ellis said...

I probably shouldn't bother replying, but

a) Adrian: the ESCAPE clause is optional in SQLLite. This behavior (both the presence of the ESCAPE clause, and its optional-ness) is ANSI-compliant. (SQLLite and PostgreSQL are the most religiously spec-compliant databases out there, so when you think you have found noncompliant behavior in one of those databases, you are usually wrong. Published limitations excepted, of course.)

b) anonymous: AS is not required by postgresql for a simple select, which you can easily verify with psql. I have no idea how you came to this conclusion. (PostgreSQL _does_ require subqueries used in the FROM clause to be aliased, but since (a) MySQL hasn't supported this sort of query at all until very recently and (b) when they added it, they also required the alias to disambiguate it, I don't think this is what you mean.)

In any case, trying to paper over noncompliant behavior in poorly behaved databases (*cough*MySQL*cough*) is orthogonal to the design of an ORM tool per se. Conflating the one with the other will only lead to poorly designed code.

Jim said...

Different strokes for different folks indeed. Writing SQL by hand is nice for power users (and as mentioned in a previous comment, django does let you drop into custom SQL whenever need be) but I've found that using an ORM cuts down on so much boilerplate.

And I quite like django's query syntax -- sql is a hairy beast that doesn't map to python well, but django's syntax makes it quite easy to do complex statements expressively. Tried the sqlobject way but didnt care for it..

Jonathan Ellis said...

I never said "ORMs are bad." I said, "ORMs shouldn't re-invent the wheel, and in so doing force users to learn something unnecessary."

jfb said...

I haven't had the misfortune of using Djano's ORM, but over the past few days I've been neck-deep in wrestling with SQLObject, and am very close to discarding the whole goddamn thing and writing bare SQL. Fighting my tools feels very wrong and unPythonic to me, but I just cannot express even a moderately interesting data model in any ORM I've tried without extreme pain.

I know! I'll write ANOTHER ONE!

Mike Watkins said...

I have to agree with the writer, stuff like:

polls.get_object(complex=(Q(question__startswith='Who') & (Q(pub_date__exact=date(2005, 5, 2)) | pub_date__exact=date(2005, 5, 6)))

Sure looks worse than the problem its meant to solve.

I've done a number of projects with ORM, SQLObject, and Cucumber(1); SQLObject I immediately took a liking too some time ago - define the schema in Python and go. Nice.

But its time to look at another option, Cucumber2, which won't appeal to those who use DB's other than Postgres, but for us enlightened folks (grin):

class TestTable(CucumberObject):
class a(str): pass
class b(int): pass
class c(float): pass

catalog = manager.catalog(TestTable, conditions=('b < 5', 'b > 2'))
for obj in catalog: print obj.id, obj.b

http://issola.caltech.edu/~t/cucumber2/

I like Cucumber2 on the surface very much and shall be taking it for a spin soon.

But... where it makes sense to do so... I much prefer using an object database, with Durus being my fav due to its simplicity.

http://www.mems-exchange.org/software/

Nicola Larosa said...

From the Twisted guys, Axiom.

"Axiom is an object database, or alternatively, an object-relational mapper."

Divmod : Axiom
http://divmod.org/trac/wiki/DivmodAxiom

For the history behind it:

What Happened to Atop?
http://divmod.org/trac/wiki/WhitherAtop

Jonathan Ellis said...

"Axiom currently supports only SQLite and does NOT have any features for dealing with concurrency."

I guess you gotta start somewhere, but perhaps being both an ORM tool AND an OODB is not the right place.

Anonymous said...

My dorky thing:
http://adamv.com/dev/python/superhash/
mostly just wraps turning table or queried rows into struct/dict like Python objects.

For single rows retreived from tables by ID, it will also generate the modified fields for the UPDATE, too.

A pretty thin layer, without any modeling to speak of, but I found it useful in the past.

The trick is it just wraps rows, and doesn't try to wrap complicated queries in any way. (Also, it's currently MySQL specific, but not TOO MySQL specific.)

- Adam Vandenberg
http://adamv.com/

Peter Fein pfein@pobox.com said...

Hear hear! I learned SQL a few months ago after much kicking and screaming. And while the ada-inspired syntax is sometimes annoying, I'm thrilled. Once you get your head around the idea that SQL is basically set theory, everything falls into place. I actually find it a more powerful, concise and understanble way of expressing some things than Python.

The only thing I'm unsastified with is not having a good, reusable way of building SQL statements in my scripts. We've got 4 developers and at least 8 ways of constructing an INSERT, and none of them really look like the SQL they produce (lots of loops & appends).

I don't want an ORM - converting to/from Python objects is pretty easy (esp. with Psycopg2's type adaptation). I just want something to help me build SQL statements. I've thought about using PTL templating framework as base, but who has time?

Could this text box be any smaller?

Anonymous said...

Have you looked at qlime?

It does try to maker persistence more transparent than SimpleORM, but that's easier done in Python.

Robert Brewer said...

polls.get_object(complex=(Q(question__startswith='Who') & (Q(pub_date__exact=date(2005, 5, 2)) | pub_date__exact=date(2005, 5, 6)))

Not learning another "language" is what prompted me to choose lambdas for expressing WHERE clauses in Dejavu. I started with overriding "+", "|", and other operators, and quickly realized the clunkiness of that--now, for the above, I write:

dates = (datetime.date(2005, 5, 2), datetime.date(2005, 5, 6))
f = logic.Expression(lambda x: x.question.startswith('Who') and x.pub_date in dates)
recall(polls, f)

Having to wrap it in an Expression object is a bit verbose, perhaps, but at least you write your query in pure Python. In addition, you don't need to write a parser or validator--Python checks the constraints for you when it compiles your code (and you get injection-proofing for free).

Alex Garel said...

Personally I quite like the idea of HQL (Hibernate SQL) which just put some object syntax in SQL, most of all removing join conditions.
SQL is really cool because it aimed at expressing sets operations (and not whole universe cases).
On the other hand I think it should be nice for an ORM to basically takes SQL syntax but extends it (say adding hierachical (xpath?) capabilities, more agregating possibilities...).
Some things are still hard to express with SQL (many joins, hierachical queries exists but with limitations, can't include a sub query anywhere,... )

Julio C├ęsar Carrascal said...

Why hasn't anyone done the KISS thing here and include SQL in the python syntax like the LINQ guys did with C#?

Fazal Majid said...

The database is the foundation for most real-world systems, and usually the bottleneck for performance and scalability. It thus stands to reason you have to choose very carefully how you interface with it. Most ORMs generate grossly inefficient (and sometimes even deadlock-prone) SQL.

Generic abstraction layers (as opposed to an application-specific set of stored procedures) will almost invariably come back to bite you when it hurts the most. People who complain about the "impedance mismatch" between Python/Java/Lisp/whatever and relational databases underestimate the importance of data and databases, the healthy respect of which they will eventually learn from bitter experience...

If you care about performance or scalability, stick with SQL.

Gheorghe Milas said...

I wrote a rather long blog comment about why I'm writing yet anoter python ORM out of frustration with the complexity yet lack of functionality in current python ORM's.

Read the blog here

David Creemer said...

Nice article, and I agree with your complaint. I've been using SQLObject for more than a year, and find myself doing a lot of this sort of thing:

"select id, .... from VERY COMPLEX QUERY"

then iterating over the id's to get the "objects" I need. This works around a few problems -- first, SQLObject's "SQLBuilder" syntax is -- well not SQL. It's not horrible, but as you say, one more thing to learn.

Also, depending on how the query is constructed, SQLObject might have to bypass it's own cache, which is a whole other level of complexity. Finally, the DRY principle simple can't work with complex SQLObject based applications -- I have dozens of triggers, pl-pgsql functions, complex constraints, geo-location and full-text search indexes, etc., and much of that is not expressible in SQLObject.

For my next complex application, I will probably avoid ORMs, and stick to things like the forthcoming SQL-API or the lower levels of SQL-Alchemy.