Glyph Lefkowitz cites me as an inspiration to write Why Axiom Doesn't Expose SQL. Alas, I disagree with most of what he says.
My post was about how if you're writing a tool that presupposes the use of a relational database, it's stupid to try to protect your users from having to know SQL. (This also means I think projects that bend over backwards to pretend ALTER TABLE is too hard are misguided, as well. But that's another subject.)
Glyph's first argument is that any form of SQL is an invitation to sql injection attacks. This particular form of scare mongering isn't appreciated. Come on: this is 2005. It's ridiculously easy to write injection-proof SQL, even by hand. Arguing that allowing SQL allows injection attacts is like arguing that coding in python allows "shutil.rmtree('/')": correct, but irrelevant.
Glyph further claims that "interfaces should be complete things," and that this justifies trying to obliterate any trace of SQL from your ORM. Because if you need SQL, your interface just wasn't complete enough.
The thing is, an ORM isn't just an interface: it's an extra layer of abstraction. Abstraction that obscures or prevents access to its internals smacks of the kind of we-know-better-than-you thinking that infects most statically typed language libraries. Don't do this.
Trying to design for "everything the user could possibly want, or at least everything we think he should want," while easier than providing for the unexpected, is almost always doomed to failure. Glyph gives the example of the Python os module -- why use C when os gives you everything you want? -- and yet, my current work project involves multiple C modules. If the Python designers had taken the attitude of "we'll give you all the everything we think you need, and if we leave out something you want, too bad," Python would have been a non-candidate for the job, and I'd be working somewhere else.
Fundamentally, tool designers more concerned about providing power and usability know that sometimes you need to go to a lower level of abstraction, and design for this up-front. Just like Python.
I do agree completely with Glyph on one point:
[M]ost ORMs are really, really awful. They are heavy on the "object" and not so much on the "database". There are a lot of features that SQL provides which they don't expose.
It sometimes seems like we're rapidly approaching the point where there will be as many half-assed Python ORM tools as there are half-assed Python web frameworks. Don't build another half-assed tool. Pick an existing project that sucks less than the others and improve it. One ORM that doesn't suck much (I mean that as a compliment) is PyDO2. Another is SQLAlchemy. I've been trying SQLAlchemy out during the last few days, and I really like the direction it's going in. (Don't be too intimidated by the introduction: it looks overcomplicated, but it's actually exceptionally well thought-out.)
Comments
As far as it's "ridiculously easy" to write injection-proof SQL - it's almost as easy to write buffer-overflow-proof C. Yet, here we are, it's 2006 now, and otherwise bright people are still writing C code with buffer overflows in it and web-based relational database access code with tons of SQL injection vulnerabilities.
In the article you cite about "we-know-better-than-you thinking", you say that code generation is "fragile at best". Why is SQL code generation less fragile than Python or C# code generation? If you were writing a program that would generate Python code (however strict its quoting rules), I would, rightly, object that it opened the door to shutil.rmtree attacks. Similarly, if you are actually writing your code in SQL, rather than generating SQL from some other language, I suspect it would not be so vulnerable to attacks.
All I'm saying is that you should anticipate the unexpected by writing a non-crappy ORM, not writing a crappy one and letting your users patch it up by executing their own SQL. The C modules that you're using, I assume, aren't interfaces to POSIX functionality, but new functionality in their own right. The equivalent in an ORM is providing graceful access to stored procedures. That doesn't mean that you should be generating the stored procedures (or for that matter, C functions) from your application code.
As to code generation, you're conflating two issues:
1) runtime generation is much less fragile than compile-time for some areas, such, as, well, database introspection
2) generation of code that doesn't need to be hand-modified afterwards (most SQL) is inherently more robust than generation of code that does (most Python or C#)
Finally, for C modules, we did indeed have to write our own wrapper for some POSIX pthread functions in order to get decent condition variable performance. Sometimes that 1% of code you leave out of your tool that does 99% of what users need is the dealbreaker... but every user will have a different 1%.
"[D]enying derived classes full access to your inner workings just leads to clumsier (less readable and more bug-prone) implementations derivations... [I]t's based on the hubris that you are a better programmer than anyone who might want to extend your class and can forsee all circumstances in which it might be subclassed."
The application to ORM tools that try to keep you away from SQL is, I think, clear.
The comparison with C and memory management isn't entirely apropos, I think. With C, you are forced to manage memory everywhere; it pervades everything you do, and is usually irrelevant to everything you really want to be thinking about while writing an application. An ORM that permits you to use SQL directly doesn't force you to use it everywhere; and when you do use it, the reason for doing so is likely to be relevant to your purpose in writing the application, so you'll think about it as carefully as you know how.
"everything the user could possibly want, or at least everything we think he should want" should be part of the Twisted motto. The rest could be:
* in the core package (serial line support)
* with some bizzare, non-standard-compliant implementation (https)
* while reimplementing half the Python standard library b/c it's synchronous and we're smarter (logging)
What did you expect from Glyph?
(/troll)
That said, Twisted's a great library.
That said, I've found the most compelling reason for using an ORM is to automate tasks where you need don't know ahead of time what your query will look like.
I've found that in many of those instances, simple methods that map dictionaries to queries and back are quite sufficient, like dict2insert or dict2update.
But I still love SQLObject. :)
sqlobject-admin manages the differences between the code's expectations and the actual database. So you can record versions, upgrade a database (via hand-written SQL), create a database from scratch, and see some of the differences between the database and the expected schema.
This isn't reinventing the database DDL, but simply a tool that addresses fairly universal issues. You could write the same thing with DDL parsers and whatnot, or the databases could build these sorts of tools in. But DDL parsers are hard and uncommon, and AFAIK databases don't include these kinds of tools.
As for SQL injection, the blame lays heavy on the shoulders of PHP. They just got it horribly wrong, so that doing the right thing looks *almost* like doing the very wrong thing, and so that the wrong thing is easier. That's just dumb design, I don't think it really implies much about larger concepts in SQL and database abstraction. It does imply a lot about what your failure cases should be, but that's a different discussion.
I'll stop trolling for just a moment here to point out another vector of agreement that's probably non-obvious from the way I've been talking about this. Of course your ORM should let you get at SQL, just like Python should let you get at C.
If you've written a crappy ORM that needs patching up, the last thing your users are going to want to do is to crack open your source code to start diddling things around so they can get to the SQL, already. As the python community motto goes, "we're all adults here". Axiom does have a way to get at your SQL, "Store.executeSQL". (And by the way, that method will actually raise an exception if you attempt to insert SQL strings by hand rather than using bind parameters ;-))
My main point here is that every time a user has to go and call that method, I should recognize it as a failure of my code and attempt to fix it (as I believe other ORM authors should), rather than just accepting "Oh, all my users should know SQL, so if you want to access feature XYZ of course you have to generate some SQL strings yourself".
From the little I've seen SQLAlchemy does a good job of exposing both - a relational wrapper as well as an OR mapper. Otherwise if you have a pure OR mapper, the users will always have to compromise on control, expressivity and/or performance. Of course, the benefit of gaining persistent objects may justify the use in many cases.
I like sqlobject-admin for beeing an easy tool to make me worry less about how to setup the correct database from my ORM modell, +1.
One of a the nice things to have would be a tool that helps you convert one schema of a database to another mostly automated. Would quite take a bit of the edge out of changing schemas.
http://www.securityfocus.com/bid/15912
http://www.securityfocus.com/bid/13620
http://www.securityfocus.com/bid/13513
http://www.securityfocus.com/bid/16020
http://www.securityfocus.com/bid/16039
http://www.securityfocus.com/bid/13030
http://www.securityfocus.com/bid/16038
http://www.securityfocus.com/bid/16097
http://www.securityfocus.com/bid/15088
http://www.securityfocus.com/bid/15676
http://www.securityfocus.com/bid/14618
http://www.securityfocus.com/bid/15598
http://www.securityfocus.com/bid/13649
http://www.securityfocus.com/bid/13643
http://www.securityfocus.com/bid/15330
http://www.securityfocus.com/bid/13355
http://www.securityfocus.com/bid/15787
Mostly from 2005. There's lots more. I got tired of pasting links.
Solved problem? No. No it isn't.
Real-life SQL Injection vulnerabilities:
http://www.securityfocus.com/bid/15912
http://www.securityfocus.com/bid/13620
http://www.securityfocus.com/bid/13513
...
Solved problem? No. No it isn't.
OK, now paste one that isnt solved by using bind parameters properly.
Solved problem ? Yes. Yes it is.
First, any ORM must be able to know what it can cache, what it can't, and how everything stands in the face of transactions. Exposing SQL can make this difficult.
Second, Axiom does a lot of automatic table munging on the fly (schema upgrades, entity references, List objects) that would be almost impossible with exposing of SQL.
In the end, I think that Glyph probably missed the mark a bit. There are interesting objecty things that can be done with Axiom. There is enough of an interface to the relational model in the Axiom's queries and references that you can still take advantage of the underlying SQLs speed.
That said, implementing something in Axiom generally would make it almost impossible to properly munge directly through SQL because of how complex its data management is (especially across versions).
There is also serious responsibility that comes with keeping both the coding API and SQL storage implementation solid across versions. This is because Axiom uses SQL very differently from most ORMs. Look at powerups, for example. Nobody else does this as elegantly.
This is not a problem with Axiom, its a strength of it. It implements a ODB with some of the speed and power of a RDB. It couldn't do it if you were mucking about with the internals.
One other nicety of not exposing SQL is that it makes it fairly simple to port to another database. If someone dropped the right amount of money, I bet you could see Axiom running on MySQL, PostgreSQL, or Oracle in a matter of days. Interestingly, not a single Axiom derived application would break.
Anyone who has watched the efforts to port any large SQL-based system to another database knows the trouble this saves. I, for one, believe that anything that can be expressed in vanilla SQL can be represented with an appropriate API under any database system. SQL is really too simple to make this difficult. It's one of SQL's strengths.
Axiom doesn't much expose SQL for a these very good reasons. Preventing the underskilled from creating SQL injection attacks is a useful as a side effect. Tts focus of this discussion is likely an unnecessary distraction. The real reason that Axiom doesn't expose it is simply because it makes Axiom too complex to be usefully maintained and it makes a number of very useful things far too difficult.
But you are right that this can be a useful tool.
We need a declarative model to operate with objects.
And Bruce Eckel agree with me :).
And once it would be created we get object advantages from these queries so for example querying
[x for x in employee if x.age>25]
will not depend on whether age is a stored field or a property (today()-self.birthdate)
SQL in ORM is a kind of embedded assembler in the high level languages: the current compromise between comfort and performance