August 01, 2004
Towards a Consistent Database API
Python has a Database API standard, but as is frequently argued on the DB-Sig mailing list it's not quite standard enough.
What do I mean by this? The standard is flexible, essentially making each module which implements it subtly but meaningfully different. I've recently been trying to write some code which can seamlessly access different databases. The ambiguities of the DB-API make it difficult, if not impossible, to write one series of statements which will work against either database. In this case it has been Gadfly and MySQL, but writing code to work against more than one database will cause some degree of problems. These inconsistencies are, I think, one of the reasons that people have written abstraction layers like db_row, dtuple and PDO.
Typically these inconsistencies would be quite simple to fix, but require a little work on the part of module authors.
What do we need? Here is my list;
- A single, consistent, parameter style. Gadfly uses '?' and MySQLdb uses '%s'. We should pick one and stick with it. 
- Consistent behaviour. When you fetchone() result from a Gadfly cursor that is empty it raises an exception (gadfly.database.error). MySQLdb just returns 'None'. 
- Consistent use of exceptions. The DB-API specifies a standard set of exceptions, but doesn't say when they should be raised. What should happen if I attempt to execute an invalid SQL statement? Some modules will accept it and return nothing whilst others will raise an exception. With a bit of luck it might even be ProgrammingError.
I have probably missed some things from my list, so go wild in the comments if you think that there are other areas ripe for clarification. Who knows, I might even post this to the mailing list and see if we can't get some movement on these issues.
 - Life gets tricky when you consider named parameter styles, like the one used in cx_Oracle. I think that's the best way to write code but not enough module authors agree with me. JDBC, ODBC and the Perl DB:API all use '?' so that should probably be the Python standard as well.
 - In fact, MySQLdb tells you the number of rows affected as the return code of the execute, so that you don't need to even attempt the fetchone.
Posted by Andy Todd at August 01, 2004 03:59 PM
I think most of the weird issues -- like parameter style -- that are in the DB-API are due to the eclectic nature of the underlying drivers. MySQL, for instance, doesn't have parameters, so the parameters are implemented in the Python driver. And %s is easy to parse. Oracle drivers do support parameters, and they use ?. ? is probably best, but %s is easiest, hence incompatibility.
Database errors are harder, as the databases and drivers emit errors in whatever way they do, and if they don't match the standard then they are highly unlikely to be fixed -- instead the driver will be incompatible. Now, because the standard is vague, they can be "compatible".
The other issue people always bring up is that exceptions are named in a standard fashion, but they aren't standard actions. So you can't just catch any ProgrammingError, you must specify whether you are catching psycopg.ProgrammingError or MySQLdb.ProgrammingError.
The other thing which is wildly non-standard is how you can extend the way values are quoted. Psycopg has some functions, I believe PyGreSQL has even more ways to indicate quoting, but this is all non-standard.
My recommendations after being a subscriber to the DB-SIG list for a while:
0. Learn from existing standards like JDBC (and don't be tempted to fast-forward to building higher-level APIs until the lower-level ones are properly defined).
1. Dump the paramstyle "flexibility". Even Oracle go along with the JDBC standard and drop named parameters, and I imagine that only hardcore Oracle users really miss them (and they're probably not using JDBC).
2. Go with ? as the only parameter marker - %s looks too much like the standard substitution marker, and newcomers start to think that value substitution is just quoting.
3. Take inspiration from mxODBC which has probably the most consistent API and which has many of the database independence issues resolved.
Although idea to use always "?" maker is attractive, it's hard to implement. Note, that "?" is not substituted inside string literals (which are presented diferently in databases), so you do need SQL parser for each driver.
I wouldn't want ? substituted inside string literals - it's a parameter marker, remember. That's why the argument against %s is so strong, in my opinion.
- on ?: many C-APIs allow/require ? in prepared statements, the driver author doesn't has to parse the SQL then
- on named parameters: actually, Celko uses them a lot in his latest book about tree structures in SQL, because he passes the same value at several places in the SQL statements
- on exceptions: There is a recommendation that exception classes should be also accessible via the session object, except session.ProgrammingError would then be portable
- on consistent exceptions: I think that it's difficult because driver authors would have to test with several other drivers to assure that exceptions for the same error situation match. How about a exception factory as a parameter to the session object? I would have to write such a factory for every database my application wants to support, but I can keep the differences out of the remainder of the code.
Point taken about the ? stuff (I should have taken a few seconds to digest the point being made) - I guess that's why no decision was taken about standardisation before and why the various modules just go with whatever the underlying C libraries support. But would a full SQL parser be necessary?
If the underlying C library doesn't support ?, then a tokenizer is required. Actually, it's only needed to skip over comments, string literals and quoted identifiers, so you don't even have to know the rules for numeric literals or unquoted identifiers. Or is there a database that allows the ?: operator in expressions?