Using Relational Databases with Python

Andrew J Todd esq

http://www.halfcooked.com/

This work is made available under the GNU Free Documentation License

No Kitten Pictures Here

Two small boys

Introduction

Python
Databases
Python DB-API
Building on the DB-API
Non-relational Databases

Python

Agile Language
First released in the early 1990s
Extensive Standard Library
Low Level Storage Interfaces
Access to Databases from 3rd Party Modules

Best described as an Agile language, Python has been around since the early 1990s. Python has an extensive standard library but access to databases is available only in third party modules. The standard library has more low level storage interfaces like dbm, pickle and shelve. Note that some of the object database available for Python (ZODB and Schevo) build on these standard library modules.

Databases

Usually refers to relational databases
But really anything that adheres to the ACID principles
- Atomicity
- Consistency
- Isolation
- Durability

Atomicity : All of a transaction is committed or nothing is
Consistency : Either a transaction succeeds or if it fails everything is returned to its prior state
Isolation : A transaction in process must be isolated from every other transaction
Durability : Committed data is preserved

Python Database API

Version 1.0

http://www.python.org/dev/peps/pep-0248/
Released April 1996
Not often used any more

Version 2.0

http://www.python.org/dev/peps/pep-0249/
Released April 1999
Current and most popular version of the API

DB-API 2.0

Each database has its own module (or modules)
Standards for

Establishing a connection (and connection objects)
Cursor objects
Executing a statement
Fetching the results of executing a statement
Exception hierarchy

Set of optional extensions

The DB-API doesn't specify a common module for providing connectivity like Perl's DBI, rather it assumes that each database has its own module. But unlike PHP where each database driver implements its own often slightly different commands for interacting with the database in Python there is a level of consistency between modules.

Modules are available for most of the popular relational databases, be they F/OSS or commercial. If you have a database that you already use the chances are that there is a Python database module for it. Each DB-API module must provide at the very least a constructor and a number of module globals. It also acts as the owner of a number of standard exceptions.

Short Interlude

A demonstration of the DB-API

          from pysqlite2 import dbapi2 as sqlite
          db_connection = sqlite.connect('sample.db')
          dir(db_connection)

          db_curs = db_connection.cursor()
          dir(db_curs)

Fetching Data

All access through the cursor
Follows standard prepare-execute-fetch cycle
Insert and update statements have no 'fetch'
Can fetch one, many or all rows
Always returns a sequence
A side effect of an execute is to set the cursor description

Short Interlude

A demonstration of interacting with a Database

          create_stmt = "CREATE TABLE people (id INTEGER PRIMARY KEY, "
          create_stmt += "first_name VARCHAR(20), last_name VARCHAR(30), "
          create_stmt += "date_of_birth DATE)"
          create_stmt
          db_curs.execute(create_stmt)

          insert_stmt = "INSERT INTO people (first_name, last_name, "
          insert_stmt += "date_of_birth) VALUES "
          insert_stmt += "('David', 'Bailey', '1938-1-2')"
          insert_stmt
          db_curs.execute(insert_stmt)

          db_connection.commit()

          insert_stmt = insert_stmt[:-31]
          insert_stmt += "('Ansel', 'Adams', '1902-2-20')"
          insert_stmt
          db_curs.execute(insert_stmt)

          db_curs.execute("SELECT * FROM people")
          db_curs.description
          db_curs.fetchone()
          db_curs.fetchall()

Philosophy

A database driver specification
Not a one size fits all standard like JDBC
Designed to have higher level modules sit on top
Optional extensions which module authors may implement

Parameters

DB-API 2.0 supports database bind parameters
JDBC has them, PHP doesn't
DB-API 2.0 supports several 'styles'

qmark - ?
numeric - :1
named - :name
format - %s
pyformat - %(name)s

One of the trickiest things people new to databases and Python get into trouble with is bind parameters. If you've used JDBC or Perl then the concept will be familiar. PHP by default doesn't have bind parameters so PHP programmers coming to Python often have problems with them.

Because one of the specified parameter styles is misleadingly close to the standard Python string substitution syntax people often mistake one for the other and simply insert variable values into their query string before execution. This has a couple of drawbacks. One is that you are potentially vulnerable to SQL injection attacks, especially if you are taking the value of your variables from an untrusted source. Secondly you aren't giving the computer a chance to properly optimise your query because each time you execute will be treated as a separate statement in the database and be parsed again, adding to the cost of the operation. There is a solution to these issues.

The DB-API recommends that module authors use numeric, named or pyformat.

Short Interlude

A demonstration of the use of bind parameters

          stmt = "INSERT INTO people (first_name, last_name, "
          stmt += "date_of_birth) VALUES (?, ?, ?)"
          db_curs.execute(stmt, ('Partick', 'Anson', '1939-4-25'))
          db_curs.execute(stmt, ('Norman', 'Parkinson', '1913-4-21'))

          search_name = 'Bailey'
          stmt = "SELECT * FROM people WHERE last_name=?" 
          db_curs.execute(stmt, (search_name,))
          db_curs.fetchone()

Why 'format' must die

Spot the difference;

>>> stmt1 = "SELECT * FROM people WHERE name='%s'"
>>> stmt2 = "SELECT * FROM people WHERE name=%s"

>>> db_curs.execute(stmt1 % 'Bailey')
>>> db_curs.execute(stmt2, 'Bailey')

Building on the DB-API

Helper modules

dtuple
db_row

Object Relational Mappers

SQLObject
SQLAlchemy

SQLObject 0.7.2 was released this week

Non-relational Databases

Many are built on the standard pickle and shelve modules

ZODB - Zope Object Database
Schevo - From Orbtech
Durus - Lightweight version of ZODB

OSDC 2006