Using Relational Databases with Python

Andrew J Todd esq

http://www.halfcooked.com/

This work is made available under the GNU Free Documentation License

No Kitten Pictures Here

Two small boys

Introduction

  • Python
  • Databases
  • Python DB-API
  • Building on the DB-API
  • Non-relational Databases

Python

  • Agile Language
  • First released in the early 1990s
  • Extensive Standard Library
  • Low Level Storage Interfaces
  • Access to Databases from 3rd Party Modules

Best described as an Agile language, Python has been around since the early 1990s. Python has an extensive standard library but access to databases is available only in third party modules. The standard library has more low level storage interfaces like dbm, pickle and shelve. Note that some of the object database available for Python (ZODB and Schevo) build on these standard library modules.

Databases

  • Atomicity : All of a transaction is committed or nothing is
  • Consistency : Either a transaction succeeds or if it fails everything is returned to its prior state
  • Isolation : A transaction in process must be isolated from every other transaction
  • Durability : Committed data is preserved

Python Database API

Version 1.0

Version 2.0

DB-API 2.0

The DB-API doesn't specify a common module for providing connectivity like Perl's DBI, rather it assumes that each database has its own module. But unlike PHP where each database driver implements its own often slightly different commands for interacting with the database in Python there is a level of consistency between modules.

Modules are available for most of the popular relational databases, be they F/OSS or commercial. If you have a database that you already use the chances are that there is a Python database module for it. Each DB-API module must provide at the very least a constructor and a number of module globals. It also acts as the owner of a number of standard exceptions.

Short Interlude

A demonstration of the DB-API

          from pysqlite2 import dbapi2 as sqlite
          db_connection = sqlite.connect('sample.db')
          dir(db_connection)

          db_curs = db_connection.cursor()
          dir(db_curs)
          

Fetching Data

Short Interlude

A demonstration of interacting with a Database

          create_stmt = "CREATE TABLE people (id INTEGER PRIMARY KEY, "
          create_stmt += "first_name VARCHAR(20), last_name VARCHAR(30), "
          create_stmt += "date_of_birth DATE)"
          create_stmt
          db_curs.execute(create_stmt)

          insert_stmt = "INSERT INTO people (first_name, last_name, "
          insert_stmt += "date_of_birth) VALUES "
          insert_stmt += "('David', 'Bailey', '1938-1-2')"
          insert_stmt
          db_curs.execute(insert_stmt)

          db_connection.commit()

          insert_stmt = insert_stmt[:-31]
          insert_stmt += "('Ansel', 'Adams', '1902-2-20')"
          insert_stmt
          db_curs.execute(insert_stmt)

          db_curs.execute("SELECT * FROM people")
          db_curs.description
          db_curs.fetchone()
          db_curs.fetchall()
          

Philosophy

Parameters

One of the trickiest things people new to databases and Python get into trouble with is bind parameters. If you've used JDBC or Perl then the concept will be familiar. PHP by default doesn't have bind parameters so PHP programmers coming to Python often have problems with them.

Because one of the specified parameter styles is misleadingly close to the standard Python string substitution syntax people often mistake one for the other and simply insert variable values into their query string before execution. This has a couple of drawbacks. One is that you are potentially vulnerable to SQL injection attacks, especially if you are taking the value of your variables from an untrusted source. Secondly you aren't giving the computer a chance to properly optimise your query because each time you execute will be treated as a separate statement in the database and be parsed again, adding to the cost of the operation. There is a solution to these issues.

The DB-API recommends that module authors use numeric, named or pyformat.

Short Interlude

A demonstration of the use of bind parameters

          stmt = "INSERT INTO people (first_name, last_name, "
          stmt += "date_of_birth) VALUES (?, ?, ?)"
          db_curs.execute(stmt, ('Partick', 'Anson', '1939-4-25'))
          db_curs.execute(stmt, ('Norman', 'Parkinson', '1913-4-21'))

          search_name = 'Bailey'
          stmt = "SELECT * FROM people WHERE last_name=?" 
          db_curs.execute(stmt, (search_name,))
          db_curs.fetchone()
          

Why 'format' must die

Spot the difference;

>>> stmt1 = "SELECT * FROM people WHERE name='%s'"
>>> stmt2 = "SELECT * FROM people WHERE name=%s"

>>> db_curs.execute(stmt1 % 'Bailey')
>>> db_curs.execute(stmt2, 'Bailey')

Building on the DB-API

Helper modules

Object Relational Mappers

SQLObject 0.7.2 was released this week

Non-relational Databases

Many are built on the standard pickle and shelve modules

Further Reading