May 9, 2008

Trouble Getting a Date

Filed under: database, python — Andy Todd @ 8:04 am

I’m having trouble with dates. This can be summed up in a couple of high level issues;

1. Date support in relational databases is insane, or at the best inconsistent.

As far as I can tell the ANSI SQL-92 standard defines date, time, interval and timestamp data types. Which doesn’t help when SQL Server only implements something called ‘datetime’ - at least I think so, have you tried accessing any sort of manual for a Microsoft product online? Blimey, I thought billg had embraced this web thing years ago. Oracle has the ‘date’ data type (which is actually a time stamp) and MySQL, well they’ve gone and outdone everyone by implementing DATETIME, DATE, TIMESTAMP, TIME, and YEAR.

2. The Python DB-API does not cope with date data type ambiguity well.

When it comes to the date question the Python DB-API states (and I quote) ” … may use mx.DateTime”, which if you ask me isn’t much of a standard. This needs to change so that all DB-API modules return consistent datetime objects, not such a big issue as datetime has been part of the standard library since, what, Python 2.3?

Sadly even if we fix this it won’t work with Sqlite as it doesn’t consistently support data typing. In my experiments regardless of what sort of date you insert into the database you get a unicode string back. Don’t believe me? Try this in Python 2.5;

>>> from sqlite3 import dbapi2
>>> db = dbapi2.connect('test_db')
>>> cursor = db.cursor()
>>> cursor.execute('create table date_test (id integer not null primary key autoincrement, sample_date DATE NOT NULL)'
>>> stmt = "INSERT INTO date_test (sample_date) VALUES (?)"
>>> cursor.execute(stmt, (1234, ))
>>> import datetime
>>> cursor.execute(stmt, (datetime.date(2008, 3, 10), ))
>>> cursor.execute(stmt, ('My name is Earl', ))
>>> db.commit()
>>> cursor.execute("SELECT * FROM date_test")
>>> results = cursor.fetchall()
>>> for item in results:
...     print item[1], type(item[1])
1234 
2008-03-10 
My name is Earl 
>>>

But note that it is fine for integers.

3. The people writing the Python standard library modules are on crack.

Outside of the database world and within the batteries included Python standard library some modules use datetime, others time and there are even uses of calendar.

O.K. I’ll accept that maybe the module authors aren’t on full strength crack, because the time module just exposes underlying posix functions. But the people who wrote those were on something strong and hallucinogenic. I table the following function signatures from section 14.2 of the Python Library Reference 2.5 as an example;

strftime(format[, t ])
strptime(string[, format ])

This has bitten me twice in the last twenty four hours and frankly I’m not happy.

I appreciate that there are historical reasons for having inconsistent function signatures but can someone please fix this in Python 3.0. All we need is a single module that can access the underlying system clock and then convert between a number of different representations of that and other epoch driven dates. How hard can it be? As far as I can tell this is not part of the proposed standard library re-organisation. I think it should be.

April 22, 2008

Whither Relational Databases?

Filed under: database — Andy Todd @ 9:16 pm

Following on from a theme that Simon has been pursuing here is an interesting piece - How SimpleDB differs from a RDBMS. A thorough analysis of SimpleDB, but I think the extra value here is in the comments. I particularly liked Greg Jorgensen’s submission that programmers just don’t like RDBMS because they take some learning. Whilst I don’t have empirical evidence to back up this supposition I can say that most Java programmers I’ve come across go slightly green if you suggest that they can solve most problems with a SQL statement (and yes, that was meant to be read ironically).

If I can sum up the message of this post and it’s comments it is that we should be thankful for having different tools available to us, because this isn’t a one size fits all world. Where you’ve got a big list of simple things ™ tools like BigTable and SimpleDB work well. Where you’ve got large pieces of unstructured data (sometimes referred to as ‘documents’) you can use CouchDB, and where you have complex, structured data that has to adhere to certain validation and usage rules use a relational database. Each of these will store up to terabytes of information so let’s not even talk about (the myth of) scalability. Choose the right tool for the job and stop insisting that every problem is a nail.

So to answer my question from the title of this post - still around, and still kicking arse.

January 20, 2008

Gerald release 0.2.2

Filed under: database, python — Andy Todd @ 7:11 am

I have packaged and released version 0.2.2 of gerald. You can find all of the details on the project page but in a nutshell the changes from the last release are;

  • A serious bug in the packaging script has been fixed and the distributed archive is now complete and valid.

Bug reports, patches and test cases to the usual email address please. Or just a message to let me know you are using the code would be nice.

December 4, 2007

Indexing isn’t hard

Filed under: database — Andy Todd @ 7:45 am

I found this blog post via Reddit and was happy to see the positive benefits the author is getting from profiling his SQL and indexing his database. He could have avoided the slow running query in the first place by following Andy’s simple rules of indexing;

  1. If your database doesn’t do it automatically, add indexes to all of your primary key columns
  2. Add indexes to all of your natural key columns
  3. Add indexes to all of your foreign key columns

When primary or foreign keys are composites (i.e. contain more than one column) create an index containing all of the columns in the same order that they are defined in the corresponding constraint.

If none of this makes any sense and you are responsible for your application’s database give me a call, my consulting rates are very reasonable.

September 30, 2007

Gerald release 0.2.1

Filed under: database, python — Andy Todd @ 4:01 pm

I have packaged and released version 0.2.1 of gerald. You can find all of the details on the project page but in a nutshell the changes from the last release are;

Bug reports, patches and test cases to the usual email address please. Or just a message to let me know you are using the code, that would be nice.

September 20, 2007

Insanity

Filed under: database — Andy Todd @ 5:29 pm

Today I’ve come across a new definition of insanity. It is this;

Expecting an update of every row in a 13 million row table (which is subject to frequent updates) to complete in one pass without timing out.

Of course, this wouldn’t be necessary if the system in question was running a version of PostgreSQL that was released after 2003. Damn those risk averse system administrators.

I should probably explain. In versions of PostgreSQL before release 8 you cannot increase the size of a column. Therefore to increase the size of one of the columns in our production database we have to add a new column, copy the contents of the old column to the new column, rename both columns and then drop the old column. Now that is what I call a database refactoring.

September 12, 2007

Much SQL Goodness

Filed under: database — Andy Todd @ 8:49 am

I rarely link to other blogs, but when I come across something that I think that you, dear readers, would appreciate I feel it would be churlish not to share. I’ve recently started reading Jeff Smith’s SQL Server Weblog and it is a veritable cornucopia of great information.

All of it is good solid advice about how to design, build and work with your database. I even forgive him using the wrong database because everything he writes is of such high quality.

This article alone will come in useful whenever I have to explain the concept of primary keys to monkey coders who only understand object models.

Thanks Jeff, keep up the good work.

July 12, 2007

In Search of Technologists

Filed under: General, database, python — Andy Todd @ 5:52 pm

In my new executive position I’m responsible for maintaining adequate staffing levels in the technology department of my company. We are pretty much staffed up at the moment but there are some potential projects on the horizon that will tax the capacity of my team. In a novel fit of forward planning I’m trying to find a consulting partner who will be able to help us in these times of high workload. It is proving quite tricky.

So, dear lazyweb, do you know of any reputable consulting companies who will fit our bill. They must be;

  1. On the ground in Australia. A Sydney office is a bonus.
  2. Have competencies in one or more of; Python, PostgreSQL and scaling internet applications
  3. Have a track record and reputable references. I will be checking.
  4. Not cost the earth. Although I’m very aware of how much consultants cost, having been one myself for most of the last fifteen years.

Any and all suggestions are welcomed in the comments or to my email address. As long as they fit our criteria. Outsourcing and offshore development are not an option at this time so please don’t suggest these.

March 7, 2007

Gerald Release 0.2

Filed under: database, python — Andy Todd @ 9:24 am

I’ve packaged and released version 0.2 of gerald. You can find all of the details on the project page but in a nutshell the changes from the last release are;

  1. A test suite - using py.test
  2. Support for more database objects - views, sequences, stored code objects, indexes, etc.
  3. A large number of bug fixes. Too many to mention here.
  4. Use of my dburi module for database connectivity

There’s probably more but I didn’t keep a comprehensive change register whilst I was tinkering with the code, an oversight I’ve already addressed for the inevitable 0.2.1 release.

This code has been fairly extensively tested and exercised in the real world but I’d appreciate any feedback, bug reports or patches.

December 14, 2006

Dont Use Default Roles in Oracle Databases

Filed under: database, oracle — Andy Todd @ 9:39 am

The admonition (since the Oracle 6 days if my memory serves) for good DBAs is to never use the default roles when granting privileges to users in Oracle databases. Always grant users the specific privileges they require and never rely on ‘CONNECT’ or ‘RESOURCE’, because they generally grant more rights than you really want.

Oracle seems to have addressed this in 10g2, at least according to this this blog post and my observation. Now, granting the ‘CONNECT’ role is exactly the same as explicitly granting the ‘CREATE SESSION’ privilege and the ‘RESOURCE’ role has a more reasonable list of privileges.

I’d still be wary of them though, because through sheer laziness I granted these roles to a user in my development database and then got an ORA-01031 error when I tried to create a view. That’s right, the ‘RESOURCE’ role doesn’t (and as far as I can tell never did) contain the privilege to create a view. How strange.

So this advice would appear to still be valid - only grant those privileges that your users need and do it explicitly.

Next Page »

Powered by WordPress