Python path relative to application root

I've recently written some code to wrangle XML files. Part of the code validates a provided file against an XML Schema stored in a file. When I wrote this code I got tangled up in absolute and relative path manipulations while trying to load the XML Schema file. Most of the Python file operations work relative to the current working directory, but I needed to load my XML Schema from a file relative to the application root directory. Regardless of where the code was executed from, the schema file would always be up one level and across from the directory containing the Python module being executed. A picture will probably help.

Within load_source_file.py I need to load and parse the XML Schema contained in Source_File.xsd. Here's how I did it. First, we need to work out the root directory of the application relative to load_source_file.py. After a few false starts this tip from StackOverflow was the key: http://stackoverflow.com/a/1271580/2661. The full path to our etl directory is:

root_dir = os.path.abspath(os.path.dirname(__file__))

But we need to go up a directory so we use os.path.split to remove the last component of the path.

root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]

The final part is simply joining this with the names of the directory and schema file that we wish to load. That gives us a path that is the same irrespective of where we run the code from. To make the code easier to read I split this across a few lines and ended up with:

>>> import os
>>> from lxml import etree
>>> root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
>>> schema_file = os.path.join(root_dir, 'schemas', 'Source_File.xsd')
>>> xmlschema_doc = etree.parse(schema_file)
>>> xmlschema = etree.XMLSchema(xmlschema_doc)
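The same calculation can be wrapped in a small helper so it is easy to reuse and to check. This is only a sketch: the function name schema_path and the /app/etl layout in the usage line are made up for illustration, and os.path.dirname is used in place of os.path.split(...)[0], which gives the same result.

```python
import os


def schema_path(module_file, *parts):
    """Build a path relative to the parent of the directory holding module_file.

    schema_path is a hypothetical helper, not part of any library.
    """
    # Directory that contains the module, e.g. .../etl
    module_dir = os.path.dirname(os.path.abspath(module_file))
    # One level up: the application root
    root_dir = os.path.dirname(module_dir)
    # Down into whatever sub-directory and file the caller asked for
    return os.path.join(root_dir, *parts)


# Inside a real module you would pass __file__; a made-up path is used here.
example = schema_path("/app/etl/load_source_file.py", "schemas", "Source_File.xsd")
print(example)
```

Because the result is built from the module's own location, it does not change when the script is started from a different working directory.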

Extracting a discrete set of values

Today's "I love Python" moment is brought to you by set types.

I have a file, XML naturally, that contains a series of transactions. Each transaction has a reference number, but the reference number may be repeated. I want to pull the distinct set of reference numbers from this file. The way I learnt to build up a discrete set of items (many years ago) was to use a dict and setdefault.

>>> ref_nos = {}
>>> for record in records:
...     ref_nos.setdefault(record.key, 1)
...
>>> ref_nos.keys()

But Python has had a sets module since 2.3 and a built-in set type since 2.4, so my knowledge is woefully out of date. The modern way to get the unique values from a sequence looks something like this:

>>> ref_nos = set([record.key for record in records])

I think I should get bonus points for using a list comprehension as well.
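For comparison, here is a small self-contained sketch of the same idea using a set comprehension (available from Python 2.7 and 3.0), which skips the intermediate list. The Record type and the sample data are invented for the example; only the key attribute comes from the snippets above.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed transaction record.
Record = namedtuple("Record", ["key", "amount"])

records = [
    Record("A1", 10),
    Record("B2", 5),
    Record("A1", 7),  # repeated reference number
]

# Set comprehension: collects each distinct key once, no temporary list.
ref_nos = {record.key for record in records}

# Sets are unordered, so sort if a stable order is needed for display.
print(sorted(ref_nos))
```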

Validating an XML File with LXML

I've been playing with XML files recently and have on the odd occasion needed to validate a file against an XML schema. This is surprisingly easy using lxml, the Swiss Army knife of Python XML processing. Allow me to demonstrate.

>>> from lxml import etree
>>> schema = etree.XMLSchema(etree.parse('schema_file_name.xsd'))
>>> xml_file = etree.parse('xml_file_name.xml')
>>> schema.validate(xml_file)
True

Job done. If you are unlucky enough that your file doesn't validate, you can find out why by checking the error_log attribute of your XMLSchema object.
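To show what reading error_log might look like, here is a minimal sketch. The schema, the sample documents and the validation_errors helper are all invented for illustration; the lxml calls used are etree.parse, etree.XMLSchema, validate and the line and message attributes of each error_log entry.

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema: a <transactions> element holding integer <ref> values.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="transactions">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ref" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))


def validation_errors(xml_bytes):
    """Return (line, message) pairs; an empty list means the document is valid."""
    doc = etree.parse(BytesIO(xml_bytes))
    if schema.validate(doc):
        return []
    # Each log entry records where the problem was found and what it was.
    return [(entry.line, entry.message) for entry in schema.error_log]


good = b"<transactions><ref>1</ref><ref>2</ref></transactions>"
bad = b"<transactions><ref>not-a-number</ref></transactions>"

print(validation_errors(good))
for line, message in validation_errors(bad):
    print(line, message)
```

The exact wording of the messages comes from libxml2, so it is best treated as text for humans rather than something to match against in code.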

Freedom

Due to a recent accounting error (on my part and in my favour) I recently found myself in possession of a netbook. I know that makes me a luddite and I should have bought a tablet. Call me a throwback. In my defence it was half the price of an iPad and a lot more practical for me. The major deal breaker for me is that iPads don't come with a command line client and can't (to the best of my knowledge) run the only editor worth having. Also, iPads don't run free software and that is becoming more important to me. So I bought a netbook.

As it came with Windows installed my first task was to install a decent operating system. I'm a fan of Xubuntu so I grabbed the latest release and then ... stopped. Because my first thought was to burn the Xubuntu .iso file to a disk and install from that, but my netbook doesn't have a CD drive. I've never installed from anything else in the past so I was a bit stuck.

The good news is that it is 2011 and Google came to the rescue. After a couple of false turns, and via Pendrivelinux.com, I found the rather wonderful LinuxLive USB Creator. Whilst it isn't an exhaustive test, and don't come to me with your problems, I simply installed and started LiLi, pointed it at my USB stick and the .iso file I had downloaded and 10 minutes later I had a bootable copy of Xubuntu.

Some words of praise, too, for the (X)ubuntu installer folks who have made getting their operating system on a new machine a complete breeze. Thanks everyone, top job.

Now all I've got to do is install all of the software that I rely on, configure the thing and I can start using it. At my pace that should only take a week or two. I'll be back then.

Use the right tool for the job

I was going to write an informed and opinionated piece about the use of proper tools in corporate IT departments. In particular I was going to say that I found it interesting that smaller, more cost conscious teams (in startups or open source projects) use more modern and sophisticated tools for issue management, project planning and code management than the big IT departments that I have the pleasure to work in.

But, well, I've got to go and write a status report showing the breakdown of issues by status, and that is going to take me about three and a half hours. So I don't have time to faff about on my blog.

Instead, I'll just paraphrase JWZ (who was apparently in turn paraphrasing an older comment about sed) and say;

Some people, when confronted with a problem think "I know, I'll use a SharePoint list." Now they have two problems.

I mean, a SharePoint list for issue management? When we could use Jira or FogBugz? I give up.