1st June, 2012

Python path relative to application root

Filed under: python — admin @ 3:53 pm

I’ve recently written some code to wrangle XML files. Part of the code validates a provided file against an XML Schema stored in a file. When I wrote this code I got tangled up in absolute and relative path manipulations trying to load the XML Schema file. Most of the Python file operations work relative to the current working directory, and I needed to be able to load my XML Schema from a file relative to the application root directory. Regardless of where the code was executed from, the schema file would always be up and across from the directory containing the Python module being executed: the module, load_source_file.py, lives in the etl directory and the schema, Source_File.xsd, lives in the sibling schemas directory.

Within load_source_file.py I need to load and parse the XML Schema contained in Source_File.xsd. Here’s how I did it. First, we need to work out the root directory of the application relative to load_source_file.py. After a few false starts this tip from StackOverflow was the key – http://stackoverflow.com/a/1271580/2661. The full path to our etl directory is;

root_dir = os.path.abspath(os.path.dirname(__file__))

But we need to go up a directory so we use os.path.split to remove the last component of the path.

root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]

The final part is simply joining this with the name of the directory and schema file that we wish to load. Then we have a path that is the same irrespective of where we run the code from. To make reading the code easier I split this across a few lines and ended up with the following.

>>> import os
>>> from lxml import etree
>>> root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
>>> schema_file = os.path.join(root_dir, 'schemas', 'Source_File.xsd')
>>> xmlschema_doc = etree.parse(schema_file)
>>> xmlschema = etree.XMLSchema(xmlschema_doc)
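
If you need this in more than one module, the same steps can be wrapped in a small helper. This is only a sketch: the function name path_from_app_root is made up for illustration, and it assumes, as above, that the application root is exactly one directory above the module's own directory.

```python
import os

def path_from_app_root(module_file, *parts):
    """Build an absolute path under the directory one level above the module.

    module_file is normally the calling module's __file__.
    (Hypothetical helper; assumes the application root is the parent of
    the directory that contains the module.)
    """
    # Directory that holds the module, e.g. .../app/etl
    module_dir = os.path.dirname(os.path.abspath(module_file))
    # One level up, e.g. .../app
    root_dir = os.path.dirname(module_dir)
    # Append the remaining path components, e.g. schemas/Source_File.xsd
    return os.path.join(root_dir, *parts)
```

Inside load_source_file.py the call would then be path_from_app_root(__file__, 'schemas', 'Source_File.xsd'), and the result does not depend on the current working directory.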

10th November, 2011

Extracting a discrete set of values

Filed under: python — admin @ 3:42 pm

Today’s I love Python moment is brought to you by set types.

I have a file, XML naturally, that contains a series of transactions. Each transaction has a reference number, but the reference number may be repeated. I want to pull the distinct set of reference numbers from this file. The way I learnt to build up a discrete set of items (many years ago) was to use a dict and setdefault.

>>> ref_nos = {}
>>> for record in records:
...     ref_nos.setdefault(record.key, 1)
...
>>> ref_nos.keys()

But Python has had a sets module since 2.3 and a built-in set type since 2.4, so my knowledge is woefully out of date. The latest way to get the unique values from a sequence looks something like this;

>>> ref_nos = set([record.key for record in records])

I think I should get bonus points for using a list comprehension as well.
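
For anyone who wants to try this without an XML file to hand, here is a small self-contained sketch. The Record type and the sample values are invented for illustration; only the key attribute matches the snippets above. It also shows two variations: a generator expression, which skips the temporary list, and the set comprehension syntax available from Python 2.7 onwards.

```python
from collections import namedtuple

# Hypothetical record type standing in for a parsed transaction
Record = namedtuple('Record', ['key', 'amount'])

records = [
    Record('A100', 10),
    Record('B200', 25),
    Record('A100', 5),   # repeated reference number
]

# Generator expression: no intermediate list is built
ref_nos = set(record.key for record in records)

# Set comprehension (Python 2.7 and 3.x)
ref_nos_too = {record.key for record in records}

# Sets are unordered, so sort if a stable order is needed
sorted_ref_nos = sorted(ref_nos)
```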

19th August, 2011

Validating an XML File with LXML

Filed under: python — admin @ 8:35 pm

I’ve been playing with XML files recently and have on the odd occasion needed to validate a file against an XML schema. This is surprisingly easy using lxml, the Swiss Army knife of Python XML processing. Allow me to demonstrate.

>>> from lxml import etree
>>> schema = etree.XMLSchema(etree.parse('schema_file_name.xsd'))
>>> xml_file = etree.parse('xml_file_name.xml')
>>> schema.validate(xml_file)

Job done. If you are unlucky enough that your file doesn’t validate, validate returns False and you can find out why by checking the error_log attribute of your XMLSchema object.
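
As a rough sketch of what reading the error log might look like, the example below validates in-memory documents instead of files. The schema, the element names and the validation_errors helper are all made up for illustration, and it assumes lxml is installed.

```python
from lxml import etree

# Hypothetical schema: a <transactions> element holding integer <ref> elements
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="transactions">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ref" type="xs:integer" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

def validation_errors(xml_bytes):
    """Return a list of (line, message) tuples; empty if the document is valid."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    # error_log holds one entry per problem found by the last validation
    return [(entry.line, entry.message) for entry in schema.error_log]

good = validation_errors(b"<transactions><ref>1</ref></transactions>")
bad = validation_errors(b"<transactions><ref>abc</ref></transactions>")
```

Here good should come back empty, while bad should contain at least one entry explaining that abc is not a valid integer.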


23rd June, 2011


Filed under: General,ubuntu — admin @ 4:47 pm

Due to a recent accounting error (on my part and in my favour) I recently found myself in possession of a netbook. I know that makes me a luddite and I should have bought a tablet. Call me a throwback. In my defence it was half the price of an iPad and a lot more practical for me. The major deal breaker for me is that iPads don’t come with a command line client and can’t (to the best of my knowledge) run the only editor worth having. Also, iPads don’t run free software and that is becoming more important to me. So I bought a netbook.

As it came with Windows installed my first task was to install a decent operating system. I’m a fan of Xubuntu so I grabbed the latest release and then … stopped. Because my first thought was to burn the Xubuntu .iso file to a disk and install from that, but my netbook doesn’t have a CD drive. I’ve never installed from anything else in the past so I was a bit stuck.

The good news is that it is 2011 and Google came to the rescue. After a couple of false turns, and via Pendrivelinux.com, I found the rather wonderful LinuxLive USB Creator. Whilst it isn’t an exhaustive test, and don’t come to me with your problems, I simply installed and started LiLi, pointed it at my USB stick and the .iso file I had downloaded and 10 minutes later I had a bootable copy of Xubuntu.

Some words of praise, too, for the (X)ubuntu installer folks who have made getting their operating system on a new machine a complete breeze. Thanks everyone, top job.

Now all I’ve got to do is install all of the software that I rely on, configure the thing and I can start using it. At my pace that should only take a week or two. I’ll be back then.

24th November, 2010

Use the right tool for the job

Filed under: General — admin @ 1:28 pm

I was going to write an informed and opinionated piece about the use of proper tools in corporate IT departments. In particular I was going to say that I found it interesting that smaller, more cost conscious teams (in startups or open source projects) use more modern and sophisticated tools for issue management, project planning and code management than the big IT departments that I have the pleasure to work in.

But, well, I’ve got to go and write a status report showing the break down of issues by status, and that is going to take me about three and a half hours. So I don’t have time to faff about on my blog.

Instead, I’ll just paraphrase JWZ (who was apparently in turn paraphrasing an older comment about sed) and say;

Some people, when confronted with a problem think “I know, I’ll use a SharePoint list.” Now they have two problems.

I mean, a SharePoint list for issue management? When we could use Jira or FogBugz? I give up.

26th October, 2010

Gerald release 0.4.1

Filed under: database,python — admin @ 11:45 am

Before starting on some of the big changes planned for version 0.5, and thanks to patches and suggestions from various people, I’ve addressed a couple of issues with Gerald 0.4. This means that we now have Gerald release 0.4.1.

What’s new in this release? Not much, just some bug fixes, documentation changes and (hopefully) an egg that is installable on all platforms. The .egg files available from PyPI (and soon to be available on SourceForge) should install without any errors and if my testing is correct will be usable on multiple platforms including Windows.

Downloads are available at the PyPI page and the SourceForge project page. As always, please send me an email with any problems or suggestions for improvement.

20th August, 2010

Python strftime reference

Filed under: python — admin @ 9:23 am

Take a look at this excellent single page web site – Python strftime reference. It does exactly what it says on the tin. Good work.
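
As a quick taste of the kind of thing the reference covers, here is a minimal sketch using a made-up timestamp. The month name and AM/PM marker depend on the locale; the values in the comments assume the default C locale.

```python
from datetime import datetime

# Hypothetical timestamp used purely for illustration
posted = datetime(2012, 6, 1, 15, 53)

iso_date = posted.strftime('%Y-%m-%d')     # '2012-06-01'
long_date = posted.strftime('%d %B, %Y')   # '01 June, 2012' in the C locale
clock = posted.strftime('%I:%M %p')        # 12-hour clock with AM/PM marker
```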

27th June, 2010

Gerald release 0.4

Filed under: database,oracle,python — admin @ 3:36 pm

I’ve been revelling in the Python goodness this weekend at PyCon Australia. This has motivated me to fix the last couple of issues and then package and release Gerald 0.4

What’s new in this release? The most important changes are fixes to a number of issues identified by users of SQLPython. Gerald was appearing to take a long time to collect large schemas but was actually failing silently. I added test cases to show the problem and then fixed the code. This shouldn’t happen any more.

I applied a couple of patches supplied by Catherine Devlin to cope with columns without defined lengths and to not get DBA objects in Oracle schemas.

I slipped in some new features as well; I implemented the to_xml and compare methods on the CodeObject class, and Gerald now supports views in MySQL (as long as you are running 5.1 or above).

Finally, I changed the project documentation to use Sphinx.

Downloads are available at the PyPI page and the SourceForge project page.

If you find any problems or want to contribute any code just send me an email.

1st June, 2010

Generating HTML versions of reStructuredText files

Filed under: General,python — admin @ 1:37 pm

I wanted to quickly and easily convert a series of reStructuredText documents into HTML equivalents. For reasons too dull to discuss here I couldn’t just use rst2html.py and didn’t want to go to the trouble of remembering enough bash syntax to write a shell script.

So I thought that since docutils is written in Python it would only take a moment or two to knock up a script to do what I needed. Well yes, and no. The script itself is fairly simple;

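import glob
import os

from docutils import core

def convert_files(name_pattern):
    # Convert every file matching name_pattern (e.g. '*.rst') to HTML
    for file_name in glob.glob(name_pattern):
        # Swap the extension for .html, whatever its length
        file_dest = os.path.splitext(file_name)[0] + '.html'
        # The with block closes both files even if the conversion fails
        with open(file_name, 'r') as source, open(file_dest, 'w') as destination:
            core.publish_file(source=source, destination=destination, writer_name='html')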

The most useful line is the one where I call core.publish_file. But it wasn’t immediately obvious from the docutils documentation what series of incantations would achieve my desired results. Luckily, after some time spent perusing the documents I came across this dissection of rst2html.py. This, in turn, led me to the description of the Docutils Publisher, which lists the convenience functions available to work with the engine.

The end result isn’t particularly elegant but it does get the job done and I thought I would share it in case anyone else has a similar need in the future.
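
If you only need the HTML for a string and not a whole file, another of the Publisher convenience functions, publish_parts, may be a better fit. The sketch below is just that, a sketch: the function name rst_to_html_body and the sample text are invented for illustration, and it assumes docutils is installed.

```python
from docutils import core

def rst_to_html_body(rst_text):
    """Render reStructuredText and return the HTML body fragment as a string."""
    # publish_parts returns a dict of document pieces; 'body' is the
    # rendered content without the surrounding <html> and <head> wrapper
    parts = core.publish_parts(source=rst_text, writer_name='html')
    return parts['body']

sample = "Some *emphasised* text in a short paragraph.\n"
html = rst_to_html_body(sample)
```

The fragment in html can then be dropped into a page template of your own.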

14th February, 2010

Gerald release 0.3.6

Filed under: database,python — admin @ 2:06 pm

I have just released version 0.3.6 of Gerald. Gerald is a general purpose database schema toolkit written in Python.

This release was at the request of the sqlpython project and contains only one change. A new convenience method connect has been added to the Schema class. This enables a schema to be initiated and then later have a database connection associated with it. Because this changes the public API of gerald I’ve released this under a new version number.

Development, bug and issue tracking and the project wiki are available on the project Trac site. Source code and distribution files are available at the SourceForge page.

The next release will be 0.4. Exactly what will make up that release is still evolving, although it is likely to feature SQL Server support as I have just started a new job and all of the systems there use it. To see what else is in the release and to track progress take a look at the version 0.4 roadmap.
