1st June, 2012

Python path relative to application root

Filed under: python — admin @ 3:53 pm

I’ve recently written some code to wrangle XML files. Part of the code validates a provided file against an XML Schema stored in a file. When I wrote this code I got tangled up in absolute and relative path manipulations trying to load the XML Schema file.

Most of the Python file operations work relative to the current working directory, but I needed to load my XML Schema from a file relative to the application root directory. Regardless of where the code is executed from, the schema file is always up one level and across from the directory containing the Python module being executed. The layout looks like this (the top-level directory name is only an example):

    project/
        etl/
            load_source_file.py
        schemas/
            Source_File.xsd

Within load_source_file.py I need to load and parse the XML Schema contained in Source_File.xsd. Here’s how I did it. First, we need to work out the root directory of the application relative to load_source_file.py. After a few false starts, this tip from StackOverflow was the key: http://stackoverflow.com/a/1271580/2661. The full path to our etl directory is:

root_dir = os.path.abspath(os.path.dirname(__file__))

But we need to go up a directory, so we use os.path.split to remove the last component of the path:

root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
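As a quick illustration of what the split does, here is a small sketch. The path is made up for the example, and I use posixpath (the POSIX flavour of os.path) only so that the output is the same on every platform.

```python
import posixpath  # POSIX flavour of os.path, so results do not depend on the OS

# Hypothetical location of the etl directory, for illustration only.
etl_dir = "/path/to/project/etl"

# split() returns a (head, tail) pair: everything before the last
# component, and the last component itself.
head, tail = posixpath.split(etl_dir)
print(head)  # /path/to/project
print(tail)  # etl

# dirname() returns the same head as split()[0].
assert posixpath.dirname(etl_dir) == head
```

Taking element [0] of the pair is what moves us up one directory.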

The final part is simply joining this with the names of the directory and schema file that we wish to load. That gives us a path that is the same irrespective of where we run the code from. To make the code easier to read I split this across a few lines and ended up with:

>>> import os
>>> from lxml import etree
>>> root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
>>> schema_file = os.path.join(root_dir, 'schemas', 'Source_File.xsd')
>>> xmlschema_doc = etree.parse(schema_file)
>>> xmlschema = etree.XMLSchema(xmlschema_doc)
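If the same lookup is needed in more than one module, the path logic could be wrapped in a small helper. This is only a sketch: the function name schema_path and its parameters are my own invention, and it assumes the layout described above, with a schemas directory sitting beside the directory that holds the module.

```python
import os


def schema_path(module_file, schema_name="Source_File.xsd"):
    """Return the path of a schema in the 'schemas' directory that sits
    beside the directory containing module_file (hypothetical helper)."""
    # Directory that holds the module, as an absolute path.
    module_dir = os.path.abspath(os.path.dirname(module_file))
    # Drop the last component to move up to the application root.
    root_dir = os.path.split(module_dir)[0]
    return os.path.join(root_dir, "schemas", schema_name)


# Inside load_source_file.py the call would be schema_path(__file__).
# Here a made-up module path stands in for __file__.
example = schema_path(
    os.path.join(os.sep, "path", "to", "project", "etl", "load_source_file.py")
)
print(example)
```

The returned path can then be handed to etree.parse in the same way as schema_file above.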

5 Comments

  1. Any reason you prefer os.path.split()[0] over os.path.dirname()?

    Comment by Christos Georgiou — 01/06/2012 @ 5:27 pm

  2. Yes, because it removes the last element of the path after we have figured out the directory that load_source_file.py is in. So the split turns /path/to/project/etl into /path/to/project, which then allows me to append 'schemas' in the next line.

    Comment by Andy Todd — 01/06/2012 @ 7:33 pm

  3. If your application uses Setuptools or Distribute, the preferred way of resource access would be pkg_resources. The schema would have to reside under some Python package, however. This would then become

    >>> from lxml import etree
    >>> from pkg_resources import resource_stream
    >>> xmlschema_doc = etree.parse(resource_stream('etl', 'Source_File.xsd'))

    To see if this suits your needs, see the documentation.

    Comment by Santtu Pajukanta — 02/06/2012 @ 1:13 am

  4. Hmm.. how about os.path.join(__file__, '..') ? Much easier to understand: Take this file, and go up one directory. Or maybe I am misunderstanding something here…

    Comment by Martin — 02/06/2012 @ 4:51 am

  5. Martin, if I just use join(__file__, '..') then I get a full path with a .. in it that doesn’t resolve properly.

    Comment by Andy Todd — 13/06/2012 @ 3:41 pm
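
For anyone comparing the approaches discussed in the comments, here is a small sketch that puts them side by side. The module path is made up and stands in for __file__, and posixpath is used only to keep the output identical on every platform. Note that the '..' variant needs os.path.normpath to collapse the dots, and it needs two '..' components to reach the application root: the first cancels the file name, the second steps out of etl.

```python
import posixpath  # POSIX flavour of os.path, for platform-independent output

# Hypothetical module location, standing in for __file__.
module_file = "/path/to/project/etl/load_source_file.py"

# Approach from the post: directory of the module, then drop its last component.
via_split = posixpath.split(posixpath.dirname(module_file))[0]

# Calling dirname() twice gives the same result.
via_dirname = posixpath.dirname(posixpath.dirname(module_file))

# Joining with '..' keeps the dots in the path until it is normalised.
raw = posixpath.join(module_file, "..", "..")
via_dotdot = posixpath.normpath(raw)

print(raw)         # /path/to/project/etl/load_source_file.py/../..
print(via_dotdot)  # /path/to/project
assert via_split == via_dirname == via_dotdot
```

One caveat: normpath works purely on the text of the path, so if symbolic links are involved the normalised path may point somewhere different from the un-normalised one.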
