I’ve recently written some code to wrangle XML files. Part of the code validates a provided file against an XML Schema stored in a file. When I wrote this code I got tangled up in absolute and relative path manipulations while trying to load the XML Schema file. Most of the Python file operations work relative to the current working directory, but I needed to load my XML Schema from a file relative to the application root directory. Regardless of where the code was executed from, the schema file would always be up and across from the directory containing the Python module being executed: the module, load_source_file.py, sits in the etl directory, and the schema, Source_File.xsd, sits in a sibling schemas directory.
From load_source_file.py I need to load and parse the XML Schema contained in Source_File.xsd. Here’s how I did it.
First, we need to work out the root directory of the application relative to load_source_file.py. After a few false starts this tip from StackOverflow was the key: http://stackoverflow.com/a/1271580/2661. The full path to our etl directory is:
root_dir = os.path.abspath(os.path.dirname(__file__))
But we need to go up a directory, so we use os.path.split to remove the last component of the path. It returns a (head, tail) pair, so we keep the first element.
root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
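To make the behaviour of os.path.split concrete, here is a small sketch. The path is a made-up example (not my real project layout); the point is only that split hands back two pieces, and the parent directory is the first one.

```python
import os

# Hypothetical path, used only for illustration.
module_dir = os.path.join(os.sep, "projects", "app", "etl")

# split returns a (head, tail) tuple rather than a single string.
head, tail = os.path.split(module_dir)
print(head)  # everything before the last path component
print(tail)  # the last path component

# os.path.dirname gives the same result as taking element [0] of split.
same_as_dirname = os.path.dirname(module_dir) == head
```

Either os.path.split(path)[0] or os.path.dirname(path) would work for going up one level; which one reads better is a matter of taste.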
The final part is simply joining this with the name of the directory and schema file that we wish to load. Then we have a path that is the same irrespective of where we run the code from. To make reading the code easier I split this across a few lines and ended up with:
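import os
from lxml import etree

# __file__ is only defined inside a module, so this lives in load_source_file.py.
# os.path.split returns (head, tail); [0] is the parent of the etl directory.
root_dir = os.path.split(os.path.abspath(os.path.dirname(__file__)))[0]
schema_file = os.path.join(root_dir, 'schemas', 'Source_File.xsd')
xmlschema_doc = etree.parse(schema_file)
xmlschema = etree.XMLSchema(xmlschema_doc)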
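The same lookup can also be written with pathlib from the standard library. This is only a sketch of an alternative, not what I actually used: schema_path is a hypothetical helper name, and it assumes the module sits one level below the application root with the schemas directory directly under that root.

```python
from pathlib import Path


def schema_path(module_file, schema_name="Source_File.xsd"):
    """Return the schema path relative to the application root.

    Hypothetical helper: assumes the module lives one directory below
    the root and that a schemas directory sits directly under the root.
    """
    module_dir = Path(module_file).resolve().parent  # directory holding the module
    root_dir = module_dir.parent                     # go up one level
    return root_dir / "schemas" / schema_name
```

Inside load_source_file.py this would be called as schema_path(__file__), and the result converted with str() before being passed to etree.parse.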