November 10, 2011

Extracting a discrete set of values

Filed under: python — Andy Todd @ 3:42 pm

Today’s ‘I love Python’ moment is brought to you by set types.

I have a file, XML naturally, that contains a series of transactions. Each transaction has a reference number, but the reference number may be repeated. I want to pull the distinct set of reference numbers from this file. The way I learnt to build up a discrete set of items (many years ago) was to use a dict and setdefault.

>>> ref_nos = {}
>>> for record in records:
...     ref_nos.setdefault(record.key, 1)
...
>>> ref_nos.keys()

But Python has had a sets module since 2.3 and a built-in set type since 2.4, so my knowledge is woefully out of date. The modern way to get the unique values from a sequence looks something like this:

>>> ref_nos = set([record.key for record in records])

I think I should get bonus points for using a list comprehension as well.
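
For what it is worth, a set comprehension (Python 2.7 and later) or a generator expression gives the same result without building the intermediate list. The following is a minimal sketch, assuming each record is an object with a key attribute as in the snippets above; the Record namedtuple and the sample data are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed transaction record.
Record = namedtuple('Record', ['key', 'amount'])

records = [Record('A100', 10), Record('B200', 25), Record('A100', 5)]

# Set comprehension: collects the distinct keys directly.
ref_nos = {record.key for record in records}

# Equivalent generator expression passed to set().
ref_nos_from_generator = set(record.key for record in records)

print(sorted(ref_nos))
```

Either form leaves the duplicate reference number out of the result, just like the dict approach.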

August 19, 2011

Validating an XML File with LXML

Filed under: python — Andy Todd @ 8:35 pm

I’ve been playing with XML files recently and have on the odd occasion needed to validate a file against an XML schema. This is surprisingly easy using lxml, the Swiss Army knife of Python XML processing. Allow me to demonstrate.


>>> from lxml import etree
>>> schema = etree.XMLSchema(etree.parse('schema_file_name.xsd'))
>>> xml_file = etree.parse('xml_file_name.xml')
>>> schema.validate(xml_file)
True

Job done. If you are unlucky enough that your file doesn’t validate you can find out why by checking the error_log attribute of your XMLSchema object.
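
Here is a rough sketch of what reading the error_log might look like. The tiny schema, the sample documents and the validation_errors helper are all invented for this example; only XMLSchema, validate and error_log come from lxml itself.

```python
from lxml import etree

# Hypothetical schema: a <transaction> element holding one integer <ref>.
SCHEMA = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="transaction">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ref" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def validation_errors(schema, document):
    """Return (line, message) tuples; an empty list means the document is valid."""
    if schema.validate(document):
        return []
    # Each entry in error_log records where and why validation failed.
    return [(entry.line, entry.message) for entry in schema.error_log]


schema = etree.XMLSchema(etree.fromstring(SCHEMA))
good = etree.fromstring(b"<transaction><ref>42</ref></transaction>")
bad = etree.fromstring(b"<transaction><ref>not a number</ref></transaction>")

for line, message in validation_errors(schema, bad):
    print(line, message)
```

The valid document produces no entries, while the invalid one produces at least one entry describing the offending value.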

 

October 26, 2010

Gerald release 0.4.1

Filed under: database,python — Andy Todd @ 11:45 am

Before starting on some of the big changes planned for version 0.5, and thanks to patches and suggestions from various people, I’ve addressed a couple of issues with Gerald 0.4. This means that we now have Gerald release 0.4.1.

What’s new in this release? Not much, just some bug fixes, documentation changes and (hopefully) an egg that is installable on all platforms. The .egg files available from PyPI (and soon to be available on SourceForge) should install without any errors and if my testing is correct will be usable on multiple platforms including Windows.

Downloads are available at the PyPI page and the SourceForge project page. As always, please send me an email with any problems or suggestions for improvement.

August 20, 2010

Python strftime reference

Filed under: python — Andy Todd @ 9:23 am

Take a look at this excellent single page web site – Python strftime reference. It does exactly what it says on the tin. Good work.
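
As a quick reminder of what the directives do, here is a small sketch using only numeric format codes, which should not vary with locale. The timestamp is simply the date and time of the post at the top of this page.

```python
from datetime import datetime

# Example timestamp: November 10, 2011 at 3:42 pm.
posted = datetime(2011, 11, 10, 15, 42)

iso_style = posted.strftime('%Y-%m-%d %H:%M')  # '2011-11-10 15:42'
day_of_year = posted.strftime('%j')            # '314'
twelve_hour = posted.strftime('%I:%M')         # '03:42'

print(iso_style, day_of_year, twelve_hour)
```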

June 27, 2010

Gerald release 0.4

Filed under: database,oracle,python — Andy Todd @ 3:36 pm

I’ve been revelling in the Python goodness this weekend at PyCon Australia. This has motivated me to fix the last couple of issues and then package and release Gerald 0.4.

What’s new in this release? The most important changes are fixes to a number of issues identified by users of SQLPython. Gerald was appearing to take a long time to collect large schemas but was actually failing silently. I added test cases to show the problem and then fixed the code. This shouldn’t happen any more.

I applied a couple of patches supplied by Catherine Devlin to cope with columns without defined lengths and to not get DBA objects in Oracle schemas.

I slipped in some new features as well; I implemented the to_xml and compare methods on the CodeObject class, and Gerald now supports views in MySQL (as long as you are running 5.1 or above).

Finally, I changed the project documentation to use Sphinx.

Downloads are available at the PyPI page and the SourceForge project page.

If you find any problems or want to contribute any code just send me an email.

June 1, 2010

Generating HTML versions of reStructuredText files

Filed under: General,python — Andy Todd @ 1:37 pm

I wanted to quickly and easily convert a series of reStructuredText documents into HTML equivalents. For reasons too dull to discuss here I couldn’t just use rst2html.py and didn’t want to go to the trouble of remembering enough bash syntax to write a shell script.

So I thought that since docutils is written in Python it would only take a moment or two to knock up a script to do what I needed. Well, yes and no. The script itself is fairly simple:

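import glob
import os

from docutils import core

def convert_files(name_pattern):
    """Convert every reStructuredText file matching name_pattern to HTML."""
    for file_name in glob.glob(name_pattern):
        # Swap the extension, whatever its length, for .html
        file_dest = os.path.splitext(file_name)[0] + '.html'
        # The with statement closes both files even if publishing fails
        with open(file_name, 'r') as source, open(file_dest, 'w') as destination:
            core.publish_file(source=source, destination=destination, writer_name='html')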

The most useful line is the one where I call core.publish_file, but it wasn’t immediately obvious from the docutils documentation what series of incantations would achieve my desired results. Luckily, after some time spent perusing the documents I came across this dissection of rst2html.py. This, in turn, led me to the description of the Docutils Publisher, which lists the convenience functions available to work with the engine.

The end result isn’t particularly elegant but it does get the job done and I thought I would share it in case anyone else has a similar need in the future.
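
If you want the HTML as a string rather than a file, publish_parts, another of the Publisher convenience functions, may be a better fit. This is only a sketch: the rst_to_html_body helper and the sample text are my own invention, and it assumes the 'html' writer exposes an 'html_body' part.

```python
from docutils import core


def rst_to_html_body(text):
    """Render reStructuredText to an HTML fragment (no <html> or <head>)."""
    parts = core.publish_parts(source=text, writer_name='html')
    # 'html_body' holds the document body, including the title.
    return parts['html_body']


sample = "Title\n=====\n\nSome *emphasised* text.\n"
html = rst_to_html_body(sample)
print(html)
```

This is handy when the output is going to be embedded in a page template rather than written out as a standalone document.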

February 14, 2010

Gerald release 0.3.6

Filed under: database,python — Andy Todd @ 2:06 pm

I have just released version 0.3.6 of Gerald. Gerald is a general purpose database schema toolkit written in Python.

This release was at the request of the sqlpython project and contains only one change. A new convenience method connect has been added to the Schema class. This enables a schema to be initiated and then later have a database connection associated with it. Because this changes the public API of gerald I’ve released this under a new version number.

Development, bug and issue tracking and the project wiki are available on the project Trac site. Source code and distribution files are available at the sourceforge page.

The next release will be 0.4. Exactly what will make up that release is still evolving, although it is likely to feature SQL Server support as I have just started a new job and all of the systems there use it. To see what else is in the release and to track progress take a look at the version 0.4 roadmap.

February 1, 2010

Weird easy_install Behaviour

Filed under: python — Andy Todd @ 3:41 pm

Dear lazyweb, I unsubscribed from the distutils-sig mailing list a while back and consequently I’m not up to date with the latest to-ings and fro-ings. But, I have a problem. As reported by someone today Gerald eggs won’t install on Windows.

Everything is fine on my Ubuntu virtual machine, but on my shiny new work laptop I have Python 2.6 and today I downloaded and installed setuptools version 0.6c11. When I try to install Gerald I get an error complaining about a lack of a setup.py file:

(TEST) C:\Work\virtualenvs\TEST>easy_install gerald
Searching for gerald
Reading http://pypi.python.org/simple/gerald/
Reading http://halfcooked.com/code/gerald/
Reading http://sourceforge.net/project/showfiles.php?group_id=53184&package_id=109623
Reading http://sourceforge.net/projects/halfcooked/files
Best match: gerald 0.3.5
Downloading http://sourceforge.net/projects/halfcooked/files/gerald/0.3.5/gerald-0.3.5-py2.6.egg/download
Processing download
error: Couldn't find a setup script in c:\docume~1\andy~1.tod\locals~1\temp\easy_install-woqly0\download
(TEST) C:\Work\virtualenvs\TEST>

The only difference that I can find is that my Ubuntu virtual machine is running version 0.6c9 of setuptools. Has the behaviour changed between two release candidates?

Needless to say this means that Gerald won’t install under Windows using easy_install until I figure this out. All help and suggestions warmly received.

January 21, 2010

Gerald release 0.3.5

Filed under: database,General,python — Andy Todd @ 9:40 am

This last weekend I released version 0.3.5 of gerald.

The major component of this release was to add a ‘User’ class to the oracle_schema module. This is similar to the ‘Schema’ class, but whilst that shows all of the objects a database user owns, the ‘User’ class contains details of all of the objects they can access, including those owned by other database users. This was requested by the sqlpython project to enable them to use gerald for database introspection.

The only other change was to ensure that the NotImplementedError exception is raised in all of the super type methods that are just stubs. This is mainly in the Schema.py module and thus meant that I had to add a set of tests for this module.
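
For anyone unfamiliar with the pattern, it looks roughly like the sketch below. To be clear, this is not Gerald’s code: the class names, the get_tables method and the stub_raises helper are hypothetical and only illustrate a stub that raises NotImplementedError and a check that it does.

```python
class Schema(object):
    """Hypothetical super type; subclasses supply the database specifics."""

    def __init__(self, name):
        self.name = name

    def get_tables(self):
        # Stub: fail loudly instead of silently returning nothing.
        raise NotImplementedError("get_tables must be implemented by a subclass")


class InMemorySchema(Schema):
    """Hypothetical subclass that overrides the stub."""

    def get_tables(self):
        return ['transactions']


def stub_raises(method):
    """Return True if calling method raises NotImplementedError."""
    try:
        method()
    except NotImplementedError:
        return True
    return False


print(stub_raises(Schema('base').get_tables))
print(stub_raises(InMemorySchema('demo').get_tables))
```

A test for the super type then only needs to assert that every stub raises, which is the sort of thing the new test set covers.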

Development, bug and issue tracking and the project wiki are available on the project Trac site. Source code and distribution files are available at the sourceforge page.

The next release will be 0.4. Exactly what will make up that release is still evolving. To see what is in the release and to track progress take a look at the version 0.4 roadmap.

November 25, 2009

Gerald release 0.3.1

Filed under: database,python — Andy Todd @ 3:25 pm

Everyone, say hello to version 0.3.1 of gerald. This is a minor update that fixes some issues introduced in release 0.3. In summary, these are:

  • Ticket 17 – Views have been converted to dictionaries from tuples
  • Ticket 18 – Reading an Oracle sequence updates its current value
  • Ticket 19 – Postgres primary keys were not represented properly when read from the database

Development, bug and issue tracking and the project wiki are available on the project Trac site. Source code and distribution files are available at the sourceforge page.

The next release will be 0.3.5 and will introduce the concept of a ‘User’. This is similar to a ‘Schema’ but will reference all of the objects a database user can see even if they don’t own them. You can track progress for the release using the version 0.3.5 roadmap.
