24th June, 2009

You can have your delimiter in any character, as long as it is comma

Filed under: python — admin @ 3:08 pm

On a recent project a number of interface files were defined as “ASCII encoded DOS files with CRLF (ASCII code 13 and 10 respectively) end of line markers and the field delimiter is | (Pipe character – ASCII code 124)” or in plain language, pipe separated values files.

Now the casual observer would think that these are the same as CSV files, but with a different delimiter. And you would almost be correct -as long as you aren’t relying on popular desktop productivity software to produce your files.

The reasons for this exact format are lost in the mists of time but these files are supposed to be easy to produce by anyone with a computer. The assumption was that most of the people producing these files would be running Microsoft Windows and Office.

This leads to the assumption that you can enter your tabulated data into Excel and save as a pipe delimited CSV right? Wrong. It is next to impossible to do this in Excel unless you change some system wide settings. To which my first response was WTF?

Luckily, Python came to my rescue and all I needed to do was this:

>>> import csv
>>> old_file = csv.reader(open('blah.csv'), dialect='excel')
>>> new_file = csv.writer(open('ANDY.csv', 'w'), delimiter='|')
>>> for record in old_file:
...     new_file.writerow(record)

Of course, this doesn’t help the average person in the street who doesn’t have my l33t Python skills. So I’ll be changing the interface format as soon as I get a chance.

Powered by WordPress