Andrew Channels Dexter Pinion

Wherein I write some stuff that you may like to read. Or not, it's up to you really.

May 21, 2004

Python and XML

I have to admit that I really don't like this book, published by O'Reilly and authored by Christopher A Jones and Fred L Drake Jr. There are a few reasons: the prose is pedestrian, the examples are not very useful and the structure is somewhat confusing.

Taking my last point first, the book gives an explanation of both Python and XML (good) and outlines (far too briefly) the available toolkits. Then it explains that there are two different approaches to manipulating XML documents (primarily SAX and DOM) and attempts to explain each. In quite a few cases the book reads like a precis of the official (or definitive) specification, without enough detail to be worth reading. At these points there is generally a pointer to an (unreadable, in my opinion) W3C specification.

A weakness of the book is that it doesn't at any point suggest what each approach's strengths and weaknesses are. Nor, really, does it attempt to address what XML is best used for, or what Python adds to the mix.
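
To make the two approaches concrete, here is a minimal sketch (mine, not the book's) that pulls the same data out of a small document both ways, using the standard library's xml.sax and xml.dom.minidom. The sample document and the names TitleHandler, titles_with_sax and titles_with_dom are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
DOC = b"<library><book>Python and XML</book><book>Learning XML</book></library>"


class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser streams through the document and calls back on events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_book:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._chunks))
            self._in_book = False


def titles_with_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_dom(data):
    """DOM: load the whole document into a tree, then query it."""
    dom = minidom.parseString(data)
    return [node.firstChild.data for node in dom.getElementsByTagName("book")]
```

Roughly speaking, the SAX version never holds the whole document in memory but makes you track state yourself, while the DOM version is shorter to write but builds the full tree first.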

There are a number of implicit assumptions, I think, in the book. The first and most significant is that you approach it with a specific task or problem domain in mind. If you are searching for answers to "why" type questions (why should I use SAX? for instance) you will be sadly disappointed. The "how" questions are addressed though.

The single thing that would improve this book though is some decent examples. My preference is for meaningful, incremental examples throughout technical books like this. They are hard to do though. But how hard would it be, for instance, to mention that XML is often used for configuration (e.g. in wxPython) and suggest a few different ways of building and parsing these types of file?
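
The kind of example I have in mind might look something like the following sketch. It assumes a Python where ElementTree is available as xml.etree.ElementTree, and the config/setting layout and the function names build_config and parse_config are invented here rather than taken from wxPython or the book.

```python
import xml.etree.ElementTree as ET


def build_config(settings):
    """Serialise a dict as <config><setting name="...">value</setting>...</config>."""
    root = ET.Element("config")
    for name, value in settings.items():
        item = ET.SubElement(root, "setting", {"name": name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_config(text):
    """Read the settings back into a dict; all values come back as strings."""
    root = ET.fromstring(text)
    return {item.get("name"): item.text for item in root.findall("setting")}
```

Note that everything round-trips as text, so a real configuration reader would still need to convert values such as numbers back to the right type.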

Finally, and I'm sorry that this is so negative, the book is just plain wrong in a number of places. Today I wanted to find a way of validating an XML file against its DTD. I looked it up in the book and the subject is mentioned in the index. Turning to page 151 I see a section on validating DTDs. There is no explanation of the principle though, just a suggestion that you use a utility script which is part of the xmlproc parser that ships with the PyXML utilities. Except it doesn't. There is no xvcmd.py shipped with version 0.8.3 (the current release) and frankly I can't be bothered looking in previous releases. This is also the only mention of xmlproc in the book, which is strange because it is the first thing mentioned on the PyXML home page, but obviously not important enough to feature in the book at all.

My advice? If you want to learn about XML and the ways it can be written, read and manipulated, go to XML.com, and if you want to work with XML in Python you can't beat the effbot's ElementTree toolkit. Which naturally isn't mentioned in the book.

Posted by Andy Todd at May 21, 2004 07:34 PM

Comments

I don't mean to argue with your conclusion, because I agree, the book was not very useful to me in learning to process XML with Python. But I think that fairness demands I point out that the book was published in 2002, which means it was written in 2001. Back in 2001, Python support for XML was nowhere near as well-rounded as it is today.

Also, the "why" questions are perhaps not relevant to the book's scope, as its focus is laudably narrow. I don't want to know the "why"s of XML from a book on Python's XML processing tools, I want the "why"s to be answered by something like "XML in a Nutshell", or better yet "Learning XML".

On the other hand, the fact that the book "outlines (far too briefly) the available toolkits" is a damning disgrace, and one with which I agree. I remember reading the book and thinking to myself, "How come I didn't learn anything useful?" And that's a sad state of affairs for a technical book. Even admitting that Python and XML weren't best buddies at the time, I still think that Uche Ogbuji's columns (developerWorks, xml.com, etc.) provide a *much* better introduction than did this book. Which is sad, really, considering the amount of effort that goes into publishing a book compared to that which goes into publishing articles on a website.

Posted by: Peter Herndon on May 21, 2004 08:43 PM

Just for the record, I think ElementTree came out after the book was published...

Posted by: Hans Nowak on May 21, 2004 10:38 PM

Do you think this would be a good topic for an O'Reilly "Developer Notebook"?

http://devnotebooks.oreilly.com/

Would you want it authored by a veteran XML'er, a standard veteran Pythonista, or ???.


Posted by: DeanG on May 23, 2004 04:13 AM

Hrrm. I'm of two minds: if you are going to do it as a "Developer Notebook", I'd suggest getting a veteran Pythonista who *hasn't* done too much Python & XML work, so that you can get that "just discovered" flavor.

Otherwise, get Uche Ogbuji to do a second edition of the original book, but do it the right way -- a total rewrite. I don't know of any Pythonista with more XML experience than Uche, and his writing style is (to my taste) excellent.

Posted by: Peter Herndon on May 24, 2004 04:38 PM

Just with changes in the parser scene this book, written in 2002, isn't going to hold up well.
It does have a catchy (or catch-all) title.

I get so much done with ElementTree, but it was the improvements of Nov 2003, while keeping the interface simple, that really made it worthwhile. If you've only seen the previous versions, look again.

Posted by: Brian Mahoney on May 25, 2004 07:36 PM