Andrew Channels Dexter Pinion

Wherein I write some stuff that you may like to read. Or not, it's up to you really.

May 24, 2004

Validating an XML document against its DTD

Well, after the disappointment of the dead tree documentation I still wanted to validate my XML document against its DTD.

Luckily, there is a Python Cookbook recipe that shows us how to achieve it with xmlproc.

<Update> If you don't want to perform your validation in Python (although I can't think why myself), check out the XMLStarlet command line XML toolkit. </Update>

<Update (2)> Of course, the truly hardcore will use xmllint. It's part of libxml2 and is therefore officially Mark Pilgrim friendly. </Update (2)>
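
For those who want the xmllint route without leaving Python, here is a minimal sketch of shelling out to it. It assumes xmllint is installed and on your PATH; the function names and file names are made up for illustration. The flags themselves are xmllint's own: --noout suppresses the echoed document, --valid validates against the DTD named in the document's DOCTYPE, and --dtdvalid validates against a DTD you name explicitly.

```python
import subprocess


def build_xmllint_command(xml_path, dtd_path=None):
    """Return the xmllint argument list for validating xml_path."""
    command = ["xmllint", "--noout"]
    if dtd_path is None:
        # Validate against the DTD declared in the document's DOCTYPE.
        command.append("--valid")
    else:
        # Validate against an explicitly named DTD instead.
        command.extend(["--dtdvalid", dtd_path])
    command.append(xml_path)
    return command


def validate_with_xmllint(xml_path, dtd_path=None):
    """Return True if xmllint exits with status 0, i.e. the document is valid."""
    try:
        return subprocess.call(build_xmllint_command(xml_path, dtd_path)) == 0
    except OSError:
        # xmllint is not installed or not on the PATH.
        return False
```

Something like validate_with_xmllint("doc.xml", "doc.dtd") then gives you a plain True or False, with xmllint's own error messages going to stderr.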

Posted by Andy Todd at May 24, 2004 02:18 PM

Comments

Use a decent text editor[1], instead of this Vim rubbish. It'll validate XML against a DTD on the fly. ;-)

[1] http://www.jedit.org/

Posted by: Simon Brunning on May 24, 2004 03:14 PM

Well, I guess it's a little bit of overhead to
distribute a JDK and jedit with your Python
application ;-)

Here's how to do validation with strings only.
I tried to add this to the Python Cookbook but
ActiveState currently has problems with its
database.


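from xml.parsers.xmlproc import xmlproc, xmlval, xmldtd
from xml.parsers.xmlproc.utils import ErrorRaiser, ErrorPrinter
from cStringIO import StringIO
import sys

class DummyApp(xmlproc.Application):
    # Ignore all document content; we only care whether parsing succeeds.
    def handle_start_tag(self, name, attrs):
        pass
    def handle_end_tag(self, name):
        pass
    def handle_data(self, data, start, end):
        pass
    def handle_comment(self, data):
        pass

class InputFactory:
    # Treat the "system identifier" as the document text itself,
    # so the parser reads from a string instead of opening a file.
    def create_input_source(self, sysid):
        return StringIO(sysid)

def validate(xml_string, dtd_string):
    # Parse the DTD string; the result is not used further below.
    d = xmldtd.load_dtd_string(dtd_string)
    p = xmlval.XMLValidator()
    p.set_application(DummyApp())
    # ErrorRaiser turns parse and validation errors into exceptions.
    p.set_error_handler(ErrorRaiser())
    p.set_inputsource_factory(InputFactory())
    try:
        p.parse_resource(xml_string)
    except:
        # Report whatever was raised and signal failure to the caller.
        print sys.exc_type, sys.exc_value
        return False

    return True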

Posted by: Nils Kassube on May 24, 2004 03:43 PM

Simon, you know I'm hardcore. jEdit is out because (a) it doesn't run in a terminal window and (b) it's not available for Debian.

Bleeding pointy-clicky-java-monkeys.

Nils, thanks for the tip, I'll check it out and post my results.

Posted by: Andy Todd on May 24, 2004 05:58 PM