November 26, 2004
Validating Relax NG with libxml2 and Python
Today's chore: validate that an XML document adheres to a schema defined in Relax NG, using the libxml2 toolkit. Give this a try:
import libxml2

def fileRead(filename, attrib):
    # Read the whole file and hand the contents back to the caller
    myFile = open(filename, attrib)
    try:
        contents = myFile.read()
    finally:
        myFile.close()
    return contents

def isValid(schemaFileName, instanceFileName):
    success = False
    schema = fileRead(schemaFileName, 'r')
    instance = fileRead(instanceFileName, 'r')
    # Parse the Relax NG schema and build a validation context from it
    rngParser = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
    rngSchema = rngParser.relaxNGParse()
    ctxt = rngSchema.relaxNGNewValidCtxt()
    doc = libxml2.parseDoc(instance)
    # relaxNGValidateDoc returns 0 when the document is valid
    ret = doc.relaxNGValidateDoc(ctxt)
    if ret == 0:
        success = True
    # Validation completed, let's clean up
    doc.freeDoc()
    del rngParser, rngSchema, ctxt
    if libxml2.debugMemory(1) != 0:
        print("Memory leaked %d bytes" % libxml2.debugMemory(1))
    return success
As usual, I'm liberally borrowing from the work of others. I defined my schema file after a quick skim through the Relax NG tutorial. The libxml2 documentation was worse than useless, but Google brought AMK and Dave Kuhlman to my aid.
There is a remarkable similarity between their code and mine: the working parts are theirs and the bugs are all mine. I'm posting my snippet because it is the simplest way I could find to validate my XML document against my schema, and that's quite useful to me.
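If you want something to feed the function, here is a rough sketch that writes a tiny schema and a matching document to disk. The address book schema is modelled on the first example in the Relax NG tutorial; the file names, the sample data and the write_samples helper are all made up for illustration and are not the files I actually used.

```python
import os
import tempfile

# A minimal Relax NG schema in XML syntax (illustrative only)
SCHEMA = """<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""

# A document that should satisfy the schema above
INSTANCE = """<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
</addressBook>
"""

def write_samples(directory):
    """Write the sample schema and instance; return both paths."""
    schema_path = os.path.join(directory, "addressBook.rng")
    instance_path = os.path.join(directory, "addressBook.xml")
    for path, text in ((schema_path, SCHEMA), (instance_path, INSTANCE)):
        with open(path, "w") as handle:
            handle.write(text)
    return schema_path, instance_path

schema_path, instance_path = write_samples(tempfile.mkdtemp())
# With libxml2 installed, the two paths can be passed straight to
# a validation function such as isValid(schema_path, instance_path).
```

Dropping the email element from the document should be enough to make validation fail, which is a quick way to check that the validator is really looking at the schema.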
Posted by Andy Todd at November 26, 2004 08:50 AM
I don't suppose you've had any luck validating XML documents against RelaxNG compact schema using Python? I tried a while ago and ended up having to shell out to jing.
That's my next step, Simon. But on AMK's page that I referenced above he uses jing as well. As far as I can tell libxml2 doesn't support the compact syntax, so you need another tool to do it.
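For what it's worth, shelling out to jing might look roughly like the sketch below. It assumes a jing executable is on the PATH (some installs need java -jar jing.jar instead), that -c selects the compact syntax, and that a zero exit status means the document is valid. The function names are mine, and I haven't checked this against every jing release.

```python
import subprocess

def jing_command(schema_path, instance_path, compact=True):
    """Build the argument list for a jing run (assumes 'jing' is on the PATH)."""
    command = ["jing"]
    if compact:
        # -c tells jing the schema is in Relax NG compact syntax
        command.append("-c")
    command.extend([schema_path, instance_path])
    return command

def is_valid_with_jing(schema_path, instance_path, compact=True):
    """Run jing and treat exit status 0 as 'valid' (an assumption)."""
    status = subprocess.call(jing_command(schema_path, instance_path, compact))
    return status == 0
```

Keeping the command building separate from the call makes it easy to swap in a different launcher if jing is only available as a jar.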
Why not just use xvif or 4Suite to validate RelaxNG in Python? And if you want to use compact schemas, use my rnc2rng to touch your schemas up first.
See, for details:
Here is somebody who shows code that actually WORKS (I tried it), using libxml2.
If you suggest xvif (? never heard of) or 4Suite, then please also show code that works and that shows why your suggestion would be better.
My experiences with for example 4Suite are horrible, so I am happy with the example from Andrew using libxml2.
Challenge accepted. Code that WORKS on 4Suite:
Mr. Stuyvesant might not like it, but it works, and some do find it useful.
Re: XVIF, Google is your friend.
And indeed, thank you Uche!
I installed 4Suite from ftp://ftp.4suite.org/pub/4Suite/
There is an .exe with Python2.4 in the name, and installing is just a matter of running it.
Then I tried the examples on the link you gave, and with them I was able to create a Python function that takes XML and RelaxNG schema as input and returns 0 or 1. Great!
I don't know why you wrote "Mr. Stuyvesant might not like it": If I can install it and it is not too hard to use I usually like it a lot! The last experience I had with 4Suite is months old, back then it was not possible for me to install it in such a way that I could use it, now I can. Big improvement.
Minor nitpicks: on your examples page in the 2nd example use cmdline-parameter 2 rng-tut7.xml instead of rng-tut1.xml for the non-valid one.
Another thing is making the Windows.exe download for 4suite available via HTTP, not FTP since that is rather slow via Internet Explorer.
Anyway, good job, also lots of other XML-related modules; 4Suite in my toolbox too now!