I've spent the last 5 days diagnosing a failure on one particular document. I use XML/FO to generate Word documents. Since the only code that exists to do this is in Java, I run a Tomcat server that takes an HTTP POST from the Seaside server and returns a Word document. It worked fine until this one case pushed it over the edge.
First, Tomcat has some 2 MB limit on POST sizes that had to be configured around. Then I ran into decoding issues, so I pitched Tomcat and wrote a bog-simple Java server based on a server skeleton I found. I can now echo back what I send, regardless of size.
However, the code that does the translation is the next problem. This is all I get (the Tomcat environment was swallowing this for some reason):
org.xml.sax.SAXParseException: character not allowed
at com.jclark.xml.sax.SAX2Driver.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
at com.xmlmind.fo.converter.o.if(Unknown Source)
at com.xmlmind.fo.converter.Driver.B(Unknown Source)
at com.xmlmind.fo.converter.Driver.main(Unknown Source)
Lovely. So which character out of the 1.2 million I have in this document might be the problem? I tried the parser in Squeak; it likes the document fine. I scanned the document for control characters and illegal UTF-8 sequences but found nothing.
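For anyone wanting to repeat that scan on the Java side, here is a minimal sketch of the kind of check I mean. It assumes the document is supposed to be UTF-8 and tests every code point against the XML 1.0 Char production. The class and method names (XmlCharScanner, isXmlChar, firstIllegal) are made up for illustration, and it says nothing about what the converter's own parser actually objects to.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: checks a file for bytes that are not valid UTF-8
// and for characters that XML 1.0 does not allow anywhere in a document.
final class XmlCharScanner {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index (in UTF-16 code units) of the first character that is
    // not a legal XML 1.0 character, or -1 if every character is legal.
    static int firstIllegal(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text;
        try {
            // REPORT makes the decoder throw on malformed UTF-8 instead of
            // silently substituting U+FFFD, which would hide the problem.
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
            return;
        }
        int at = firstIllegal(text);
        if (at < 0) {
            System.out.println("No illegal XML 1.0 characters found");
        } else {
            System.out.printf("Illegal character U+%04X at char index %d%n",
                    text.codePointAt(at), at);
        }
    }
}
```

Run it with the path to the document as the only argument. If it also comes up clean, the complaint is probably about something context-dependent (a character that is legal in text but not where it appears), which a flat scan like this cannot see.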
So I set out to download the source to this XML parser and run it in a debugger to try to figure it out. But I can't find all the pieces. There are apparently 472 XML parsers in the Java world, along with 27 abstract interfaces to allow you to mix and match. Don't Java programmers have anything better to do than write XML parsers? Why would anyone care enough to choose one over another? Each parser has about 6 versions, all mutually incompatible. Some require Java 1.5, some 1.4, some are fine back to 1.2. Yet it is only the Java parser that complains about the document; all the other parsers I have access to like it fine.
Keeeeyyyyrrrriiiiissssst on toast! Tips on where I can just download the source code to the offending XML parser would be appreciated.
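In case it helps anyone answer: a small sketch like the following should at least tell me which jar a given parser class is coming from, which narrows down whose source to look for. The class name WhichParser and the method describe are made up for illustration. It has to be run with the converter's jars on the classpath, and the class names to look up (for example com.jclark.xml.sax.SAX2Driver from the trace above) go on the command line.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports where a class was loaded from.
final class WhichParser {

    // Returns "<class name> loaded from <location>". Classes that ship with
    // the JDK itself may have no code source, so that case is handled.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "(JDK built-in or unknown)"
                : src.getLocation().toString();
        return c.getName() + " loaded from " + where;
    }

    public static void main(String[] args) throws Exception {
        // The parser that the standard JAXP factory lookup hands out.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        System.out.println(describe(parser.getClass()));
        System.out.println(describe(parser.getXMLReader().getClass()));

        // Any extra class names given on the command line.
        for (String name : args) {
            System.out.println(describe(Class.forName(name)));
        }
    }
}
```

The jar name that comes back is usually enough to search for the matching source distribution.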