I'm building a program to check the general level of "wrongness" of the HTML in a web page. The formal definition of what counts as correct is a DTD from the W3C. I've got a nifty parser that seems to build a DOM successfully out of even the most hideous HTML, and it diagnoses things like unclosed tags as it goes. That covers the general structural checks.
But for rules like "you can't follow a table tag with a form tag" I'll have to go to the DTD. Turning a DTD into a useful in-memory data structure has proven to be bloody challenging (although I think I just cracked it a couple of minutes ago).
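To give an idea of the kind of structure I mean, here is a minimal sketch in Python, not my actual implementation. It assumes the simplest useful representation: a dict mapping each element name to the set of child elements its content model mentions. The names `SAMPLE_DTD`, `build_child_table` and `may_contain` are made up for illustration. It deliberately ignores ordering and occurrence indicators (`?`, `*`, `+`), does not expand parameter entities such as `%inline;`, and does not handle SGML inclusions or exclusions, so it can only answer "may X appear directly inside Y", not "may X follow Y".

```python
import re

# Hypothetical sample input: a few declarations in the style of the HTML 4 DTD.
SAMPLE_DTD = """
<!ELEMENT TABLE - - (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)>
<!ELEMENT TR - O (TH|TD)+ -- table row -->
<!ELEMENT BR - O EMPTY>
"""

ELEMENT_RE = re.compile(
    r"<!ELEMENT\s+(?P<names>\([^)]*\)|\S+)\s+"  # one name, or a (A|B) name group
    r"(?:[-O]\s+[-O]\s+)?"                       # optional SGML tag-omission flags
    r"(?P<model>[^>]*)>",                        # content model up to the closing '>'
    re.IGNORECASE,
)
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")
KEYWORDS = {"EMPTY", "ANY", "CDATA", "RCDATA"}


def element_names(text):
    """Return the upper-cased element names mentioned in part of a declaration."""
    # Drop SGML "-- comment --" spans so their words are not read as names.
    text = re.sub(r"--.*?--", " ", text, flags=re.DOTALL)
    # Drop #PCDATA and unexpanded %entity; references (a simplification).
    text = re.sub(r"[#%][A-Za-z][A-Za-z0-9.-]*;?", " ", text)
    return {name.upper() for name in NAME_RE.findall(text)} - KEYWORDS


def build_child_table(dtd_text):
    """Map each declared element to the set of children its model mentions."""
    table = {}
    for match in ELEMENT_RE.finditer(dtd_text):
        children = element_names(match.group("model"))
        for parent in element_names(match.group("names")):
            table[parent] = children
    return table


def may_contain(table, parent, child):
    """True if `child` is listed in the content model declared for `parent`."""
    return child.upper() in table.get(parent.upper(), set())


table = build_child_table(SAMPLE_DTD)
print(may_contain(table, "table", "tbody"))  # True
print(may_contain(table, "table", "form"))   # False
```

A flat set like this is enough to flag a form sitting directly inside a table. Checking sequence rules would need the content model kept as a tree (sequence, choice, repetition) instead of being flattened.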