Who's broken?
I ran across a post by Tom about the perennially discussed problem of XHTML strictness vs. forgiveness. I'm not quite sure which side of the fence he's on, because he starts out with an interesting concept, XML and XHTML precision based on particular contexts, but then moves on to compare "forgiveness" in XHTML against other strict standards, like writing and compiling C code, and to explain why forgiveness is a bad idea. I have a feeling I know where he falls, though, since Tom has been exploring the big wide world of Semantic Web technologies, and unambiguous specs are the lifeblood of that movement.
Contextual precision is an interesting idea, although it creates a lot of implementation complexity, along with the difficulty of surfacing that behavior (how do I know what level of precision this document has been parsed with?). In principle it's not a bad idea: things formatted primarily for human visual consumption could be loosely parsed, while more rigorous documents intended for machine use would be handled in a "strict" context. But that assumes you know how people want to use your "visually parsed" document. If someone later wants to tease data out of it, we're back to the same problem.
His post is a reaction to Shelly Powers, who states her position unambiguously in favor of a "draconian/precise" interpretation instead of a more lax one. I especially feel her pain when she talks about the sorry state of cross-browser JavaScript.
Creating a JavaScript application that runs without errors across browsers is an interesting challenge. If I had a voodoo doll labeled "IE" and stuck a pin into it every time something broke in IE that worked in all the other browsers, it would look like a hedgehog by now.
Unlike Tom, I am a heretic (OPML alone is a frequent whipping-boy for standards purists), but there is a problem with XML strictness at a fundamental level. It's most definitely a human problem and has nothing to do with the technology or the intent behind the strict philosophy (I understand and appreciate its goals, even if we can't conform to them).
The main problem, for a developer of user-facing products built on XML technologies, is the "who's broken" issue. If you are expected to consume XML "in the wild" that you have no control over (feeds, for example) and you adhere to the strict XML (and, by extension, XHTML) philosophy of refusing to process improperly formed documents, you will be perceived as broken.
Refusing to parse such documents is technically the correct behavior according to the spec, and it is lauded by the relatively small group of people who understand the technology and the philosophy. But the vast majority of users will see either an error message or something other than the behavior they expected, and they will conclude that your software is broken. You can flail about, point accusingly at the source document, jump up and down, quote the spec, yell till you're red in the face, but in the user's eyes your application is broken.
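To make that moment concrete, here's a minimal sketch of my own (not from Tom's or Shelly's posts), assuming Python's standard xml.etree.ElementTree, of what a spec-correct parser does with a typical wild feed containing a bare ampersand:

    # A sketch of the "who's broken" moment: a spec-correct parser refuses a
    # malformed feed, and the user sees an error instead of their content.
    import xml.etree.ElementTree as ET

    feed = '<rss><channel><title>Fish & Chips</title></channel></rss>'  # bare ampersand

    try:
        ET.fromstring(feed)
    except ET.ParseError as err:
        # Technically correct behavior per the XML spec -- and exactly the
        # text a user reads as "your widget is broken."
        print(f"Sorry, this feed is not well-formed XML: {err}")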
In this sense blame rolls downstream, and the source document is rarely perceived to be the problem. Compounding this, any other tool that is more lax in processing documents gives the user an exemplar to point to and say, "See, they work, you're busted, ciao..." This is also a corollary to the "you get two seconds of attention per user" problem: any friction in those two seconds and you've lost them.
You'd be amazed at the support e-mails we get telling us "we have a bug" or that "we're hopelessly broken" where, when we follow up, the source XML turns out to be the worst mish-mash of poorly formed garbage, or sometimes not even XML at all. But the blame is always ours: "You need to fix your widget! It doesn't allow binary .gif files as a source!"
The strict XML purists will say, "So what! You make the web a better place if you refuse to process improperly formed documents!" While this may be true in the macro sense, it essentially asks web product developers to fall on their swords, "taking one for the team," letting users assume their products are broken and abandon them... until the collective of XML-based technologies evolves to a higher state of awareness and interoperability. Sacrifice yourself for the greater good. I understand the philosophy and can see the beauty of a completely interoperable web, but convincing me to return error messages for unescaped ampersands isn't going to happen. Call us weak if you want, but product developers, especially at small companies, can't afford that kind of altruism.
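For what it's worth, the forgiving path usually looks something like this hedged sketch: quietly repair the most common wild-feed mistake, a bare ampersand, before parsing, instead of surfacing an error. The parse_leniently helper and its regex are illustrative only, not a complete sanitizer:

    # A sketch of the forgiving alternative: escape bare ampersands before
    # parsing rather than returning an error to the user.
    import re
    import xml.etree.ElementTree as ET

    def parse_leniently(xml_text: str) -> ET.Element:
        # Escape '&' only when it doesn't already start an entity like &amp; or &#38;
        repaired = re.sub(r'&(?!#?\w+;)', '&amp;', xml_text)
        return ET.fromstring(repaired)

    tree = parse_leniently('<rss><channel><title>Fish & Chips</title></channel></rss>')
    print(tree.find('channel/title').text)  # prints: Fish & Chips

That quiet repair is exactly the kind of forgiveness the purists object to, but it keeps the user's content on the screen and the "you're broken" e-mails out of the inbox.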
Who's broken? I would argue that the feedback loop is what's broken for XML and XHTML. In a compiler environment the author knows immediately that the code is broken; the "compile-error-fix-compile" feedback loop is closed. In an open XML environment, the loop is frequently open-ended: "XML author -> XML end-user application -> end user" gives the author no clear feedback. So to the user, it always looks like the end-user application is broken when it fails to parse an XML document. Until that disconnect is fixed, applications will continue to be forgiving in how they parse, and authors and tools will continue to be sloppy in what they produce.