Specificity

January 8, 2007

Danny Ayers, my favorite semantic web blogger, has an interesting reaction to a post by Seth Ladd titled "Is Usefulness Inversely Proportional to Specificity?"

From Seth's post:

A successful metadata format for the web must be easily understood by humans, even if it is to be useful for machines. RDF and OWL seem to have the machine in mind first. This is opposite of what makes the Web so successful and useful. OWL doesn’t appreciate vagueness, nor does it allow for expressing “sort of” and “almost” clarifications, both of which are vitally important if any true meaning is to be conveyed.

Microformats punt on this issue, and I give them credit. Just tell me the terms to use, and I’ll use them. Let us all share these simple terms. These terms’ meanings are vague to a computer, but clear enough to a human. And a human is the one that will program the logic to handle the terms, so “good enough” seems to be just fine here.

The discussion is basically about whether RDF's specificity is part of the reason it hasn't gained as much traction as "looser" technologies like microformats.

Danny makes the distinction that, while both technologies can be described as means of providing metadata, they're not really competing technologies. Microformat data, if properly transformed (using something like GRDDL, as I understand it), can be unambiguously converted into RDF.

...This might lead to the assumption that it was one approach versus the other. The truth is quite the opposite. Microformat data can be unambiguously defined, and it can be expressed in the RDF model. A HTML link can be backed by formal semantics (with the aid of a link to a HTML Meta data profile URI in the doc's head). With the GRDDL mechanism in place, as far as the Semantic Web is concerned, microformat data is RDF.

While I don't doubt Danny, I don't think this invalidates the original point. I think higher specificity, both in the framework of the data model and as imposed by the designer of the model itself, fundamentally limits usefulness (in the broader sense). There are several interconnected vicious cycles (difficulty of understanding, limited adoption, "strictness" of the framework, and so on) that reduce both the number of users of the data and the amount of data available. It's a sort of chicken-and-egg problem: if there's no data available, why should I learn to use these tools? And if the tools are too hard to understand, why should I bother when there's no data?

I think this also brings up another interesting point: maybe the key to bootstrapping the semantic web is intermediate, vaguer, human-readable formats like microformats that are then unambiguously converted by machines. When I say that RDF is difficult to understand and work with, the argument I've often heard from semantic web proponents is that tools should be generating it, not people. I had always taken this to mean developer tools, but perhaps the key is transparent tools that operate behind the scenes, converting "people-friendly" formats into "machine-friendly" formats. A sort of "just-in-time semantic web compiler," if you will.
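To make the "compiler" idea a little more concrete, here is a rough Python sketch of what such a behind-the-scenes conversion might look like. It is not GRDDL and not a real microformat parser: the `VOCAB` mapping, the `example.org` URIs, and the `compile_to_triples` helper are all made up for illustration, and only two hCard-style class names (`fn` and `url`) are handled. The point is just that a person writes ordinary class names, and a small tool turns them into explicit triples.

```python
from html.parser import HTMLParser

# Hypothetical mapping from microformat-style class names to predicate URIs.
# A real setup would take these from a published profile, not a hard-coded dict.
VOCAB = {
    "fn": "http://example.org/vocab#fullName",
    "url": "http://example.org/vocab#homepage",
}


class MicroformatToTriples(HTMLParser):
    """Collects (subject, predicate, object) triples from class-annotated HTML."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.triples = []
        self._pending = None  # predicate waiting for the element's text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in (attrs.get("class") or "").split():
            predicate = VOCAB.get(name)
            if predicate is None:
                continue  # unknown class names are simply ignored
            if tag == "a" and name == "url" and attrs.get("href"):
                # A link's target becomes a URI object.
                self.triples.append((self.subject, predicate, "<%s>" % attrs["href"]))
            else:
                # Otherwise the element's text becomes a literal object.
                self._pending = predicate

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            self.triples.append((self.subject, self._pending, '"%s"' % escaped))
            self._pending = None


def compile_to_triples(html, subject):
    """Return N-Triples-style lines for the recognized class names in html."""
    parser = MicroformatToTriples(subject)
    parser.feed(html)
    parser.close()
    return ["<%s> <%s> %s ." % (s, p, o) for s, p, o in parser.triples]


if __name__ == "__main__":
    page = (
        '<div class="vcard"><span class="fn">Jane Doe</span> '
        '<a class="url" href="http://example.org/jane">home</a></div>'
    )
    for line in compile_to_triples(page, "http://example.org/jane#me"):
        print(line)
```

The author of the page only ever sees `class="fn"`; the URIs and the triple structure stay inside the tool, which is roughly the division of labor I have in mind.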

I've thought about specificity and data formats a lot, mainly because OPML is continually criticized as being poorly specified. While I won't say I haven't felt the pain of having to work around divergent interpretations, I find OPML immediately useful, with data already available (via directories and feed reader subscription lists). I actually think that more open-ended interpretation allows easier adoption (even if it causes headaches and problems in the long term). More adoption means more data, which, in turn, results in greater overall usefulness.

(Via Raw.)