Use of XML namespaces in feeds

May 25, 2006

I gave a presentation on the second day of OPML Camp on "namespaces and their uses in feeds", and I received tons (OK, two) of e-mailed requests to put a synopsis of the material in a blog post, so here goes. (For those confused by namespaces, I'm going to follow this post with a quick description of what they are.)

Methodology
I decided that since Grazr lives in an interesting spot, a nexus point for feeds people read, I would parse the Grazr cache and see what kinds of namespaces were actually being used. The Grazr cache is still a relatively small sample (compared to the universe of feeds out there) and is probably skewed towards the geekier end of the spectrum, but I figured it was good enough to be representative of "namespaces in the wild".

Some details on the analysis

When parsed, the Grazr cache represented about 10,000 active feeds
There were just about 100 unique (URI or URN) namespace declarations
There were ~30,000 separate namespace declarations
While not every feed used namespaces, those that did usually used several
58% (around 6000) of the feeds used at least one namespace declaration
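
For anyone who wants to try a similar count on their own pile of feeds, here is a minimal sketch of the idea. It is not the script I ran against the Grazr cache; the function names and the sample feed are made up for illustration, and it assumes each feed is already available as a string of XML. It leans on the "start-ns" event in Python's standard ElementTree parser, which fires once for every namespace declaration, including ones repeated on nested elements.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative feed (not from the Grazr cache): two declarations on the root,
# plus an XHTML declaration repeated on each content block.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example</title>
    <item>
      <dc:creator>someone</dc:creator>
      <body xmlns="http://www.w3.org/1999/xhtml"><p>Hello</p></body>
    </item>
    <item>
      <body xmlns="http://www.w3.org/1999/xhtml"><p>World</p></body>
    </item>
  </channel>
</rss>"""


def count_namespace_declarations(feed_xml: str) -> Counter:
    """Count every namespace declaration in one feed, keyed by URI."""
    counts = Counter()
    source = io.BytesIO(feed_xml.encode("utf-8"))
    # "start-ns" yields a (prefix, uri) pair for each xmlns declaration.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        counts[uri] += 1
    return counts


def summarize_feeds(feeds):
    """Return (declarations per URI, number of feeds declaring a namespace)."""
    totals = Counter()
    feeds_with_namespaces = 0
    for feed_xml in feeds:
        try:
            counts = count_namespace_declarations(feed_xml)
        except ET.ParseError:
            continue  # skip feeds that are not well-formed XML
        if counts:
            feeds_with_namespaces += 1
        totals.update(counts)
    return totals, feeds_with_namespaces


if __name__ == "__main__":
    totals, with_ns = summarize_feeds([SAMPLE_FEED])
    for uri, n in totals.most_common():
        print(n, uri)
    print("feeds with at least one declaration:", with_ns)
```

Counting declarations rather than distinct namespaces per feed is a deliberate choice here, since it matches the "separate namespace declarations" figure above.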

The next thing I did was to plot the namespaces versus percent of total declarations.

[Figure: namespaces plotted against percent of total declarations, http://www.grazr.com/mikepk/namespace.png]

What you can see from the graph is that a few namespaces represent the vast majority of the declarations.

It's important to note that some feeds, particularly those using the heavily represented Atom namespaces, wrapped each content block inside the feed with its own namespace declaration, so a single feed could contribute many declarations.

Another interesting thing to note is that, although technically the different URIs used for Atom-type feeds mark them as separate namespaces, some were actually still intended to represent the same part of the Atom feed specification (Blogger, for instance, uses its own URI for Atom). If you combine all the separate "Atom" namespace declarations, they account for well over half of all the declarations.
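
Combining the Atom variants is just a matter of mapping several URIs onto one label before adding things up. A rough sketch, with some loud assumptions: the first two URIs are the Atom 1.0 and Atom 0.3 namespaces, the third is a hypothetical stand-in for a vendor-specific Atom URI, and the counts are invented numbers, not the Grazr data.

```python
from collections import Counter

# URIs treated as one "Atom" family. The example.com entry is a hypothetical
# placeholder for a vendor-specific Atom URI.
FAMILIES = {
    "Atom": {
        "http://www.w3.org/2005/Atom",        # Atom 1.0
        "http://purl.org/atom/ns#",           # Atom 0.3
        "http://example.com/vendor/atom/ns#", # hypothetical vendor URI
    },
}


def merge_families(counts, families=FAMILIES):
    """Fold URIs that belong to a known family into a single label."""
    merged = Counter()
    for uri, n in counts.items():
        label = next(
            (name for name, uris in families.items() if uri in uris),
            uri,  # URIs outside every family keep their own entry
        )
        merged[label] += n
    return merged


def share(counts, key):
    """Fraction of all declarations attributed to `key`."""
    total = sum(counts.values())
    return counts[key] / total if total else 0.0


if __name__ == "__main__":
    # Invented counts, for illustration only.
    raw = Counter({
        "http://www.w3.org/2005/Atom": 40,
        "http://purl.org/atom/ns#": 15,
        "http://example.com/vendor/atom/ns#": 5,
        "http://www.w3.org/1999/xhtml": 20,
        "http://purl.org/dc/elements/1.1/": 20,
    })
    merged = merge_families(raw)
    print(merged.most_common())
    print("Atom share:", share(merged, "Atom"))
```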

The next most common namespace is XHTML, which showed up as ~20% of all the declarations. The XHTML namespace is used frequently for content blocks represented in feeds.

If you continue to plot out all the namespaces, you get a very 'long tail' pattern of namespace use. This seems to indicate that there are very few common namespaces and that most namespaces are highly specific to a particular application with its associated subset of feeds. In other words, people are defining namespaces for very niche-y applications that have little reusability.
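
One way to put a number on the 'long tail' is to ask how many of the top namespaces it takes to cover a given fraction of all declarations. The sketch below is only an illustration: the helper name is my own and the counts are invented, not taken from the Grazr cache.

```python
from collections import Counter
from itertools import accumulate


def namespaces_covering(counts, fraction):
    """Smallest number of top-ranked namespaces whose declarations
    add up to at least `fraction` of the total."""
    total = sum(counts.values())
    ordered = [n for _uri, n in counts.most_common()]
    for rank, running in enumerate(accumulate(ordered), start=1):
        if running >= fraction * total:
            return rank
    return len(ordered)


if __name__ == "__main__":
    # Invented distribution: three big namespaces and twenty one-off ones.
    counts = Counter({"ns-a": 50, "ns-b": 20, "ns-c": 10})
    counts.update({f"niche-{i}": 1 for i in range(20)})
    print("namespaces needed for half:", namespaces_covering(counts, 0.5))
    print("namespaces needed for three quarters:", namespaces_covering(counts, 0.75))
    print("distinct namespaces overall:", len(counts))
```

A small head count next to a large number of distinct namespaces is the long-tail shape described above.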

A subset of the most frequently reused namespaces:

Dublin Core - common metadata for 'author', 'date' and other similar publishing data
RDF - Resource Description Framework declarations, used for the semantic web
CC Schema - RDF representation of license and copyright information, 'creative commons'
taxonomy - RDF namespace for taxonomies
Well-Formed Web CommentAPI - namespace for comment feeds and comments
iTunes - Apple music store namespace for podcasts; length, explicit and other rich media tags
FeedBurner - FeedBurner metadata
Yahoo! Media RSS - ratings, content, and adult type tags
trackback - namespace for Movable Type's trackback ping system; tags include 'ping' and 'about'

Another quick thing to note is that there are also a few namespaces that overlap or replicate functionality. For example, the Yahoo! Media RSS namespace and the Apple iTunes namespace are both used for similar rich media content and therefore replicate much of the same metadata (like ratings and adult/explicit content tags).
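
To make the overlap concrete, here is a small sketch of what a feed reader ends up doing: checking two different namespaces for what is essentially the same flag. It assumes the iTunes podcast namespace URI with an `itunes:explicit` element and the Media RSS namespace URI with a `media:rating` element using the simple adult/nonadult values; the helper name is hypothetical, and a real reader would need to handle more variants than this.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
MEDIA_NS = "http://search.yahoo.com/mrss/"


def is_explicit(item):
    """Return True/False if either namespace flags the item, else None."""
    itunes = item.find(f"{{{ITUNES_NS}}}explicit")
    if itunes is not None and itunes.text:
        return itunes.text.strip().lower() == "yes"
    rating = item.find(f"{{{MEDIA_NS}}}rating")
    if rating is not None and rating.text:
        return rating.text.strip().lower() == "adult"
    return None  # neither namespace says anything about this item


if __name__ == "__main__":
    podcast_item = ET.fromstring(
        f'<item xmlns:itunes="{ITUNES_NS}">'
        "<itunes:explicit>yes</itunes:explicit></item>"
    )
    media_item = ET.fromstring(
        f'<item xmlns:media="{MEDIA_NS}">'
        '<media:rating scheme="urn:simple">nonadult</media:rating></item>'
    )
    print(is_explicit(podcast_item), is_explicit(media_item))
```

Two lookups for one piece of information is exactly the kind of duplication that overlapping namespaces push onto every consumer.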

This was a quick presentation, but hopefully understanding a little bit about how namespaces are actually being used 'in the wild' can help us frame our discussion about extending things like OPML in the future.