mikepk.com

A description of XML namespaces

May 25, 2006

XML namespaces are a pretty simple concept, but one that's managed to cause quite a bit of confusion. I've heard some pretty crazy ideas about what a namespace is, so I thought I'd do a quick write-up to try to explain it simply. The W3C docs aren't terrible, but I can see how they could be confusing.

So, what the heck are they?

Imagine you've written a nifty music composition program. You need a way to store and exchange the documents you create. XML is easily parsed, is a web standard, and allows you to create your own data markup. So you use XML and decide that there will be a tag called <note></note>.

Some time later you need to create another application, this one for post-it-style annotations. XML worked so well for you before that you use it again, and in the process you create a new tag called <note></note>.

All is right in the universe... Well, until you decide you want to create a newer, more fantastic music composition program that allows people to leave themselves virtual post-it notes inside their virtual sheet music. When you try to re-use the code you'd written for the two applications by combining them, there are two meanings for the tag <note>. The names you've used are said to collide.
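To make the collision concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The <score> wrapper and the text inside each note are made up for illustration; only the <note> tag comes from the story above.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined document: <score> and the note contents are
# invented for this sketch. One <note> is a musical note, the other is
# a post-it style annotation.
doc = """
<score>
  <note>C4 quarter</note>
  <note>Remember to practice this bar</note>
</score>
"""

root = ET.fromstring(doc)
tags = [child.tag for child in root]

# Both children report exactly the same tag name, so code reading this
# document cannot tell the two kinds of note apart by name alone.
print(tags)
```

The program has to guess from context which kind of note it is looking at, which is exactly the problem namespaces are meant to remove.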

You could re-work the code and change one of the tags to be different (say 'postit' instead of 'note') but if someone else wanted to use your documents in their own application, there's no guarantee their XML markup wouldn't collide with yours. You could make your tags so complicated that they were highly unlikely to be the same, say <mikesgreatmusicprogramnote>, but that gets tedious fast, is tough to read, and it's still not a guarantee that you're not going to collide.

XML namespaces are a solution to this problem. By 'gluing' something that's known to be unique to each tag you create, you can guarantee that no one else will have the same tag. What can be used that's unique? Well, URLs (technically URIs) are unique, at least when they're built on a domain name you control, so the people who specified XML namespaces decided that a URI glued to a tag would solve the problem*. Pretty clever, except most people are used to URIs pointing to web pages. What are these URIs supposed to point to? They don't have to point to anything; they just need to be unique. They can point to something (say, a document that describes how to use your tags), but they don't have to.

So prepending a URI to a user-defined tag guarantees that no one else can have the same tag. We could use a colon to 'glue' the URI directly to our tag, as in <http://grazr.com/musicprogram:note>, but that presents some problems. Writing this out is still tedious, and URIs contain characters that aren't allowed in XML tag names, like ampersands and slashes, plus colons of their own. XML namespaces get around these problems by using a 'prefix', which is a placeholder for the URI.

What you'll see in an XML document is a declaration like the following, written as an attribute inside an element's opening tag:

xmlns:gzmusic="http://grazr.com/musicprogram"

(The parts are xmlns:prefix="URI".)

Now whenever I want to use my namespaces, I use prefixes pointing to the correct URI, and then I'm guaranteed not to collide with anyone else's tags (or even my own). A declaration applies to the element it sits on and to everything inside that element.

[...]
<gzmusic:note xmlns:gzmusic="http://grazr.com/musicprogram"
              xmlns:gzpost="http://grazr.com/postitprogram">
  <gzpost:note> </gzpost:note>
</gzmusic:note>
[...]
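One way to check that the two note tags really are distinct is to feed a document like this to a namespace-aware parser and look at the names it reports. The sketch below uses Python's standard xml.etree.ElementTree; the <sheet> wrapper element and the note text are assumptions added so the document has a single root, and are not part of the original example.

```python
import xml.etree.ElementTree as ET

# <sheet> is a hypothetical wrapper element that carries both
# namespace declarations as attributes.
doc = """
<sheet xmlns:gzmusic="http://grazr.com/musicprogram"
       xmlns:gzpost="http://grazr.com/postitprogram">
  <gzmusic:note>
    <gzpost:note>practice this bar</gzpost:note>
  </gzmusic:note>
</sheet>
"""

root = ET.fromstring(doc)
music_note = root[0]         # the <gzmusic:note> element
postit_note = music_note[0]  # the <gzpost:note> nested inside it

# ElementTree reports each name as "{namespace URI}local-name",
# so the prefixes are gone and the URIs tell the two tags apart.
print(music_note.tag)   # {http://grazr.com/musicprogram}note
print(postit_note.tag)  # {http://grazr.com/postitprogram}note
```

Both elements still have the local name "note", but the names the parser hands back differ because the URIs differ.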

To the parser, the prefix is like a variable declaration: it's replaced with the URI when the document is parsed. If I used two different prefixes (say gzmusic and gzsymphony) but declared them using the same URI, they would point to the same namespace, so gzmusic:note and gzsymphony:note would be the same tag. That's legal for elements, if confusing; what is technically illegal is giving a single element two attributes that end up with the same name this way.
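The point about two prefixes sharing one URI can be checked the same way. This is a small sketch with Python's standard xml.etree.ElementTree; the <sheet> wrapper and the "m" prefix used for the lookup are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Two different prefixes are bound to the same namespace URI.
# <sheet> is a hypothetical wrapper element.
doc = """
<sheet xmlns:gzmusic="http://grazr.com/musicprogram"
       xmlns:gzsymphony="http://grazr.com/musicprogram">
  <gzmusic:note/>
  <gzsymphony:note/>
</sheet>
"""

root = ET.fromstring(doc)
first, second = root  # the two child elements, in document order

# After parsing, both elements carry the same URI-qualified name.
same_tag = first.tag == second.tag
print(same_tag)  # True

# The prefix used for a lookup is also just a local stand-in for the
# URI: a made-up prefix "m" finds both elements.
ns = {"m": "http://grazr.com/musicprogram"}
matches = root.findall("m:note", ns)
print(len(matches))  # 2
```

The prefix only matters inside the document text; once parsed, only the URI and the local name are compared.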

Another thing to note is that blocks of XML can have a default namespace. You declare one with a bare xmlns attribute (no colon and no prefix), and from then on that element and all of its child elements (if not qualified by their own prefix) are assumed to be using that namespace.

[...]
<!-- Use the HTML 4.0 namespace -->
<html xmlns="http://www.w3.org/TR/REC-html40">
  <head> <title>Greeting</title> </head>
  <body>Hello There!</body>
</html>
<!-- the default namespace ends with the closing html tag -->

All the tags within the 'html' block default to the namespace declared in that block. Frequently a document will have a single xmlns declaration on its top-level element (such as an XHTML namespace) to define the default namespace for the entire document.
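To see the default namespace being applied, the html snippet can be run through a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, with the document text taken from the example above; the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# No prefixes anywhere: the bare xmlns attribute sets the default
# namespace for <html> and every unprefixed element inside it.
doc = """
<html xmlns="http://www.w3.org/TR/REC-html40">
  <head> <title>Greeting</title> </head>
  <body>Hello There!</body>
</html>
"""

root = ET.fromstring(doc)

# Walk the whole tree in document order and collect the names the
# parser assigned to each element.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
# {http://www.w3.org/TR/REC-html40}html
# {http://www.w3.org/TR/REC-html40}head
# {http://www.w3.org/TR/REC-html40}title
# {http://www.w3.org/TR/REC-html40}body
```

Every element ends up in the HTML 4.0 namespace even though none of the tags in the document text mention it.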

And that's XML namespaces. It seems simple enough, but I've heard enough people garble this or just be confused by it that I thought it was worthwhile to explain briefly.

At OPML Camp, I thought Pito did a good job explaining the idea. The fact that afterwards a few people were happy that the concept had been cleared up for them also made me think that a quick explanation was in order.

footnote: Really, it doesn't have to look like a URL; it could be anything guaranteed to be unique, like a URN, but its most common manifestation is a URI that looks like a URL.