The document web

Back in 1998 Tim Berners-Lee published a W3C style guide with the title "Cool URIs don't change". Probably everyone reading this has read it, but if not there's no better time than now.

Cool URIs was a plea for web developers to design HTTP URIs for persistence and addressed many of the potential pitfalls regularly encountered en route (site redesigns, URIs mapped to file and folder structures, URIs exposing technology stacks that change over time etc).

By this stage in web history most developers and designers (or most of the ones I know) have experienced what goes wrong when URIs change. Users follow links and get 404s or out of date information from unmaintained pages, people's bookmarks break, inbound links break, search engine equity built up over years from inbound links leaks all over the carpet. Uncool URIs don't make the web die, but they do make it whimper slightly. From a less self-interested perspective broken links also break accountability. If your site contains something people disagree with / object to, they can write their own page and link to / cite your original piece. If your URI changes it's not just a link lost to the web, it's a link lost in the chain of accountability and public discourse.

But, but, but in real life URIs do change. Usually because it's not just the developers and designers who have a say. URIs have become part of the furniture of the real world, like corporate graffiti tags. I'm typing this on a tube train and every poster at this end of the carriage features a URI in some shape. There are URIs read out on radio programmes, plastered across the sides of buses, on mugs, on t-shirts, on beer mats, on screen at the end of TV programmes. URIs are common currency and almost everywhere outside the rarefied sphere of web developers and designers they're seen as labels, not identifiers. And labels change. If the world were run by a benevolent cabal of web devs I'm pretty sure that 95% of URIs would be cool. Unfortunately it's not, and important people occasionally demand changes. Which usually results in web devs attempting to wrangle redirect files with the complexity of a less reader-friendly Ulysses.

I don't want to go too deep into the ever-popular "URIs should be human readable" arguments because they always end up going in circles and no-one ever agrees. But in the context of persistence human readable URIs are a problem because labels change. And some things have different labels in different cultural contexts (before you even get to different languages). The current canonical example amongst my circle of friends is the ermine. Or the stoat. People in the UK call some species of animal a stoat. People in the US call the same thing an ermine. Or the other way round. This leads to edit wars in Wikipedia between Brits and Yanks. And because Wikipedia URIs reflect article titles, the URIs change. And change again. Which isn't cool. And because DBpedia URIs reflect Wikipedia URIs, if you're using DBpedia URIs as identifiers in your own systems things break. Or at least whimper.
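One way to picture the label / identifier split is a rough Python sketch in which the identifier is opaque and never changes, while labels are just aliases that redirect to it. Everything here is made up for illustration: the /id/ and /animals/ paths, the a1b2 identifier and the resolve function aren't taken from Wikipedia, DBpedia or any real site.

```python
# Hypothetical alias table: labels come and go, the opaque id stays put.
LABELS = {
    "stoat": "a1b2",
    "ermine": "a1b2",
}


def resolve(path):
    """Return (status, location) for a request path.

    /id/<opaque-id>   is the persistent URI and answers 200.
    /animals/<label>  is a label and answers 301 to the persistent URI.
    """
    prefix, _, tail = path.strip("/").partition("/")
    if prefix == "id" and tail in LABELS.values():
        return 200, f"/id/{tail}"
    if prefix == "animals" and tail.lower() in LABELS:
        return 301, f"/id/{LABELS[tail.lower()]}"
    return 404, None


print(resolve("/animals/stoat"))
print(resolve("/animals/Ermine"))
```

Under that assumption the stoat / ermine argument can rage on without anything breaking. Renaming the thing means adding a row to the alias table, not changing the URI other systems have stored.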

The data web

These days lots of websites (and not just the usual web2.0 suspects) make content available as data for consumption by machines. In 99% of cases this is done under an API separate from the main (document) website. And there are lots of reasons why you might want to separate the two. Machines (or at least those manned by impolite developers) tend to hit sites more aggressively than the average punter. Separating out the API allows organisations to make access dependent on possession of an API key and API keys can be used to track usage and impose rate limits. All of this goes back to one important point: the business value of having web pages is proved, the business value of publishing the same stuff as data isn't (yet).

In all of this there's a very important distinction: web pages are seen as the things punters surf; APIs as platforms for development. And no-one wants a brittle platform. Sales forces and marketeers and business owners often enjoy the bragging rights of having an API. But it's not something they ever see. The shape of the API is entirely in the hands of the developers. No one outside the development team is ever asking for redesigns or marketing URIs or human readability. It's just obvious to all that machines need persistent identifiers.

The classic case is Twitter. If you surf the document web the URI of a tweet looks like:

http://twitter.com/#!/fantasticlife/status/7925332711571456

with the tweet identifier nested under the tweeter's username. But on Twitter users are able to change their username. Not many people do it but it has been known. So in the old style REST API the same tweet lives at:

http://api.twitter.com/1/statuses/show/7925332711571456.xml

(where 1 is just the API version.) The username isn't there because the username can change and changing URIs make APIs brittle. And as ever encoding resource structure into URIs is the enemy of persistence. Or, put another way, cool URIs stay flat.
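To make the difference concrete, here's a minimal Python sketch that pulls the status id out of a document-web tweet URI and builds the flat API URI from it. The two URI shapes are the ones quoted above; the function names and the API_TEMPLATE constant are mine, and the code only builds strings, it doesn't call Twitter.

```python
from urllib.parse import urlsplit

# URI shape copied from the old style REST API example above.
API_TEMPLATE = "http://api.twitter.com/1/statuses/show/{tweet_id}.xml"


def tweet_id_from_document_uri(uri):
    """Return the numeric status id from a hashbang-style tweet URI."""
    # For the example above the fragment is
    # "!/fantasticlife/status/7925332711571456".
    fragment = urlsplit(uri).fragment
    parts = fragment.lstrip("!/").split("/")
    # Expected shape: [username, "status", id]
    if len(parts) != 3 or parts[1] != "status" or not parts[2].isdigit():
        raise ValueError(f"not a tweet URI: {uri}")
    return parts[2]


def api_uri(document_uri):
    """Build the flat API URI, which ignores the username entirely."""
    return API_TEMPLATE.format(tweet_id=tweet_id_from_document_uri(document_uri))


before = "http://twitter.com/#!/fantasticlife/status/7925332711571456"
# "someothername" is a made-up username standing in for a rename.
after = "http://twitter.com/#!/someothername/status/7925332711571456"
print(api_uri(before) == api_uri(after))
```

The username is thrown away on the way through, so a rename changes the document URI but leaves the API URI exactly where it was.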

So it seems there's a clear pattern. The document web benefits from cool-ish URIs (give or take) whereas the API view can never allow the cool mask to slip. But...

The website as API

In the Linked Data world there's a heavy presumption that you use HTTP appropriately (read RESTfully). That the HTML views (desktop, mobile, tablet...) and the data views all get served from the same URI depending on what the HTTP request asks for.
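As a rough illustration of "depending on what the HTTP request asks for", here's a simplified Python sketch of choosing a representation from the Accept header. The list of media types on offer and the function names are assumptions for the example. The parsing is also deliberately naive: it honours q values and simple wildcards but ignores the finer precedence rules in the HTTP spec, so a real server should lean on its framework's content negotiation instead.

```python
# Assumed set of representations available at a single URI.
AVAILABLE = ["text/html", "application/rdf+xml", "text/n3", "application/json"]


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept_header, available=AVAILABLE):
    """Return the media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        for candidate in available:
            if media_type in ("*/*", candidate):
                return candidate
            # "text/*" style wildcard: compare the "text/" prefix.
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None


print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_representation("application/rdf+xml"))
```

A browser sending something like the first header gets HTML and a client asking for application/rdf+xml gets RDF, both from the same URI. When choose_representation returns None the server would normally answer 406 Not Acceptable.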

In the original design notes for Linked Data there's no explicit mention of content negotiation. The only instruction is "When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)". It doesn't explicitly say any URI and it doesn't explicitly denounce the separate API / API key pattern but it's all fairly implicit from the "Browsable graphs" section. You can't browse graphs if you have to keep stopping to register another API key.

From my experience of the Linked Data world there's lots of attention paid to how Non-Information Resources relate to Information Resources (303s and hashes and possibly 200s when the bickering dies down). But there's relatively little attention paid to the content negotiation of the IR. And even when there is, it tends to be about conneg between RDF and desktop HTML, ignoring mobiles and tablets and so on.

Nevertheless, the general presumption seems to be one URI for desktop HTML, mobile HTML, RDF/XML, N3, JSON etc. I think it was Jeni Tennison who coined the phrase "your website is your API". And for what it's worth I think website as API / website as platform is the only sane approach. It makes data views and cross platform support just a matter of some templates and some CSS and some JavaScript. Which means there's less code (almost always a good thing) and it's cheaper to develop. And it's all just using web standards (in this case HTTP) as intended.

But the point I've been trying to make is that currently there's a greater (business if not user) tolerance for uncool URIs and lack of persistence on the document web than there is on the data web. And if the two worlds are collapsing (and given RDFa and microformats even without conneg they are) some of the best practice approaches to developing API URIs need to migrate into best practice approaches for developing document URIs. And some people who want to change document URIs for various reasons still need to be persuaded that persistence matters to punters as well as machines. And I don't think we're there yet.