fantasticlife's posterous http://smethur.st Most recent posts at fantasticlife's posterous posterous.com
Tue, 20 Mar 2012 06:41:00 -0700 My PVR is "in the cloud" http://smethur.st/my-pvr-is-in-the-cloud
in The Cloud
on somebody else's server

Because I'm lazy and can't be arsed leafing through tables of features I get my broadband off BT. And they throw in a BT Vision box. Which is basically Freeview with a hard disc to record to and an internet connection. It's ok. Mostly it works. About once a month the fan gets noisy and it panics and falls on its face and you need to turn it off and on again. Or, if it's a proper panic attack, turn it off, hold down the reset button and turn it on again.

But every so often the box has a complete breakdown and no amount of off / on / off / on / reset makes it feel better. A while back my BT Vision box went a bit hysterical and broke. So they sent round an engineer with a new one.

Obviously I lost all my recorded programmes (even the on / off / reset achieves that aim). Probably no great loss; some Midsomer Murders, some Gardeners' World, some Great British Railway Journeys and my daughter's carefully curated collection of M.I. Highs. But I also figured I'd lost all my instructions to record.

Last Thursday morning I had a bit of a panic because I thought the new series of Gardeners' World wouldn't be recorded. So I clicked through several pages of the god-awful EPG, found Gardeners' World and found it was already set up to record.

Because, I assume, the box phones home and record instructions are stored "in the cloud" and the whole thing is re-synched periodically. Which made me ponder three things:

  1. What else is the box recording and phoning home? What I watch? What I record? What I record and watch? What I record and fail to watch?
  2. How is that data used and by whom? Which reminded me of this vintage article from Wired on Sky's plans to serve personalised adverts based on TV attention data and my oft quoted quote from Clive Humby (emphasis mine):

    If I knew your whole transaction profile - restaurants, travel, fashion - that could be immensely powerful. You'd need a consent-based model, but you'd understand every aspect of a person's life. The credit-card data tells you how they live generally, the supermarket data tells you their motivations, the media data tells you how to talk to them. If you have those three things, you're in marketing nirvana.

  3. Who owns that data and how else could it be used? If it's mine then why shouldn't I be able to port it out and offer it to Sky in the hope of a money-off deal and a more reliable box? Why shouldn't I be able to port it into Programme List and get broadcast reminders and links to VOD services? Or take it to Amazon and get DVD recommendations? Or go the other way and take my Amazon data and get programme recommendations? There's a lot of personal data floating around but it's all locked into proprietary systems and outside my control.

I'm not, despite appearances, a privacy zealot. I don't think absolute privacy is possible or desirable. From supermarket loyalty cards to Oyster cards to Facebook every day we trade some privacy for some convenience. My problem is when the terms of trade are so obfuscated that it's not possible to weigh what we gain against what we lose. In the BT Vision case if I'd been offered the option of exporting my record instructions off the box I might well have clicked yes. (I'd have been even more tempted to click yes if my recordings were stored off the box but that would probably break 10,000 copyright agreements.) But I wasn't given the option and knowing that data is out there but outside my reach is just frustrating.

I don't think I'm alone in this. Some recent research by the NoTube project found that most people were uncomfortable about their online TV viewing being recorded when they hadn't consented and couldn't control what happens to the data.

There's been a long running debate about privacy vs "publicness" which mostly seems to miss the point. The point being informed consent. Those on the "publicness" side tend to say that organisations harvesting user data is a price worth paying for "free" access to "open" publishing tools. And ignore that the data being harvested disappears into proprietary systems where it's impossible for users to extricate it or correct it. The exact opposite of openness. It's bad for consumers because they get locked into a single system. And it's bad for competitiveness because new businesses can't hope to compete with established players who monopolise the interest graphs. Wherever you see customer relationship management or user relationship management it can almost always be read as "lock-in".

Most people are used to the idea that when they use the web (in the browser open, clicking links sense) their actions are being reported and recorded. The fact that we seem no closer to solving informed consent and data portability on the web doesn't bode well for when our white goods start phoning home.

http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 16 Feb 2012 12:40:00 -0800 A rambling post about RiscOS. Ish http://smethur.st/a-rambling-post-about-riscos-ish

As a kid I never had a computer. In those pre-web days the main reason for owning one seemed to be the playing of games and I never really liked games. Still don't.

The first computer I ever met was a university mainframe. You'd spend a morning with a map chart and something that looked like an air hockey puck, tracing lines from red shifting stars or particles being accelerated or collided or some such. Then take the results and write some convoluted Fortran programme which would (hopefully) change the numbers you entered into some different set of numbers.

Except not before the code had been "compiled". I don't think anyone ever explained what "compilation" entailed. You spent several minutes copying your Fortran to a huge floppy disc, handed it over to a sullen lab assistant and they went away and did the water into wine thing. Which for more reasons unexplained took most of the afternoon. Since any time spent in the Blackett Laboratory was demoralising and depressing, compilation time was pub time.

My second meeting with a computer was a Mac Classic that sat in the corner of the same lab. You'd stagger back from the pub and (if you'd struck lucky and your code had compiled) take the new numbers it had made and type them into a spreadsheet type programme on the Mac and get back a pretty graph. I remember wondering why that Fortran computer thing couldn't just work like this Mac computer thing. It had the distinct advantage of working even when you were drunk; the mainframe thing barely worked if you were sober.

After university I went into the then-thriving (cough) CD-ROM trade. It was mostly Mac "Power PCs" with a few Windows machines scattered about. But they felt like a different breed than the old Mac Classic. The CD-ROM industry inherited the Adobe bloatware of the 1980s desktop publishing industry that we're still stuck with today (Photoshop and Illustrator). And added a few of its own (Director and Authorware). The minute you started up one of these applications your Mac stopped being a general purpose computer. The chosen application ate all your memory for several hours before eventually overheating the machine and crashing. These were applications as operating systems; for the duration of use nothing else was possible.

After a little while I moved to glorious Norwich to make educational CD-ROMs. At that time most schools had graduated from the BBC Micro but kept faith with Acorn and invested in shiny new RISC PCs running RISC OS. So alongside the PCs and Macs we had RISC PCs. Which were desperately unfashionable. The only people who ever bought RISC PCs were schools so any RISC OS conference was entirely populated by geography teacher retreads with beards and elbow patches. In the room next door was Paul Mison (who kicked off this vague reminisce) and the first web people I ever met. They were Mac users to a (wo)man; our RISC PCs and CD-ROM burners were a badge of digital shame.

But secretly I quite liked my RISC PC. It was cute and simple and transparent. You could look inside it and tinker with it and work out what it was trying to do. It was far removed from the shrink-sealed product of the modern Apple machine. And it didn't have dongles and bloatware. Instead of a packaged application like Photoshop to meet all your photo editing needs it had an app for resizing photos, an app for recolouring, an app for cropping...

In some ways these were a bit like the modern app of the twonkPhone or twonkPad: single purpose applications designed to do one thing (fairly) well. But unlike app store apps they co-operated, they talked to one another. As my friend Tom might say, they were generative. You could drag and drop the output from one app as the input for another. You could write simple little scripts that chained together apps as a mini process. And because each component co-operated you never got stuck with vendor lock-in or forced upgrades. You could just grab a different app for the same purpose off a floppy disc and swap them in and out at will.

Which reminded me for some reason...

...of a pub chat with Mo and Faith about APIs and open data. In my head at least there is a connection here...

Most businesses of any size will have data spread across multiple systems (staff details, product inventories, room bookings, finances). At some point there's a realisation that scattering knowledge is inefficient and there'd be more value if the multitude of different systems co-operated and exchanged information. For lots of organisations that path leads inevitably to the large scale enterprise architecture dreams of one consolidated system to rule them all. So the usual coterie of consultants march in to design the uber-system. It's the mainframe mentality that never quite died out. (I half remember another company I used to work for had seven systems, all called The Global Platform).

Eventually the uber-system gets built and all the data from the legacy systems gets pumped in. And it fails. Because the data isn't quite the right shape and different systems use different identifiers and some systems use different fields to mean different things at different points in time... But mostly the enterprise architect dreams die because they ignore the real problem until it's too late. Designing data cathedrals is (relatively) easy; data and identifier consolidation is the hard part. And if you can solve identifier consolidation in the first place there's no need to ever build the cathedral. Which is why god gave us URIs, HTTP and APIs. And why...

...tl;dr everything should have an API

I realise there's nothing new in any of this. It's the usual, "small pieces, loosely joined" (though I'd quibble about the definition of loosely). And I realise that hanging this argument on nostalgia for the rotting corpse of a badly failed operating system weakens it somewhat :-/

But business systems should be more like RISC OS and less like mainframes or enterprise / Adobe bloatware. They should do one thing well and co-operate and communicate. An organisation should be its APIs. And everything should have an API.

If you're in the legal business every case should have an API, every lawyer should have an API, every citation of case law or legislation should have an API. If you're in the TV / radio business every studio should have an API (what's recording there, what was recorded there, what's planned), every camera should have an API (where is it, where has it been), every production should have an API (what stage is it at, what's the budget, who's the co-producer). If you're in the news business every camera crew should have an API (where are they, where have they been), every journalist should have an API (what have they submitted, where are they (though it's not unlikely that Twitter or Facebook or FourSquare already know this)). If a riot kicks off somewhere in Tottenham you should be able to query across systems to easily find the closest journalist, the closest camera crew, the closest radio car... without having to chase down six different spreadsheets across four different departments.
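To make the riot example a bit more concrete, here's a minimal Python sketch of that cross-system query. Everything in it is made up for illustration: the three lists stand in for whatever each system's API would return, the ids and coordinates are invented, and `closest` and `haversine_km` are just names I've picked. A real version would fetch each list over HTTP instead of typing it in.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest(resources, lat, lon):
    """Return the resource whose last reported position is nearest to (lat, lon)."""
    return min(resources, key=lambda r: haversine_km(r["lat"], r["lon"], lat, lon))


# Hypothetical responses from three separate systems. In practice each list
# would come from that system's own API rather than being typed in here.
systems = {
    "journalist": [
        {"id": "journalist/1", "lat": 51.52, "lon": -0.10},
        {"id": "journalist/2", "lat": 51.60, "lon": -0.07},
    ],
    "camera crew": [
        {"id": "crew/1", "lat": 51.45, "lon": -0.20},
        {"id": "crew/2", "lat": 51.58, "lon": -0.05},
    ],
    "radio car": [
        {"id": "car/1", "lat": 51.51, "lon": -0.22},
        {"id": "car/2", "lat": 51.65, "lon": 0.05},
    ],
}

# Invented incident location somewhere in north London.
incident_lat, incident_lon = 51.59, -0.07

nearest = {kind: closest(items, incident_lat, incident_lon)["id"]
           for kind, items in systems.items()}
print(nearest)
```

The distance sum is the easy bit. The sketch only works because every system reports position in the same shape with a stable identifier, which is the identifier consolidation problem again.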

Whenever talk turns to APIs it's usually a side effect of already publishing to the web. The usual question is, "we've published this content to the open web, can we give it an API?" Which feels like the wrong question. If everything is / has an API the real question is, "Which bits of this can we open to the web and which bits are better kept private?" That's just a permissioning problem and permissioning is easy :-)

All of this is dependent on whether the intention is to create a more intelligent, connected, generative website. Which is not a bad goal. It just seems more ambitious to create a more intelligent, connected, generative business. And expose the bits you choose to expose to the world.

Wed, 18 Jan 2012 11:35:00 -0800 Stop testing the wrong things http://smethur.st/stop-testing-the-wrong-things

Or at least start testing the right things.

Lots of chat about Test Driven Development and a brief flurry of tweets with @rarepleasures left my bicker button feeling unfulfilled so this is just another rant that wouldn't fit into 140...

Why I don't really like test driven development

  1. Because the minute you add a label to an approach, within a week it becomes a "process", within a month someone will organise a conference and within six months it's just more dogma and doctrine. But that aside...
  2. There's a chain. At one end are the people somewhat pompously referred to as "the business". At the other end an assortment of developers and designers patronisingly referred to as "geeks" and "creatives". The people at "the business" end want to solve a problem; the people at the building stuff end generally help to solve problems. The more links in the chain, the more noise gets introduced until you end up with requirements and "user stories" as Chinese whispers. Professionalising a class of people into business analysts and product managers doesn't stop Chinese whispers being Chinese whispers.
  3. As Dan North says in his What's in a story post:

    Usually, the business outcomes are too coarse-grained to be used to directly write software (where do you start coding when the outcome is "save 5% of my operating costs"?) so we need to define requirements at some intermediate level in order to get work done.

    The point being that by the time any of this stuff hits the designer / developer it's usually passed through the hands of several intermediaries and been reduced to some requirements / user stories. But requirements don't matter. They're just an abstraction to make it easier to start writing code. What matters are the "business" objectives. Or, without wanting to sound too New Labour, the "outcomes".

    The usual pattern is to explain the what to the developer / designer and leave the how to them. Which might be fine. But explaining the why is probably more important. Who knows, they might even have an opinion on the what. Stranger things have happened.

    Anyway, the more you separate developers and designers from the "why" the more we head back to the bad old days of waterfall, with the people doing the work sat at the end of the process being drip-fed user stories and expected to lay golden feature eggs.

  4. Requirements are fine as a starting point for code and using those requirements to generate tests for that code makes sense but you're only testing the code against the requirements. You're not testing the service / product / let's-just-call-it-a-website against business objectives and outcomes.
  5. Businesses have all kinds of ways of measuring performance. That's what the final slide of the boss people's presentation on "KPIs" is all about. And anything that can be measured can be tested. The main problem is they usually get measured six months after the fact.

    The objective might be to get more registered users; the requirement might be a simplified registration process and / or the ability to authenticate with 3rd party accounts. The objective might be fewer abandoned shopping carts; the requirement simplified checkout and / or one click purchase. You can measure any of these objectives / outcomes so you can test them. But software tests only test software against requirements and...

  6. ...code does not live in isolation. Until real code meets real data and real content and real copywriting and real design and real users with real needs (and probably a real marketing campaign) you can't measure the changes you make against real objectives.
  7. It's fine to have those screens in development corner that show regression tests passing and failing with green and red lights. But it would be good to see other screens showing real registration rate data, real close account rate data, real buy / play / consume button data, real abandoned shopping cart data, real inbound traffic from search engines or social media or whatever data.
  8. If you're measuring the impact of your work against real usage you can make tiny, tiny changes very, very quickly; isolate those changes from other changes in the system and see how they work for real people. Test code against requirements by all means but don't assume your tests tell you anything meaningful.
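As a rough idea of what testing against an outcome (and not a requirement) might look like, here's a small Python sketch built around the abandoned shopping cart example. The figures are invented, the function names and the one percentage point threshold are my own assumptions, and a real version would pull the counts from live usage data and worry about whether the difference is statistically meaningful.

```python
def abandonment_rate(carts_started, orders_completed):
    """Share of started carts that never became an order (0.0 to 1.0)."""
    if carts_started == 0:
        return 0.0
    return (carts_started - orders_completed) / carts_started


def outcome_improved(before, after, minimum_drop=0.01):
    """True if the abandonment rate fell by at least `minimum_drop`."""
    return (before - after) >= minimum_drop


# Invented figures for the week before and the week after a checkout change.
before = abandonment_rate(carts_started=1000, orders_completed=320)
after = abandonment_rate(carts_started=1100, orders_completed=396)

print(f"before: {before:.2%}, after: {after:.2%}")
print("objective met" if outcome_improved(before, after) else "objective not met")
```

The check is run against what real people did with the real site, which is the whole difference between this and a regression test.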

Fri, 18 Nov 2011 13:10:00 -0800 Data ghosts in the Facebook machine http://smethur.st/data-ghosts-in-the-facebook-machine

This is partly an extended comment on Paul Clarke's excellent Accidental Data Controller post. And partly a whine that, even though we've been talking about social graphs, and very little else, for the last few years, we still don't really think in graph terms when it comes to our friendships. Or much else.

Paul's post is about the "find my friends by pillaging my address book" function that seems to ship with every social networking / commodity publishing website. And in particular about how Facebook stores contact data for people who've never registered with Facebook, the better to help them find their friends when / if they do. But best to read it.

The ghosts of the not yet born

Obviously I have no more knowledge of how Facebook model their data than the next data geek. But if I were evil then...

...say Alice registers on Facebook and consents to the "pillage my address book" function. Somewhere in that address book are contact details for Bob. Let's say email and mobile number. The first step is to check if there's a registered account in the system matching those details. If there is then Bob gets suggested to Alice as a possible friend. But if Bob isn't registered or is registered but hasn't supplied those details, Ghost Bob gets created:

[Diagram 1: Alice linked to Ghost Bob, who has an email address and a mobile number]

If real Bob comes along later and registers or gives his email / mobile number to Facebook real Bob gets consolidated with Ghost Bob. But it doesn't necessarily stop there. Say Chris registers and also consents to the pillage. Chris isn't really a friend of Bob but they have worked together. So Chris' address book has a record for Bob with his email address and his work phone number. All of this is about finding points in data you can triangulate from. In this case it's the email address so Facebook's Ghost Bob now has email, mobile and work number:

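[Diagram 2: Alice and Chris linked to Ghost Bob, who now has an email address, a mobile number and a work number]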

Add in Dave who submits Bob's email address and postcode and Ghost Bob starts to accrete data like a velcro ball on a fluffy rug:

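[Diagram 3: Alice, Chris and Dave linked to Ghost Bob, who now has an email address, a mobile number, a work number and a postcode]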

Then add in Edith and Fred and Gareth and Ghost Bob gets a lot less ghostly. He's just another person node in a huge graph of data; just a slightly less active one.
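To be clear, I've no idea how Facebook actually do this. But the accretion described above is easy enough to sketch in Python. The contact details below are placeholders, `add_contact` and `MATCH_KEYS` are names I've made up, and matching only on email or mobile is my assumption about which fields are strong enough to triangulate from.

```python
# Fields treated as strong enough to match on. This choice is an assumption;
# a postcode or a name alone would match far too many people.
MATCH_KEYS = ("email", "mobile")


def add_contact(ghosts, uploader, contact):
    """Fold one address-book entry into the list of ghost records."""
    for ghost in ghosts:
        shares_identifier = any(
            key in contact and ghost["fields"].get(key) == contact[key]
            for key in MATCH_KEYS
        )
        if shares_identifier:
            for key, value in contact.items():
                # Keep values already held, add any field seen for the first time.
                ghost["fields"].setdefault(key, value)
            ghost["known_by"].add(uploader)
            return ghost
    # No match on a strong identifier: a new ghost is born.
    ghost = {"fields": dict(contact), "known_by": {uploader}}
    ghosts.append(ghost)
    return ghost


# Placeholder address-book entries for Bob, uploaded by three different people.
uploads = [
    ("Alice", {"email": "bob@example.com", "mobile": "07700 900123"}),
    ("Chris", {"email": "bob@example.com", "work_phone": "020 7946 0123"}),
    ("Dave", {"email": "bob@example.com", "postcode": "XX1 1XX"}),
]

ghosts = []
for uploader, contact in uploads:
    add_contact(ghosts, uploader, contact)

print(ghosts)
```

Three uploads, one ghost, four fields and three known acquaintances. The sketch folds each entry into the first ghost it matches and doesn't handle an upload that bridges two existing ghosts, which a real system would need to.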

And the ghosts of the dead

It's been reported almost everywhere that Facebook's delete button is really more of a hide button. So the same thing works in reverse; leave Facebook and your data ghost lingers on. It would be interesting to know the figures for registered accounts vs the ghosts of the dead and the ghosts of the not yet born. I'm not on Facebook anymore but I'd happily bet that my data ghost still haunts the place.

Putting the ghosts to work

In theory the ghost people just sit there until a corresponding account is created / linked, at which point the suggested friendship schtick takes over. But even ghost people can be useful.

If you can infer that Alice knows Ghost Bob, Chris knows Ghost Bob and Dave knows Alice, Chris and Ghost Bob, then Alice has three indirect connections to Chris. One through Dave, one through Ghost Bob and one through Dave and Ghost Bob. Which increases the chances that Alice might know Chris. The more connections in the system the better you can predict other connections. And it really doesn't matter how many of those connections link to ghosts; the number of edges is more important than the quality of the nodes.
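Here's a small Python sketch of that inference, using only the links listed above. The function names are mine and the graph is tiny enough to brute force. One wrinkle: if you count ordered routes there are four, because the route through both Dave and Ghost Bob can be walked in either order; counting distinct sets of go-betweens gives the three described above.

```python
from collections import defaultdict

# The "knows" links from the example, treated as undirected.
known_links = [
    ("Alice", "Ghost Bob"),
    ("Chris", "Ghost Bob"),
    ("Dave", "Alice"),
    ("Dave", "Chris"),
    ("Dave", "Ghost Bob"),
]


def build_graph(links):
    """Turn a list of pairs into a node -> set of neighbours mapping."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def simple_paths(graph, start, goal, visited=()):
    """Yield every route from start to goal that never revisits a node."""
    visited = visited + (start,)
    if start == goal:
        yield visited
        return
    for neighbour in sorted(graph[start]):
        if neighbour not in visited:
            yield from simple_paths(graph, neighbour, goal, visited)


graph = build_graph(known_links)
routes = list(simple_paths(graph, "Alice", "Chris"))

# Collapse routes that pass through the same people in a different order.
go_betweens = {frozenset(route[1:-1]) for route in routes}

for route in routes:
    print(" -> ".join(route))
print(len(go_betweens), "distinct sets of go-betweens")
```

Note that nothing in the code cares whether a node is a registered account or a ghost.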

The social graph is not a different thing

Thinking in graph terms is hard. Thinking in social graph terms is even harder because our egos take over and we tend to picture ourselves at the centre of a spider's web of connections. To understand what's going on you need to step above, god-like and look down.

The other problem when thinking about the social graph is the tendency to see it as something separate. In page design terms it's usually the bit on the right of the "content" that looks like a bolted on afterthought. But switching examples to Twitter:

If Alice follows Bob and Bob follows Alice and Chris follows Bob and Dave follows Chris. And if Alice tweets and Bob retweets and Chris retweets and Dave favourites. And if Chris makes a list and Bob and Dave are both on that list and Alice follows that list. The whole thing is just some interwingled things and there's no content and no social graph; just a graph and some nodes and some edges. And some of the nodes are people.

It's not how big it is or even how you use it

Paul ends his post with a question:

how big does your address book have to be before you need to register it under the Data Protection Act?

I tried to leave an intelligent comment but accidentally added some angle brackets. So failed. What I wanted to say was: it doesn't matter how big the data set is or even how you intend (or intended) to use it. The only thing that matters is how interwingled it is. Divide your edges (relationships) by your nodes (things) and you might be on to something...
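As a back-of-the-envelope illustration of that ratio, here's a tiny Python sketch. The function name is my own, the 200-contact address book is invented, and the second graph is the Alice, Chris, Dave and Ghost Bob example from earlier in the post.

```python
def edges_per_node(nodes, edges):
    """Relationships divided by things; 0.0 for an empty data set."""
    return len(edges) / len(nodes) if nodes else 0.0


# A plain personal address book: one owner who knows 200 contacts,
# and no recorded links between the contacts themselves.
contacts = [f"contact-{n}" for n in range(200)]
book_nodes = ["owner"] + contacts
book_edges = [("owner", contact) for contact in contacts]

# The pooled example from earlier in the post.
pooled_nodes = ["Alice", "Chris", "Dave", "Ghost Bob"]
pooled_edges = [
    ("Alice", "Ghost Bob"),
    ("Chris", "Ghost Bob"),
    ("Dave", "Alice"),
    ("Dave", "Chris"),
    ("Dave", "Ghost Bob"),
]

print(round(edges_per_node(book_nodes, book_edges), 3))  # just under 1
print(edges_per_node(pooled_nodes, pooled_edges))        # 1.25
```

A lone address book can never get to one edge per node however big it grows. Pool a handful of them and the ratio climbs past one almost immediately.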

Why is any of this a problem?

Mostly it isn't. Every day in every way we trade privacy for convenience. Own a mobile phone or a sat nav or a connected set top box or a supermarket loyalty card and you're trading some privacy for some convenience. The trouble is it's never quite clear what the trade-off is. (Maybe we just need the equivalent of a nutritional information label for privacy / convenience?)

But most of the debates about online privacy aren't really about privacy at all. They're about informed consent and how we make the decision to make the privacy / convenience trade. Most of the convenience benefits are seen best from inside the graph. And most of the privacy invasion is only apparent when you step outside and look down. Which makes things tricky.

Being informed enough to give consent is difficult enough for most people. If you're Ghost Bob you were never even given the opportunity. You never signed up for the service or ticked the crappy little "I've read the Ts and Cs" checkbox. You're just an accidental node in some parasite's recommendation engine.

Massively interconnected data is dangerous when some of the nodes are people. When some of the nodes are ghost people it's just unethical.

Mon, 10 Oct 2011 04:34:00 -0700 Making things with BBC data? http://smethur.st/making-things-with-bbc-data

If you've ever made anything using BBC data from /programmes, /music, /nature or anywhere else...

...from a quick Twitter hack to a full blown EU project...

...and if you work for the BBC or anywhere else...

Please could you add a comment below with a quick outline?

Duncan, realise this could take some time but you're not excused :-)

Thanks

Wed, 28 Sep 2011 14:01:00 -0700 Amazon and the reintermediation of the spectacle http://smethur.st/amazon-and-the-reintermediation-of-the-specta

There's an archetypal narrative of the web that starts with the word disintermediation. Which is the posh way of saying cutting out the supply chain middle men.

It's usually accompanied by a picture showing before:

Before

and after:

After

The perennial poster child for the promise of disintermediation is Dell; cutting out the distributors and retailers to go direct to consumers with their marvelous black boxes of technology.

Like most other things in the realm of new technology people tend to take this picture and extrapolate. So many industries have so many middle men it's tempting to imagine a world without. In the creative / spectacular industries in particular there was a promise from the early years of blogging that content producers could go directly to consumers and cut out countless layers of intermediation en route. Music artists wouldn't need record labels, authors wouldn't need publishers, journalists wouldn't need newspapers. And all this would play out on an open web tied together with links, search and micropayments.

But it hasn't really played out that way. Since the last dot com boom and bust the narrative has changed from disintermediation to reintermediation. And from there to the consolidation of reintermediators.

Apple reintermediated the music industry, Sky reintermediated the TV industry, Netflix threatens to reintermediate the cable TV industry, Etsy reintermediated the local craft market, eBay reintermediated the jumble sale and Facebook reintermediated our friendships.

And for everything else there's Amazon...

...who've managed to reintermediate everything from the book industry to large parts of the film industry to the small power tools industry and onwards.

The weird thing about Amazon is they're always there but rarely noticed. The rest of the tech community is happy to stand on any platform available to shout their own praises. And the tech press follow along like adoring puppies. Every time Ed or Ev or whatever his name is announces a new feature the web lights up. Usually with adoration. It only takes an announcement of an announcement by Zuckerberg to trigger a torrent of praise / condemnation. And every time Google make a public proclamation... well at least Jeff Jarvis gets excited. Meanwhile Amazon just get their heads down and build the best online store and the best engineering platform and the best integrated service design and there's barely a murmur. It's all vaguely weird...

Something else that's weird. When the industry gets together for a spot of communal backslapping the integrated service design prize always heads towards Apple. Which makes me wonder if the people handing out these prizes have ever tried to use iTunes? In general it's pretty horrible. But compared to Amazon's Whispernet / Whispersync and one click purchase it's a real turd of a system. In the time it takes to boot up iTunes you can grab a Kindle, search for a book, find a book, buy a book and start reading it. When it comes to integration of web storage, web services, software services and physical devices Amazon make Apple look like amateurs. And the rest of us are barely trying.

That said Amazon aren't exactly shy about their role as a reintermediator. The self-publishing upload form cuts out the agent, the publisher, the distributor, the wholesaler and the retailer in one swoop. The Author Central and 'ask the author' features plug readers directly into authors, again cutting out the middle men.

But back to the point...

So why the new middle men?

Or why didn't the promise of producer to consumer work out? Wikipedia says:

Reintermediation occurred due to many new problems associated with the e-commerce disintermediation concept, largely centered on the issues associated with the direct-to-consumers model. The high cost of shipping many small orders, massive customer service issues, and confronting the wrath of disintermediated retailers and supply channel partners all presented real obstacles. Huge resources are required to accommodate presales and postsales issues of individual consumers. Before disintermediation, supply chain middlemen acted as salespeople for the producers. Without them, the producer itself would have to handle procuring those customers. Selling online has its own associated costs: developing quality websites, maintaining product information, and marketing expenses all add up. Finally, limiting a product's availability to Internet channels forces the producer to compete with the rest of the Internet for customers' attention, a space that is becoming increasingly crowded over time.

Which is probably true but I suspect it's more than that. Recently there's been a flurry of blog posts all saying that some random industry is "becoming software". It all kicked off when The Wall Street Journal published Why software is eating the world.

There's a premise that what sets successful businesses apart is their readiness to adapt to a software driven world. I'm not so sure. I've been around enough software for long enough to think most of it is just a patchwork of bug fixes for obscure corner cases and a set of features that no-one can quite remember requesting. Actually, that's not quite true. Software is what we write to extract information from data. The worse your data model is, the more software you have to write. The optimum line count for code is zero. But anyway, software without data is about as much use as a pub without beer. And I'm firmly of the opinion that:

Re-read Why software is eating the world and replace every occurrence of the word software with the word data. I swear it makes more sense.

The attention graph

In my head there's a picture that looks something like:

Graphs

Our friends in social media world tend to worry most about:

Social-graph

Enterprise architects, IAs, archivists, taxonomists etc tend to worry most about:

Content-graph

But the one thing all the usual web whatever-number-we're-up-to suspects really get right is:

Attention-graph

Facebook's creepy Open Graph protocol and "frictionless sharing" are just an attempt to own the attention graph no matter where its users are paying attention. The read / write web exists; you read something, Facebook write it to their database.

In my day job I've heard this described as "having a personalisation strategy". Which completely misses the point. Personalisation is the bait, customer relationship is the trap.

Anyway, Amazon take the exploitation of attention data to new levels. Their social graph is minimal and their content graph barely exists. Browse Amazon.wherever-you-are and the majority of the content is contributed by customers. And the majority of the context / navigation is contributed by customers. Web services inviting contributions by users tend to have standard boilerplate terms and conditions that at least pay lip service to the contributors' rights over their material:

You or the owner of the content still own the copyright in the content sent to us, but by submitting content to us, you are granting us an unconditional, irrevocable, non-exclusive, royalty-free, fully transferable, perpetual worldwide licence to use, publish or transmit, or to authorise third-parties to use, publish or transmit your content in any format and on any platform, either now known or hereinafter invented.

But Amazon don't even bother with the lip service:

If you do post content or submit material, and unless we indicate otherwise, you (a) grant Amazon.co.uk and its affiliates a non-exclusive, royalty-free and fully sublicensable rights to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, and display such content throughout the world in any media; and (b) Amazon.co.uk and its affiliates and sublicensees the right to use the name that you submit in connection with such content, if they choose. You agree that the rights you grant above are irrevocable during the entire period of protection of your intellectual property rights associated with such content and material. You agree to waive your right to be identified as the author of such content and your right to object to derogatory treatment of such content. You agree to perform all further acts necessary to perfect any of the above rights granted by you to Amazon.co.uk, including the execution of deeds and documents, at the request of Amazon.co.uk.

What you contribute to Amazon belongs to Amazon.

Pervasive computing, ubiquitous surveillance

By now we've probably all sat through conference talks on the "internet of things" and pervasive / ubiquitous computing. Past the point where people stop talking about making rabbits twitch their ears when someone tweets about carrots, I think the Kindle is the first real world example of any of this. So it's still got a screen but it doesn't feel like a computer. It's a reading device, a retail terminal and a beautifully designed back channel.

Read the Whispersync marketing foo-foo and the public messaging is all about seamless synching between devices. So you put down your Kindle and open the Kindle app on your iPad and your book is miraculously open at the page you were reading. Obviously there's no device to device synching going on here. It's device to web service to device. So all that data gets phoned home to Amazon.

Amazon already know the books you've bought, the books you've browsed, the books that other people who've bought similar books to you have bought, the books you've rated, the books you've listed, the books you've reviewed. Now if you're using a Kindle with Whispersync they also know if you're a slow reader. They know if you're the kind of person who buys books and never makes it to the end. They know the books you've bought that you skim through in an afternoon. They know the books you read slowly, flicking back through pages to check facts.

Again, this kind of integrated end-to-end service design is interesting to compare with the supposed masters of this sort of stuff. Apple managed to build the iTunes store, some clumsy iTunes software and the rather lovely iPod. But they allowed the backchannel data to get intermediated by Last.fm / Audioscrobbler. You just can't imagine Amazon letting Goodreads or Open Bookmarks build a business by tapping into the Kindle backchannel. They just seem to understand that connected technology isn't just good for distributing content outwards; it's also rather well adapted to reporting back on the usage of that content.

I'm probably gonna get flamed if I've got my facts wrong here but I've searched long and hard for Kindle / Whispersync terms and conditions about user contributed content / data and they just don't seem to exist. So I'm assuming that Amazon terms and conditions cover Kindle Whispersync too. In which case all your Kindle reading data, your bookmarks and your margin notes belong to Amazon too.

I can't help but wonder what it would be like to hack with that kind of data. What could you build around community reading groups, formal education, adult literacy? At the very least it would save me the chore of ticking homework diaries. But I doubt we'll get that chance.

Facebook regularly take a beating for pushing privacy issues to breaking point. I'm not sure what the difference is between Facebook snooping on your reading and Amazon snooping on your reading except the commonly reported privacy issues around Facebook are all about what gets reflected back onto the web. As opposed to what gets absorbed by Amazon.

The usual answer to all of these privacy worries is, what's the worst that could happen? I get some better adverts to watch. Which fits firmly in the, "nothing to hide, nothing to fear" bucket. And it's equally nonsense. It's not like there's not past evidence of data being collected for perfectly innocent purposes being used for something altogether different.

The obligatory user experience bit

Since most of this blog is about user experience in one form or another I'm feeling duty bound to pitch in with: forget about HTML5 and CSS3 and jQuery and responsive design and "mobile first". The most important issue for user experience people to grapple with is informed consent. More and more web services are dependent on user contributed content and data. Every time you make a contribution (explicit or implicit) you're trading convenience for privacy. This isn't necessarily a bad thing; it's something we do every day in real life from mobile phones to loyalty cards. But as the web moves out of the browser and into smart objects, the trade-offs we're making need to be made explicit so people can make informed choices about when to get involved and when to back away.

There's another personal bugbear in all of this. We're increasingly told that companies fail when "UX professionals" don't have a C*O seat at the top table. The company would make better products that more people would want to own and use; everybody wins. Again the poster child is Apple and the pin up boy Jonathan Ive. But in the world of constantly connected, reporting devices your employer's interests are not necessarily the same as those of your users. There's a more general question here: as the supposed representatives of "the user" who exactly are we working for?

A dystopian summary

It feels like we've long since lost the desire for an open, disintermediating web. The old power brokers of publishers, record companies and film studios are dead or dying. But we're all more than willing to trade our privacy for the convenience offered by the new intermediators.

It's not like you can just ignore what's going on. Owning your own domain isn't enough anymore. If you're a newspaper and you don't engage on Facebook you'll just miss a massive chunk of your audience. If you're a broadcaster and you don't promote your programmes on YouTube you'll miss a huge chunk of your audience. If you're a publisher and you don't go through Amazon you might as well give up and go home.

But all the while you're being cut out of the deal. Signing up to any of the new intermediators isn't about opening up new distribution channels; it's about outsourcing your customer relationship. And attempting to fight the reintermediators on your own turf only works if you imagine that customers would get more value out of a relationship with Penguin than they would out of a relationship with Amazon.

I'm not sure what happens next. Maybe the web is just going through a period of reintermediation and consolidation. More optimistic voices than mine would point to things like Diaspora, data portability and personal data stores and say the web will return to a more open, distributed model. But for all the talk of the web as a distributed system it kinda isn't. It's some servers and some clients. Querying distributed data sources to build the kind of "experiences" that people have got used to isn't close to happening. Even in the occasionally rarefied world of Linked Data, reintermediation is happening in the form of consolidated data stores / market places like Kasabi. Because having all that data out there isn't enough. It has to be in one place to make it useful to query.

I have nothing but admiration for Amazon. They make the best shop. They make the best services. They make the best software. They have the best data. But they scare me. If I was a book publisher they'd terrify me. And if I was a film studio or broadcaster I'd be watching my back. Especially since they also own LoveFilm and IMDB. And especially when they've just announced Whispersync for movies and TV. And even if I was Google or Facebook or Apple I'd be concerned. Because Amazon are light years ahead of the game we think we're playing.

Right now the bits of the web that aren't Amazon or Google or Apple or Facebook feel like the high street greengrocer waiting for the supermarket to ride into town. Which depresses me because it's the exact opposite of what got me interested in the web in the first place. Maybe there's still room for a chi-chi artisan cheese shop or two. But who the hell wants to run an artisan cheese shop?

http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 18 Aug 2011 03:41:00 -0700 Storytellin' http://smethur.st/storytellin http://smethur.st/storytellin

Round where I live it's almost impossible to walk past a PowerPoint session without coming across the "Storytelling" word. Bonus points get scored for managing to get "second screen" and "transmedia" in the same "deck" but we won't go there. It all leads to lots of talk around what storytelling means, how it's done and how it relates to the web. To date the main experiment has been building the Mythology Engine and injecting some Doctor Who storylines into it to make journeys between characters and events from the new and 'classic' Doctor Who. What follows are some fairly random thoughts on how you'd go about modelling stories in order to tell them on the web. Since I'm an expert in neither RDF modelling nor critical theory it might all be nonsense.

 

Events

The obvious starting point for modelling stories is the event. Things happen in stories; capture those things and you have the basic building blocks of storytelling. So we've used Yves' Event Ontology to capture events (real and fictional), the time they occurred, the place they occurred and the people and things involved. The next obvious step is to say a story has many events but also different people might tell different stories around the same event(s). So stories have many events and events have many stories. Which in relational database terms means a many-to-many and a many-to-many tends to suggest a missing concept. In this case the missing concept is narrative order, allowing a story to reveal events out of the sequence in which they happened. Which is useful if you're trying to describe a non-linear narrative with flashbacks and various recollections of nested narrators (think Wuthering Heights). So you end up with something like:

Events_stories

As a simple example take two of my favourite TV programmes: Columbo and Midsomer Murders. They have the same basic event structure which looks roughly (give or take a murder) like:

Events

But they're told very differently. Columbo almost always tells it straight, in event order: first establishing the characters (murderer and murderee), then revealing the motive, the means, the murder and onwards. Right from the start you know who did it, why and how. For the audience the game is all about guessing how Columbo will come good and catch them.

Midsomer Murders is a more standard whodunnit, told out of sequence using the usual techniques of recollection and flashback. It often opens with the murder scene followed by the investigation. The investigation turns up various clues en route; some real, some red herrings. The motive and means are only fully revealed as part of the post-investigation accusation. (Which, as a complete aside, is not a dissimilar narrative structure to The Apprentice: Sir Alan as detective in a murder mystery, the country house replaced by a rented office in Docklands.)

Reordered_events
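The two tellings can be sketched in a few lines of Python. This is only an illustration of the "narrative order as the missing concept" idea: the class names, the event labels and the position numbers are all made up for the example and aren't taken from the Event Ontology or the Mythology Engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Something that happened, with its place in the order things happened."""
    label: str
    chronological_position: int  # hypothetical field, for illustration only


@dataclass
class Story:
    """A story holds events in the order the teller reveals them."""
    title: str
    telling: list = field(default_factory=list)  # narrative order

    def reveal(self, event):
        self.telling.append(event)
        return self

    def narrative_order(self):
        return [event.label for event in self.telling]

    def is_told_straight(self):
        """True when narrative order matches the order events happened in."""
        positions = [e.chronological_position for e in self.telling]
        return positions == sorted(positions)


# One shared set of events...
motive = Event("motive", 1)
murder = Event("murder", 2)
investigation = Event("investigation", 3)
arrest = Event("arrest", 4)

# ...and two stories that reveal them in different orders.
columbo = Story("Columbo")
for event in (motive, murder, investigation, arrest):
    columbo.reveal(event)

midsomer = Story("Midsomer Murders")
for event in (murder, investigation, motive, arrest):
    midsomer.reveal(event)
```

The events are shared and only the list inside each story differs, which is all the many-to-many join with an ordering column amounts to.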

Assertions

This basic model works fine if all stories that agree on an event also agree on all the assertions made about that event: when, where, who and what. But imagining that the event being described is a crime, everyone might agree the crime took place but Alice might say that Bob was present and Bob might not agree.

None of this potential for disputed assertion (whether when, where, who or what) is covered by the stories as ordered events model. But in my mind at least stories are more an ordered set of assertions than a reordered set of events.

Scenes

So the reordered events model for Midsomer Murders shown above is clearly not correct. Midsomer Murders does often start with a scene from the murder event but whilst the murderee and maybe the location are depicted the murderer is kept out of shot. Over the course of the programme subsequent scenes often return to the murder event progressively revealing more detail. It's this split between events and scenes that the 'stories as reordered events model' doesn't give you.

Every medium has a bag of tricks that allows story tellers to control what's revealed when. In TV and film it's usually close up, over the shoulder shots filmed with low light levels (the shower scene in Psycho). Columbo's interesting for comparison because it's not a whodunnit. The murder is usually filmed as a well-lit wide shot with every detail (location, time, murderer, murderee, weapon...) made explicit.

The closest comparison I can think of to the bag of assertions model is the RDF named graph. And I'm not saying that to be all linked data-ish; I just can't think of a way you'd do this in any other data store. Named graphs allow you to bundle up a set of statements / assertions / claims (in this case RDF triples) and associate them with some provenance: person X stated these things:

Namedgraph

The named graph model only gets you as far as some collections of assertions. But stories are more than just bags of assertions: in order to 'tell' them you need to be able to control how those assertions are revealed to the reader. In this case it's the scene (maybe that should be act?) that reveals a particular named graph's bag of assertions:

Scenes
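As a very rough sketch of scenes revealing named graphs, here's the Alice and Bob example in plain Python rather than a real triple store. The graph names, predicates and the library location are invented for illustration; a proper version would use RDF and a store that supports named graphs.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# Each named graph bundles some claims with the person making them.
# All names and predicates here are hypothetical.
named_graphs = {
    "alice_statement": {
        "stated_by": "Alice",
        "claims": {
            Triple("crime", "took_place_at", "the library"),
            Triple("Bob", "was_present_at", "crime"),
        },
    },
    "bob_statement": {
        "stated_by": "Bob",
        "claims": {
            Triple("crime", "took_place_at", "the library"),
            Triple("Bob", "was_absent_from", "crime"),
        },
    },
}

# The telling: scenes in narrative order, each revealing one named graph.
scenes = ["alice_statement", "bob_statement"]


def revealed_after(scene_count):
    """Return every (speaker, claim) pair the audience has seen so far."""
    seen = set()
    for graph_name in scenes[:scene_count]:
        graph = named_graphs[graph_name]
        for claim in graph["claims"]:
            seen.add((graph["stated_by"], claim))
    return seen


def agreed_claims():
    """Claims that appear in every named graph, whoever stated them."""
    claim_sets = [graph["claims"] for graph in named_graphs.values()]
    return set.intersection(*claim_sets)
```

After one scene the audience only has Alice's version; after two they have both, and the only thing everyone agrees on is where the crime took place.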

Event interwingling

The model so far allows you to bundle up and progressively reveal assertions around events. But it doesn't allow for assertions about the relationships between events: event A directly caused event B; A was one factor in B happening; A didn't cause B, but without A, B couldn't have happened etc. For me these assertions are the most important thing about storytelling because they speak to the reason we tell stories in the first place: an attempt to understand and explain why things happen. They also speak to the inner child's cry of "why?" (and the inner adult's response of "because"). Every story we tell is one long chain of "cause" and "effect", why and because. Who and where and when matter but why trumps them all.

In news storytelling in particular, why and because are the central pillars of decent journalism. Why is my local library closing? Because of council cutbacks. Why are the council cutting back? Because of central government cutbacks. Why are central government cutting back? Because they need to balance the national budget. Why does the budget need to be balanced? Because the previous government borrowed too much. Why did they borrow too much? Because the banks collapsed. Why did the banks collapse? Because mankind is sinful and the bankers weren't washed in the blood of Christ...

Almost all journalism (all the examples I can think of anyway) follows this pattern of chaining events together with a sequence of becauses. Sometimes the because is explicit, sometimes implied, sometimes insinuated but it's almost always there. And it's usually where the majority of disputes arise. Even where they agree on all other details, The Guardian's chain of causality is going to look very different to the Daily Mail's and every claim in the cause and effect chain could be and will be disputed by someone. The ability to see how claims of "causality" differ between different journalists and different news organisations would be a handy tool for general media literacy.

As an aside I think this is my main misgiving about the rNews spec. It models online news article publishing; it doesn't model news or journalism. No events, no claims of event <> event causality, no why, no because. To steal a line from Tom Scott, news stories [are] metadata about real world events. And to steal a line from Jeff Jarvis, articles are the byproducts of journalism. Which makes rNews meta-metadata or the byproduct of a byproduct.

Anyway, that was a long aside to add one more line to the model: Alan was arrested for the murder of Joyce.

Causality
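To make the "chain of becauses" a bit more concrete, here's a small Python sketch that stores causal claims alongside who made them, follows one source's chain back from an effect, and lists where two sources differ. "Paper A" and "Paper B" are hypothetical sources and the claims are invented, loosely following the library example above; none of it represents what any real news organisation has said.

```python
# Each claim: (cause, effect, who claims it). All data is made up.
causal_claims = [
    ("council cutbacks", "library closing", "Paper A"),
    ("central government cutbacks", "council cutbacks", "Paper A"),
    ("balancing the national budget", "central government cutbacks", "Paper A"),
    ("council cutbacks", "library closing", "Paper B"),
    ("council mismanagement", "council cutbacks", "Paper B"),
]


def because(effect, claimed_by):
    """Follow one source's chain of 'because' back from an effect."""
    chain = []
    current = effect
    seen = {effect}
    while True:
        causes = [
            cause
            for cause, claimed_effect, who in causal_claims
            if claimed_effect == current and who == claimed_by
        ]
        # Stop at the end of the chain, or if the claims loop back on themselves.
        if not causes or causes[0] in seen:
            break
        current = causes[0]
        seen.add(current)
        chain.append(current)
    return chain


def disputed_links(source_a, source_b):
    """Cause/effect pairs claimed by exactly one of the two sources."""
    a = {(c, e) for c, e, who in causal_claims if who == source_a}
    b = {(c, e) for c, e, who in causal_claims if who == source_b}
    return a ^ b
```

This only follows the first cause a source gives for each effect, so it ignores the "one factor among many" case. Handling that would mean typing the links (caused, contributed to, enabled) rather than treating them all as a plain because.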

Stories and discourse

From the diagram above it seems like stories operate on two basic levels: the assertions they contain (the story) and the way in which those assertions are revealed (the telling). At this point I went off in search of better labels for these levels. I'd thought that some of story, narrative and plot might apply here but all the definitions seem a little fuzzy being both event (rather than assertion) centric and using account to cover a multitude of "telling" possibilities. At least according to the OED:

Story
  an account of imaginary or real people and events told for entertainment
  an account of past events, experiences, etc
Narrative
  spoken or written account of connected events; a story
Plot
  the main sequence of events in a play, novel or film

A chat with Matthew sent me in the direction of Roland Barthes' Introduction to the Structural Analysis of Narratives, an essay collected in Image Music Text [PDF - page 76 (page 79 of the book)] which says:

Tzvetan Todorov [..] proposes working on two major levels, themselves subdivided: story (the argument), comprising a logic of actions and a 'syntax' of characters, and discourse, comprising the tenses, aspects and modes of the narrative.

Which gives two useful labels, ending up with something roughly like:

Story-discourse

Down the structuralist rabbit hole

From my (probably simplistic) reading of Barthes his main point seems to be that discourse can be analysed and deconstructed in much the same way that linguistics deconstructs the sentence. The major premise being:

[A narrative] shares with other narratives a common structure which is open to analysis, no matter how much patience its formulation requires.

Once this structure is identified:

[The] 'art' of the storyteller, [..] is the ability to generate narratives (messages) from the structure (the code). This art corresponds to the notion of performance in Chomsky and is far removed from the 'genius' of the author, romantically conceived as some barely explicable personal secret [..] it is impossible to combine (to produce) a narrative without reference to an implicit system of units and rules.

Barthes proposes that narratives operate over a set of hierarchical levels in much the same way as linguistics describes the sentence as operating at multiple levels:

To understand a narrative is not merely to follow the unfolding of the story, it is also to recognize its construction in 'storeys', to project the horizontal concatenations of the narrative 'thread' on to an implicitly vertical axis; to read (to listen to) a narrative is not merely to move from one word to the next, it is also to move from one level to the next.

That said, Barthes doesn't identify the precise levels of narrative but he does propose:

to distinguish three levels of description in the narrative work: the level of 'functions' (in the sense this word has in Propp and Bremond), the level of 'actions' (in the sense this word has in Greimas when he talks of characters as actants) and the level of 'narration' (which is roughly the level of 'discourse' in Todorov).

If you choose to believe Barthes then the story level shown above breaks down into two parts: Propp style functions and 'actions'. Which seems to fit with the event part of the model although I have no idea how you'd model 'characters', let alone 'characters as actants'. And life's too short to read Greimas. If you choose to believe Propp then capturing the functions seems trivial, every event sub-classes some more archetypal event / function.

But more interesting is Barthes' description of the way narrative levels interact:

Narrative thus appears as a succession of tightly interlocking mediate and immediate elements; dystaxia determines a 'horizontal' reading, while integration superimposes a 'vertical' reading: there is a sort of structural 'limping', an incessant play of potentials whose varying falls give the narrative its dynamism or energy.

these levels are in a hierarchical relationship with one another, for, while all have their own units and correlations [..] no level on its own can produce meaning. A unit belonging to a particular level only takes on meaning if it can be integrated in a higher level. The theory of levels gives two types of relations: distributional (if the relations are situated on the same level) and integrational (if they are grasped from one level to the next);

All of the examples given in the book are based in literature but thinking about film (and TV) for a minute, there are lots of obvious examples of integrational relationships between the story and discourse levels: again the background "music" in the Psycho shower scene, the cymbal crash at the end of a pratfall. When it comes to "telling" a story there are all kinds of claims made on the discourse level about things on the story level. Every decision on script, casting, costumes, locations, props, sound effects, background music, lighting, camera angles, editing, maybe even film stock is a claim made in the discourse level about objects in the story level.

Any attempt to capture the relationships between discourse and story (beyond "reveals") turns the simple model shown above to spaghetti. But storytelling is as much about how things are revealed as it is about when they're revealed. There are techniques that could probably be identified but how you'd model that I have no idea.

An attempt at a conclusion

I think it's possible (although the presence of named graphs makes it tricky) to model the mechanics of a story (the ordered revealing of claims around events). And the "cause and effect claims" still feel like the most important part (especially for news and history) because they reflect how we attempt to understand the world.

But a model of the mechanics of a story doesn't really get you any closer to being able to tell a story using that model. I think it would be good for news organisations to share identifiers for events and people and places. I think it would be good for journalism if claims of causality were made explicit rather than insinuated. (I'm thinking of the Tottenham / London / England riots and the varying claims of causality.) But I don't think it gets us any closer to "web native storytelling". Whatever that might be.

http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Tue, 16 Aug 2011 15:11:00 -0700 If I worked for a big media organisation... http://smethur.st/if-i-worked-for-a-big-media-organisation http://smethur.st/if-i-worked-for-a-big-media-organisation

...(or at least one whose content could reasonably end up encoded as an mp3 / mp4) one thing I'd definitely like to see is an ID3 tag dedicated to holding a RESTful HTTP URI.

ID3 tags are designed to allow people to embed metadata about the content of a media file into the file. Although designed can seem quite a strong word in this context. A quick glance at the ID3 spec gives the impression that it was more thrown together. New tags have accreted over time with little discernible rhyme or reason. What started as an attempt to add core metadata like track title, artist name and release title to music tracks has bloated to a spec with a quite ridiculous number of tags.

But there are still two important attributes missing from ID3:

  1. A stable, persistent identifier for the content of the file
  2. A way to get more information about the content of the file

Actually ID3 does make provision for a Unique file identifier but it goes on to disclaim responsibility with:

This frame's purpose is to be able to identify the audio file in a database, that may provide more information relevant to the content. Since standardisation of such a database is beyond this document, all UFID frames begin with an 'owner identifier' field. It is a null-terminated string with a URL [URL] containing an email address, or a link to a location where an email address can be found, that belongs to the organisation responsible for this specific database implementation. Questions regarding the database should be sent to the indicated email address.

Eh? Really? Who on earth would populate an ID3 tag with the email address of a database owner? And why?

Both gaps could be filled by the addition of an ID3 tag dedicated to storing a RESTful HTTP URI. Settling on a stable URI gives a stable globally-unique identifier. And because it's an HTTP URI you can dereference it to get back more information. And if that information is returned as Linked Data you can follow your nose to more information and so on. In short the URI should employ content negotiation so if it's requested by a browser the user should get back an appropriate human readable webpage. And if the user requests JSON or RDF or CSV then the URI should return JSON or RDF or CSV. And if the user requests the media itself (audio/mp3 eg) they should get back the media file if it's still available.
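The content negotiation part might look something like the sketch below: a deliberately simplified reading of the HTTP Accept header in Python. The table of representations is an assumption made for the example (including the choice of application/rdf+xml for RDF and audio/mpeg as the registered type for mp3), and it ignores partial wildcards like text/*.

```python
# Media types this hypothetical URI can serve, mapped to what comes back.
REPRESENTATIONS = {
    "text/html": "a human readable webpage",
    "application/json": "JSON",
    "application/rdf+xml": "RDF",
    "text/csv": "CSV",
    "audio/mpeg": "the media file",
}


def parse_accept(header):
    """Split an Accept header into (media type, q) pairs, best first."""
    choices = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when the client doesn't give one
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        choices.append((media_type, quality))
    # sorted() is stable, so equal q values keep the order the client sent.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(accept_header, default="text/html"):
    """Pick the media type to serve, or None if nothing acceptable exists."""
    for media_type, quality in parse_accept(accept_header):
        if quality <= 0:
            continue
        if media_type in REPRESENTATIONS:
            return media_type
        if media_type == "*/*":
            return default
    return None  # a real server would answer 406 Not Acceptable here
```

So a browser sending `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` gets the webpage, a client asking for `application/json` gets JSON, and both are talking to the same URI.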

The basic problem with ID3 is however much the spec expands and however many tags get added there's always going to be more that people want to say about a music track or a film or a TV programme. Trying to encapsulate all this descriptive power in a pre-defined set of tags is always going to be way too limiting. Or why embed metadata as tags when you could embed one HTTP URI and just dereference that to get the data? Metadata embedding is a silly solution to a hard problem.

Taking music as an example, you could embed an artist name, track title, release title and record label in the file. But adding a MusicBrainz URI makes all this core data available over HTTP. And adding a MusicBrainz URI makes additional data that could never be encoded in ID3 (like band membership (and data about those members)) available too. Because both MusicBrainz and BBC Music are published as Linked Data you can traverse the web to get BBC News stories for that artist, BBC reviews for that artist and BBC programmes that play that artist. Because The Guardian uses MusicBrainz identifiers in their new music site you can get Guardian reviews and news stories about that artist. And because the Echonest uses MusicBrainz identifiers you can get recommendations for similar artists.

Taking a BBC programme example, if ID3 allowed for an HTTP URI, that tag could be populated by a RESTful /programmes URI. Dereference that and you'd get not only core episode data (title, the programme it belongs to, the series it belongs to, broadcast information, contributor information, clips) but also music played in that episode (again linked to MusicBrainz), trackbacks to blog posts about the episode, products for sale including that episode, recipes in that episode. The list probably isn't endless but it's more than ID3 could ever scale to.

Most importantly for content publishers one of the many things you could get back is recommendations for similar (legally available) content. If there's a recognition that content will "travel", the benefits of "upselling" to legality feel like an obvious response. So punters get better, more expansive metadata, better services and opportunities to explore new content. And publishers get an opportunity to tempt people back to legality. And if it doesn't completely solve the provenance problem at least it's a step in the appropriate direction.

All it takes (and I'm probably simplifying through ignorance) is for media companies to mint HTTP URIs for their content which return liberally licensed (meta)data in standard, non-proprietary formats and link out to other data sources. And an ID3 tag to embed these URIs into files. And for people to build smart media clients that suck in this data to make interesting and useful experiences.

In the meantime, as Mo has pointed out, there are ID3 tags designed to hold URLs. WOAF (Official audio file webpage) and WOAS (Official audio source webpage) are obvious candidates for overloading if anyone fancies a hack. But even the use of the word "webpage" suggests they weren't designed for RESTful HTTP URIs.
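For anyone who does fancy the hack, here's a minimal Python sketch of overloading WOAF. It assumes the ID3v2.3 layout as I understand it: a 10-byte tag header with a synchsafe size, then frames with a 4-byte ID, a plain big-endian 4-byte size and 2 flag bytes, with URL frames holding bare ISO-8859-1 text. The example URI is hypothetical, and a real tool would use an existing tagging library and prepend the tag to actual audio data rather than building bytes by hand.

```python
import struct


def synchsafe(n):
    """Encode an int as 4 bytes using 7 bits per byte (ID3v2 tag size)."""
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def unsynchsafe(data):
    """Decode a 4-byte synchsafe integer."""
    return (data[0] << 21) | (data[1] << 14) | (data[2] << 7) | data[3]


def build_tag_with_woaf(uri):
    """Build a bare ID3v2.3 tag holding a single WOAF frame."""
    body = uri.encode("latin-1")  # URL frames carry plain ISO-8859-1 text
    frame = b"WOAF" + struct.pack(">I", len(body)) + b"\x00\x00" + body
    # "ID3", version 2.3.0, no flags, then the size of everything after the header.
    header = b"ID3" + b"\x03\x00" + b"\x00" + synchsafe(len(frame))
    return header + frame


def read_woaf(tag):
    """Pull the WOAF URL back out of a tag, or return None if there isn't one."""
    if tag[:3] != b"ID3":
        return None
    end = 10 + unsynchsafe(tag[6:10])
    pos = 10
    while pos + 10 <= end:
        frame_id = tag[pos:pos + 4]
        (size,) = struct.unpack(">I", tag[pos + 4:pos + 8])
        if frame_id == b"\x00\x00\x00\x00":
            break  # reached padding
        if frame_id == b"WOAF":
            return tag[pos + 10:pos + 10 + size].decode("latin-1")
        pos += 10 + size
    return None


# Hypothetical programme URI, for illustration only.
example_uri = "http://example.org/programmes/b0000000"
example_tag = build_tag_with_woaf(example_uri)
```

A smart media client would then call something like `read_woaf()` on the file, dereference the URI and ask for whichever representation it prefers.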

So, in summary, if I worked for a big media company I'd be putting in the effort to ensure both my website and ID3 were Linked Data compliant.

http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Wed, 10 Aug 2011 02:38:00 -0700 One from the archive: the /programmes manifesto http://smethur.st/one-from-the-archive-the-programmes-manifesto http://smethur.st/one-from-the-archive-the-programmes-manifesto

Not so hot on the heels of Tom Scott's development manifesto for the BBC Nature site I thought I'd dig out the old BBC Programmes (@programmes) manifesto. It took a while to track down but eventually turned up in a dusty folder with the title dogma.html...

The timestamp says 14/10/2008 but I think it existed as some post-it notes on a wall several months before that. I know it predated Yves' arrival, so also predated any of the Linked Data work. Which was really just a logical extension rather than any new principles. Anyway, here it is:

/programmes believes:

  1. in one web
  2. in accessibility for people
  3. in accessibility for machines
  4. it's a service, not a product
  5. in designing from the domain model up, not the interface down
  6. in being RESTful
  7. in open standards
  8. in open data
  9. in linked data
  10. in fixing the data, not hacking the code
  11. in links before pages
  12. that the real value is in the links to other domains
  13. in designing for the browser in the browser

Like Tom's list we didn't always live up to these standards but they kept us (mainly) honest. I seem to remember we also kept a 'hack log' to keep track of anywhere we evaded our principles for the sake of expedience. Wonder what happened to that?

http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Tue, 14 Jun 2011 04:21:00 -0700 Impolite personalisation - impotent in the face of inference http://smethur.st/impolite-personalisation-impotent-in-the-face http://smethur.st/impolite-personalisation-impotent-in-the-face

I'm not a massive fan of cars. Never having learnt to drive, their design pretty much passes me by. But a wife working at BBC Magazines means a bathroom floor covered with old copies of Top Gear. Yesterday I came across a review of the Audi A1 (I'd link but topgear.com just returns a 500) which said:

Even its stop/start system is behind the Mini's - it keeps finding reasons not to stop at all. Not that it gives you its excuses, so I don't know how I can alter my driving style to make it more active. Too warm? It's not summer yet. Too cold? The coming of spring made no difference. Battery low? Shouldn't be. Aircon or heater on? Nope, have been careful to avoid that. Lights on? That changes nothing.

This struck a chord with a conversation going on on Twitter about personalisation and personalised recommendation. Which had been triggered by an Eli Pariser article in The Guardian which said, roughly:

the increasing personalisation of information [..] threatens to limit our access to information and enclose us in a self-reinforcing world view.

The opposing view was taken in a post by Better the Mask saying, roughly:

A lot of this article, I think, reads like a digital complement to the Reithian view on broadcasting - that it should be public service, give people what they need not what they want. High-minded, certainly, and noble in a certain light, but also highly problematic. Who decides what "we" as a community need?

Much of the debate seemed to centre on the usual paternalist reading of Reith with "low culture" as the sugar to make the "high culture" pill go down. I'm not sure that's entirely accurate. I don't remember ever seeing "inform, educate and entertain" rendered with bolds or italics. And as Tony Ageh might say, scheduling Top of the Pops next to Panorama was as much about exposing Top of the Pops to Panorama viewers as it was about exposing Panorama to Top of the Pops viewers.

I'd probably go further and say any attempt to break down culture into high and low is itself paternalistic and just leads to the usual sneering at the poor old Daily Mail reader. It also ignores the connections between things. It's usually not that many skips of the graph from "low" to "high"; there are no continents in culture.

And from a personalised recommendation perspective all the anecdotal evidence of user testing I've seen seems to suggest that people value recommendations outside of their bubble. Obviously that doesn't mean recommending Bells on Sunday to Westwood fans (or vice versa). But neither does it mean recommending Casualty from Holby City. People like to be surprised by recommendations, not locked into content ghettos.

All that said, there is one thing that bothers me about "personalised" content services. Recommendation engines take a large graph of data and compress it into a smaller set of one to many recommendations; compression for recommendation is just some inference over a data set to reduce too much choice to some choice. For personalised recommendation, part of the original graph is the user's past activity. There's some truth in the adage that, if you don't know your past, you don't know your future (who am I to disagree with Chuck D) and basing recommendations for future behaviour on observed past behaviour makes some sense.

The problems come when some system starts making inferences and you have no idea why. Like the Audi A1 stop/start system, if you can't tell why a system is making some assumption you can't tweak your behaviour to change those assumptions and the whole thing just becomes frustrating. For recommendation engines the metric of measurement tends to be about what is returned. But for a useful and usable system why is equally important. And too often why becomes a black box with the intercession of magic. A polite, useful system would explain the assumptions it's making and the logical leaps it's taking. And allow you to help it to help you.

So given a standard e-commerce application I might be recommended products on the basis of products I've bought in the past. Which might work until the point that somebody else uses my account to buy things. At which point I start getting recommendations for things I have no interest in. Same deal for a TV recommender based on my past consumption.

All this is fine if I can see and modify any data that's been collected about me; so long as I can tell the system, "no, I didn't buy or watch that so please stop recommending me stuff on the basis that I did." Or, "yes, I did watch that, but my tastes have changed / it was crap."
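As a rough illustration of what "polite" might look like in code, here's a minimal sketch of a recommender that keeps hold of the reason for every recommendation and lets the user disown bits of their history. The RELATED table, the History class and the programme slugs are all made up for the example; a real engine would be inferring over a far bigger graph.

```python
from collections import defaultdict

# Hypothetical data: which items tend to go together. In a real system this
# would be inferred from a much larger graph.
RELATED = {
    "midsomer-murders": ["lewis", "poirot"],
    "gardeners-world": ["great-british-railway-journeys"],
}


class History:
    """A user's past activity that the user can see and correct."""

    def __init__(self):
        self.watched = set()

    def add(self, item):
        self.watched.add(item)

    def disown(self, item):
        # "No, I didn't watch that" or "my tastes have changed".
        self.watched.discard(item)


def recommend(history):
    """Return {recommended item: [the watched items that caused it]}."""
    reasons = defaultdict(list)
    for item in sorted(history.watched):
        for related in RELATED.get(item, []):
            if related not in history.watched:
                reasons[related].append(item)
    return dict(reasons)
```

The point is the return value: every recommendation arrives with the history entries that caused it, so the "why" can be shown to the user, and disowning an entry removes everything that was inferred from it.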

But too often the data collected about me is hidden from view and when it is exposed I can't change it. But this is probably just me banging on about #userowneddata again...


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 19 May 2011 04:37:00 -0700 A comment for /node/114 http://smethur.st/a-comment-for-node114 http://smethur.st/a-comment-for-node114

This is a long, long delayed comment to Jeni Tennison's Opaque URIs != Unreadable URIs post from back in 2009. I think I meant to comment at the time but forgot and now comments are closed so doing it here instead...

Best start by saying I agree with Jeni but think the post might be subject to misinterpretation and, in my view at least, some of the commenters have misinterpreted.

First off Jeni separates out URI opacity from URI readability. She points out that URI opacity is already claimed as a term in web architecture, which states that web applications must not try to pick apart URIs in order to work out information from them or, roughly, machines shouldn't guess. For discussions about the readability of URIs she suggests replacing 'opaque' with 'obfuscated', which makes some sense to me. Certainly if the label opaque is already used to identify one concept, reusing it for another will probably just cause confusion. And because this is all about labels in context as identifiers...

I think it's probably also useful to separate out human readable from human meaningful. In my mind human readable at least implies natural language. Or at least that's the way it tends to be used in conversations around this topic. In contrast W1A 1AA is not natural language (it's not in a dictionary) but it is human meaningful (especially to a certain generation). But that meaning doesn't stretch that far outside the context of the UK.

Anyway, Jeni goes on to discuss possible URI designs for a school and suggests 3 possible options:

  1. the name of the school
  2. the unique reference number for the school
  3. the record number for the school in the database that is being published on the web

The first option is, I think, what most people mean when they say human readable as opposed to opaque or obfuscated. Jeni dismisses it by pointing out that school names can change over time so persistence is a problem. The other obvious problem is that school names aren't unique. I have no idea how many St Mary's schools there might be in the UK but I'd guess that /schools/st-marys would return a fair few results. Again in my head, an identifier is a label that's guaranteed unique in some defined context. @Dmitry picks up on this in the comments suggesting that a desirable URI for a school should include type/class identifier 'school', school name, city/..., state/province/..., and country which is not dissimilar to a database composite key combining the identifier for the thing with some facet identifiers to add just enough context to guarantee uniqueness. It looks like a nice, desirable solution but it doesn't solve the original problem of school names changing and it introduces a whole new set of problems of its own. Firstly it implies that the world is a mono-hierarchical taxonomy of things when the world is more like a giant set of many-to-many relationships. The world is not a filing system or indeed a set of Russian dolls. Secondly it compounds the changing school name problem by introducing a whole set of other labels that are also subject to change. And thirdly it assumes where mono-hierarchical taxonomies do exist they remain stable over time.

The classic example of this is the Linnaean taxonomy and the use of genus and species labels as a composite key to identify a species. In practice it's fraught with difficulties as biologists constantly re-classify species into different genera. As my old master would say, never build your taxonomies into your URIs because they will become unmaintainable and make you cry.

For now I'll skip over the second option and come back later. Option three is to use the database record number for the school. So basically publish the primary key of the database row as the web identifier. Which is a fairly common solution to the problem and common enough to be the default pattern for a Ruby on Rails app where the out of the box URI for a 'thing' page is /:table_name/:primary_key. And I'd guess this is what the 114 is in the URI of Jeni's blog post. It's also how dbpedia lite mints its URIs using the primary key from the Wikipedia table row. Back when I was a lad there used to be a standard warning accompanying any plans to use database primary keys in URIs: what happens if your database drops for some reason and you have to resurrect it and it gets resurrected in some different order with primary keys assigned to different things. Although I've never seen that happen in practice...?

So if publishing with composite keys has problems and publishing primary keys is frowned upon the only other option is the surrogate primary key: a column in your database table that's guaranteed unique across that table but isn't the primary key. Which is pretty much what a MusicBrainz 36 character UUID is in http://musicbrainz.org/artist/d5da1841-9bc8-4813-9f89-11098090148e. And also what the 8 character PID is in http://www.bbc.co.uk/programmes/b006mw1h. (Although at least some PIDs are actually 2-way transforms between Freeview broadcast CRIDs (non-HTTP URIs) but that's a different story.)
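A minimal sketch of the idea, assuming a made-up School record and an example.com base URI (neither comes from Jeni's post): the row keeps its internal primary key, but the only thing that ever reaches the URI is a separately minted identifier. I've used a random UUID here in the MusicBrainz style; the scheme behind BBC PIDs is different and not shown.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class School:
    primary_key: int   # internal row number, never published
    name: str          # a label that can change and isn't unique
    public_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def uri_for(school, base="http://example.com/schools/"):
    # The URI is built only from the surrogate key, so renaming the school
    # or rebuilding the table with new row numbers doesn't change it.
    return base + school.public_id
```

Rename the school or resurrect the database in a different order and the URI stays put, which is the whole point.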

Back to Jeni's option 2 then, which again is a surrogate primary key. She makes the distinction between primary key and surrogate key by saying:

Using the record number for the school within the particular database that's being published is entirely non-human-readable because there is simply no way of finding out what that would be for a given school. The unique reference number for the school, on the other hand, may be an obscure series of digits, but it is a meaningful one which renders the URI readable and hackable.

The obvious point is that the school reference number might be readable (though isn't natural language) and it might be meaningful. But it's only really more meaningful than a primary key to a very select group of school administrators.

The other point is that you can only reuse "real world" identifiers as your surrogate key if "real world" identifiers exist in the domain you're working in. Using real world identifiers is really more a case of outsourcing your obfuscation because someone else has done the work already and, as ever, it's best to reuse and recycle. Meaningful to some is better than meaningful to none.

But "real world" identifiers tend to exist where there are administrative benefits around transactions (car registration plates, ISBNs, catalogue numbers, DOIs, National Insurance numbers, TV Licence numbers...). And they tend to only act as identifiers within that administrative framework. As Frankie picks up in the comments ISBNs are useful, can be meaningful / recognisable to some people and do have structure. But without wishing to disappear down a FRBR shaped rabbit hole they're about editions / saleable items. And there are no similar "real world" identifier frameworks for works. Or the usual problem that this is not a dramatisation of this, this or this. It's a dramatisation of something that no one's ever bothered to give a real world identifier to because there's no administrative / transactional benefit in them doing so.

So using real world IDs as surrogate keys is useful and adds meaning for some users but it's only possible where real world identifiers already exist. Otherwise you end up having to mint your own.

Which does open up the option of making your surrogate keys human readable / natural language URI slugs (as posterous does to this post). Given enough people to throw at the problem any site can generate human readable / natural language URIs. It really depends on the throughput of data, the number of new pages that results in, the friction it introduces and the cost.

And I think cost is the big factor in all of this. For large data volumes you need human intervention to allocate URL keys. And human intervention is expensive. And humans make mistakes and change their minds. So you need to start storing history to generate redirects. Which adds storage and code complexity and makes things more expensive.
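To give a feel for the extra moving parts, here's a minimal sketch of slug history with redirects. The SlugRegistry class and its in-memory dictionaries are hypothetical stand-ins for whatever tables and routing a real site would need, and the slug rules are just one plausible choice.

```python
import re


def slugify(title):
    """Lower case, hyphen separated: 'St Mary's School' -> 'st-marys-school'."""
    title = title.lower().replace("'", "")
    return re.sub(r"[^a-z0-9]+", "-", title).strip("-")


class SlugRegistry:
    def __init__(self):
        self.current = {}   # thing id -> current slug
        self.owner = {}     # every slug ever issued -> thing id

    def rename(self, thing_id, title):
        slug = slugify(title)
        if self.owner.get(slug, thing_id) != thing_id:
            raise ValueError(f"slug {slug!r} is already taken")
        self.owner[slug] = thing_id
        self.current[thing_id] = slug
        return slug

    def resolve(self, slug):
        """Return (status, slug to serve or redirect to)."""
        thing_id = self.owner.get(slug)
        if thing_id is None:
            return 404, None
        current = self.current[thing_id]
        return (200, current) if slug == current else (301, current)
```

Even this toy version has to remember every slug it ever issued and refuse to hand an old one to a different thing, and it still leaves the second St Mary's needing a human to pick a different label.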

I'm still unconvinced that anyone outside "the industry" cares about any of this. No-one stops using twitter because a tweet URI is obfuscated. Amazon still makes a profit despite its use of obfuscated product IDs. I've sat through a fair few user-testing sessions and I've never seen anyone hack the URL or even look at it unless the task being tested is sharing in which case they copy and paste and send.

Which is not to say I wouldn't by preference make URIs human readable, human meaningful, natural language and hackable. Mostly because it seems polite. If I was building a website for my local restaurant then I'd definitely go for /drinks/wines/pink/sparkling or whatever. But for a site of any real complexity, based on any real amount of data / content, human readable / meaningful / natural language / hackable costs money (in admin, in storage, in code complexity for redirects). I'm going to kick myself for using 'return on investment' but I struggle to see where it lives in this case.


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 21 Apr 2011 12:07:00 -0700 A tedious post about designing IDs http://smethur.st/a-tedious-post-about-designing-ids http://smethur.st/a-tedious-post-about-designing-ids

Following on from a tweet from Erik Wilde the other day...

there should be a way how web pages can advertise the fact that they intend to maintain @id as stable identifiers of page fragments.

...I thought I'd type up a few notes on how we went about designing IDs on /programmes, /music and /nature. I guess the truth is we didn't really design them; their form and structure came about as a side effect of the way we worked in general. So taking /programmes as an example...

...the first step (at least once we got to the web page stage) was to make each and every primary resource addressable at a persistent URI. In the first instance all these pages had was an h1 with the title of the object (or something to serve as a title if no obvious title presented itself). As ever we tried to keep the URI structure as flat as possible and avoid building in taxonomy to maximise persistence. Over time we filled these pages out with additional information but only ever showing direct attribute data.

Next we linked up directly connected primary resource objects (so episodes to the series they belonged to etc).

Step 3 was to build aggregation / list views (schedules, a-z, genres and formats) to get to the primary resources.

Step 4 was to build the subsidiary resources and scoped aggregations. As an example a programme episode can have many contributors so we made /the_programme_episode/contributors to list them.

And finally we built out the primary resource pages by including (or transcluding as some might say) subsidiary resources onto them. So /contributors was mirrored as a fragment of the episode page. And brought its URI fragment identifier with it for use as its ID attribute.

So you end up with http://www.bbc.co.uk/programmes/b01064h6/segments and http://www.bbc.co.uk/programmes/b01064j1#segments.

And in the wildlife world http://www.bbc.co.uk/nature/life/Tiger/sounds and http://www.bbc.co.uk/nature/life/Tiger#sounds.
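None of this is the actual /programmes code, but the pattern can be sketched in a few lines. Assume a made-up episode dictionary and a SUBSIDIARY table whose keys double as both the URI path segment and the ID attribute; all the function names are invented for the example.

```python
from html import escape

# Hypothetical subsidiary resources for an episode, keyed by the name used
# both as the URI path segment and as the fragment identifier.
SUBSIDIARY = {
    "segments": lambda episode: episode["segments"],
    "contributors": lambda episode: episode["contributors"],
}


def render_fragment(name, episode):
    """Render one subsidiary resource as a list whose id is its own name."""
    items = "".join(f"<li>{escape(i)}</li>" for i in SUBSIDIARY[name](episode))
    return f'<ul id="{name}">{items}</ul>'


def render_subsidiary_page(name, episode):
    # Served at /programmes/:pid/<name>
    return f"<h1>{escape(episode['title'])}: {name}</h1>" + render_fragment(name, episode)


def render_episode_page(episode, include=("segments", "contributors")):
    # Served at /programmes/:pid; each included fragment is addressable at
    # /programmes/:pid#<name>. Anything left out of `include` becomes a link.
    parts = [f"<h1>{escape(episode['title'])}</h1>"]
    for name in SUBSIDIARY:
        if name in include:
            parts.append(render_fragment(name, episode))
        else:
            parts.append(f'<a href="/programmes/{episode["pid"]}/{name}">{name}</a>')
    return "".join(parts)
```

The same render_fragment call feeds both the standalone page and the transcluded version, so the path segment and the fragment identifier can't drift apart, and dropping a name from include swaps the fragment for a link to the subsidiary resource.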

What's the point?

This all seems like quite a lot of work for very little benefit but there is some point:

  1. Fragment identifiers matter because people (and not necessarily you) link to them and if they change links break. And with Google now indexing fragments, changing fragment identifiers risks losing Google juice.
  2. It forces you to think about your fragment identifiers as much as your standard page URIs because they're one and the same thing. So they need to be designed with all the usual requirements of non-fragment URIs: readability, hackability and most importantly persistence. And they need to follow your standard URI design patterns which in our case means lower case, hyphen separated and true to the language of the domain model.
  3. It means adding IDs for fragments / anchor points isn't just a case of typing a string into a template. Before you even get to the template you've already thought about the patterns, written a route to handle the subsidiary resource, written controller code etc.
  4. It means you can rapidly respond to user testing and real use in the wild by reprioritising the elements of your user experience. If your page gets cluttered you can easily remove the content of a transcluded fragment and instead link to the subsidiary resource.
  5. You can easily change the experience for different platforms. If users are browsing with low-end mobile phones, page weight and download speeds matter. So you can remove the content of less important transcluded resources and again just link to them. (Though you're probably better off just using @media queries and responsive design for smart phones.)
  6. Everyone working on the project (software engineers, user experience people, product managers...) can easily see which subsidiary resources are available to build the primary resource page. Like an ingredient list for a recipe.
  7. You can easily add data views to subsidiary resources (RSS for an episodes available to watch list) that live at sensible URIs.
  8. It's not actually that much extra work. You can reuse the same model code and templating and CSS. It just needs a new route and some minimal controller code.
  9. Anything that makes you think before you code saves work.

Back to the question

there should be a way how web pages can advertise the fact that they intend to maintain @id as stable identifiers of page fragments.

I'm not aware of any mechanism to do this. That said I'm not aware of any mechanism to allow web pages to advertise the fact their URIs in general won't change. Maybe there's something out there that I've not come across but if not it would be a nice idea if you could say "we guarantee our URIs (including fragment identifiers) for 5 / 10 / 20 / 50 / 100 years". URIs are your contract with the web and formalising your side of the bargain seems to make sense. It would certainly make my life easier...

Being unable to answer the question raised another question in my head: why do fragment identifiers change?

I think mostly it's because they're just not given the same consideration as non-fragment identifiers. Lots of organisations have URI design policies but they rarely extend into the document.

But maybe it's also because ID attributes fulfill three different functions:

  1. They act as fragment identifiers as here.
  2. They act as hooks to hang CSS styling off. Personally I think using IDs in CSS introduces too much specificity and avoid them unless I'm targeting a genuine fragment for styling.
  3. They act as hooks for JavaScript behaviours. Unfortunately this one's more difficult to avoid. Targeting an ID in JavaScript is simpler and quicker than targeting a class and picking the first one where there is only one.

I'm not suggesting we go back to <a name=""> but I do wonder if some means to separate genuine fragment identifiers from styling and behaviour hooks would be useful?!?

Finally I wondered what happens if the fragment identifier remains unchanged but the URI of the containing resource does change. So if example.com/foo has a fragment of #me and example.com/foo gets 301ed to example.com/bar. I ran a quick test and found out that the behaviour changes depending on the browser. So Firefox 3, Chrome 10 and Safari 4 all take example.com/foo#me and send example.com/foo off to the server. And the server responds to say example.com/foo has moved to example.com/bar. At this point Firefox and Chrome both take example.com/bar and append the original fragment identifier to get example.com/bar#me. But Safari seems to suffer a short-term memory failure and the #me never gets appended. From chats with Yves and Nicholas Firefox and Chrome seem to be behaving correctly and Safari's getting it wrong.
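The Firefox / Chrome behaviour can be sketched as a small function. This is only my reading of what those browsers appear to do, not their actual code, and follow_redirect is a made-up name. It does match the rule later written down in the HTTP specification (RFC 7231, section 7.1.2): if the Location value has no fragment, the redirect inherits the fragment of the original request.

```python
from urllib.parse import urljoin, urlsplit


def follow_redirect(requested, location):
    """Work out where a browser should end up after a 3xx response.

    `requested` is the full URI the user asked for (fragment included);
    `location` is the value of the Location header.
    """
    original = urlsplit(requested)
    # The fragment is never sent to the server, so resolve the Location
    # value against the fragment-less request URI.
    target = urlsplit(urljoin(original._replace(fragment="").geturl(), location))
    if not target.fragment and original.fragment:
        # The redirect didn't specify a fragment, so carry the original over.
        target = target._replace(fragment=original.fragment)
    return target.geturl()
```

So follow_redirect("http://example.com/foo#me", "http://example.com/bar") ends up at http://example.com/bar#me, whereas a Location header that carries its own fragment wins over the original one.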

(At this point the conversation veered off into whether servers should ever get involved with fragment identifiers and whether content location should ever return a fragment URI. And what would happen if a slash based URI linked data site (like DBpedia) decided to change to a hash based URI scheme (or just became RDFa in Wikipedia) and whether it would be possible to redirect from a slash to a hash. But that seemed outside the limits of tedium / my understanding for even this tedious post...)

A tedious summary

If you consider your website to be a tree (which for all kinds of other reasons you really shouldn't - the web is a web, not a tree) and your pages to be leaves, then fragment identifiers are those bits of the twigs that extend into the leaves as veins. (At this point I asked Tom for the scientific name for leaf veins but his memory failed and I felt quite let down.) They get forgotten about because they're lesser seen. And because they get forgotten about they're often not designed with the care and attention we put into non-fragment URIs. Separating out fragment identifiers from styling / behaviour hooks and making transcluded fragments separately addressable doesn't guarantee persistence but putting in the thought up front does make change less likely.


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Tue, 19 Apr 2011 03:02:00 -0700 Lean back != passive http://smethur.st/lean-back-passive http://smethur.st/lean-back-passive

User experience design is not a discipline that lacks buzzwords. When it comes to designing user experiences around television programmes the buzzwords breed and multiply. It's not a phrase I've come across recently but three or four years ago the fashion was for "1 foot (phone), 2 foot (laptop) and 10 foot (tellybox) experiences". Recently that seems to have been replaced by folks describing traditional tellybox viewing as a "lean back experience".

The implication is clear: if you're designing for experiences outside of the traditional desktop browser you can throw away the web textbook. People don't want to browse lists to find content and they definitely don't want to think. In place of lists and links if you want to make an experience that feels like television you need little more than an on button and a "recommendation engine" that only surfaces content relevant to you. But...

...when I turn on my TV set a typical "user journey" looks like:

  1. Hit the menu button. This brings up a horizontal list of the major functional categories (EPG, downloads for purchase, recording etc).
  2. Scroll to and select the Recordings button. Which brings up a list of what (in my working life) I might call TLECs. Or top level programme groupings (i.e. Gardener's World, not a series or episode of Gardener's World).
  3. Scroll to and select the Midsomer Murders button. Which brings up a list of recorded episodes of Midsomer Murders.
  4. Scroll to and select the episode I want. Which brings up a list of options: Resume play (if I've already started playing it), Play from start, Delete, More information etc.
  5. Scroll to and select Resume play.
  6. Pour a drink / lean back

I don't use search because it's hard to type from a remote control (made more difficult by the fact that my cat likes to sleep on top of the STB with his paw over the remote detector bit). And on the few occasions I visit the EPG it's usually to scroll forward (sadly no scroll back on BT Vision) to set up a programme to record. Even then it's a case of scrolling and clicking: up and down through channels and forward and back through time.

What I never do is turn on the TV and just watch (unless it's already tuned to the murder / mystery joys of ITV3). Content does not "come to me", I browse to content. And this isn't a new digital TV pattern. When I visit my mum the EPG is never used but the up / down channel buttons get a thorough clicking in search of the end of the soap opera rainbow.

None of which is to say I don't think there's a future for a "personalised TV experience". My ideal TV would look much like the one envisaged by Nicholas Negroponte back in 1995. But that takes a lot of media description bits and a lot of user tracking bits and a lot of social bits plus some very clever people to write the code to mine all that.

So I don't think we should shy away from list based navigation. (For all its one button shininess the iPod / iTunes navigation is just some deeply nested (or faceted depending on screen real estate) lists.) And saying that the browser based TV experience should be more like the TV experience (and by implication more push and less pull) feels like wonky logic as the digital TV experience becomes more and more like browsing the web. And browse, click, repeat, lean back, repeat is definitely not passive.


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Fri, 21 Jan 2011 06:49:00 -0800 Programme shelf life, event TV and social media http://smethur.st/programme-shelf-life-event-tv-and-social-medi http://smethur.st/programme-shelf-life-event-tv-and-social-medi

Lots of people in my Twitter world spent yesterday pointing at a Read Write Web post on What Glee Means for Twitter & Television. It's all familiarish stuff around second screen social media:

Twitter is becoming a side dish for prime time entertainment and, as the networks catch on, it's becoming a tool for bringing the audience back from the land of DVRs and time-shifted television into real-time viewing.

The hook for the article is the use of social media by the programme producers so:

The characters on Glee actually tweet and they tweet during the show. When Glee starts, the moment it airs for the first time on the East Coast, the tweets per second for Glee shoot up. They stay up there at a super high level at hundreds of [times] what they are before the show comes on until the moment the show ends and then they drop. [...] People feel like they have to watch the show while it's going on because the community is tweeting about the show and the characters are tweeting as the show's happening so [they have to] watch it in real time.

Which is interesting for TV production (although I guess radio presenters have been presenting and tweeting for quite a while). But it wasn't really the technique that caught my eye. Lots of people in TV spend lots of time talking about "event television" and "watercooler moments" (sorry). The PVR and catch-up services like iPlayer shifted the emphasis away from the broadcast and up to the episode as the thing that people talked about / pointed at etc. And now social media seems to be shifting the balance back toward the broadcast. Which makes it slightly less about the thing you've watched and slightly more about when you've watched it. Twitter etc allow for distributed communal viewing and make the "watercooler moment" (again sorry) and the broadcast moment one and the same.

From a broadcaster's point of view this is obviously good news because up until the closing credits of the first broadcast the broadcaster retains complete control. And once someone records to their PVR they lose at least partial control. Once a programme's recorded they can fast forward through advertising breaks which, for a mainly advertising funded medium, is a bit of a pain. And part of the reason I guess why product placement will be permitted on UK TV from next month.

The PVR argument is the usual one given for why the broadcast moment is still important to broadcasters. But there's more to post-broadcast loss of control than PVRs. Past the first broadcast there are obviously repeats and DVDs and box sets. But there's also bit torrent and a generation growing up who've grown used to asking for content and getting it. And once content is out in the wild and easily digitised and easily duplicated and easily distributed by a web which is world wide...

All of this is a particular problem for the broadcast industry which is still very much based around territorial releases. Glee for example debuted in the US in May 2009, but didn't arrive in Australia until September 2009. Social media might be useful as a means to switch people away from on demand and back to broadcast. But for all the problems it might solve it only exacerbates others. Because the people you friend / follow / whatever on social media services aren't restricted to those who happen to live within reach of the same transmitter tower as you. Which means if you're in Australia you might be getting Glee spoilers four months before you're (legally) able to watch it. Or you could wait a couple of hours and just take the less legal route. Seeing social media as a handy way to expose your "product" is fine if you can control who it's exposed to and make sure there's some correlation between the exposees and the availability. But social media is web scale and global; the broadcast business model isn't.

Back in the day the music industry had similar problems with territorial releases timed to coincide with tours (and associated TV and radio interviews). But given time (and the intervention of Apple) they were able to adapt to a more connected world. That said it was easier for them; the record labels (or the major ones at least) often owned the publishing rights. The relationship between producers / distributors and rights holders in the broadcast industry isn't quite so cosy.

Anyway, all of this made me wonder what changes about "event television" when events start to be shared internationally rather than nationally / regionally. And whether social media is starting to change the definition of "event television".

In the past (and in the main) event television has tended to be more about programmes with a short shelf life (or rather programmes with content with a short shelf life). Match of the Day feels like an obvious example. If you watch on a Saturday night (and you haven't peeked at the sports results on the news and you don't have Sky) the content is relevant. If you catch the repeat on a Sunday (by which point you've probably read a paper or listened to the radio or checked a website) it's slightly less relevant. By the following Saturday its relevance is pretty much zero. Obviously it still has historical value and if it's the episode where your team beat the local rival 7-0 it probably has value to you and fellow fans. But it's not going to be repeated at Christmas or rebroadcast on Dave or released on DVD.

So different programmes have different shelf lives. The Archers has audio books, 606 doesn't; Top Gear has DVDs, Crimewatch doesn't. Some content retains its value well past first broadcast but that's the same content that retains its value when delivered via less legal routes. And it feels like that's the content that broadcasters should be working hard to get to audiences at first broadcast. But most of the social media effort seems to go toward more obvious targets; the kind of programmes that have always been focussed on audience feedback whether via the postbag or the phone or email. Which tend to be the kind of programmes whose shelf life is shortest. It might make for better programmes but it isn't about tempting audiences back toward broadcast before the content "escapes".

Glee is different. It's classic long shelf life content. It's been syndicated to numerous countries. It's available on DVD. I dare say it's available all over bit torrent. So the fact that the producers are using social media to tempt people back toward first broadcast is interesting. Because it isn't about a big media company using social media as another source of "user generated content" or a simple promotional channel; they're promoting the event, not the product.

I get the feeling that if a British broadcaster attempted to do what the producers of Glee are doing they'd pick on a Coronation Street or an EastEnders. But soaps, like phone-ins, are short shelf life. Given that broadcasters are spending money on social media and given that the pot of money to do that is limited I'm thinking the best place to spend that money is on long shelf life content rather than phone-ins?

PS. If TV and social media is your thing you might want to read APIs and URLs for Social TV.


http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 06 Jan 2011 15:22:00 -0800 UX deliverables are dead, long live code http://smethur.st/ux-deliverables-are-dead-long-live-code http://smethur.st/ux-deliverables-are-dead-long-live-code

This post is a possibly ranty reaction to the recent Wireframes are dead, long live rapid prototyping post on UX for the masses. If you've not read it, it's definitely worth a click. And also worth working your way through the comments it generated. As the title makes clear the post argues that wireframes, a traditional tool of the 'UX professional', no longer add value. I'd agree with this and probably go further and say they never have. And probably go further still and say none of the accepted 'UX Deliverables' (sitemaps, user journeys, card sorts (you're not building a library; this is the web; things don't have places) etc) add value either. And don't even get me started on 'Personas'...

But I disagree with the post on two counts. First off the 'Why ditch wireframes?' section lists six reasons why wireframes don't help you actually make stuff. I'd agree with all six but think they're not the most important reasons why wireframes suck. And secondly:

the alternative is to bypass wireframes altogether and either go straight from sketch / outline designs to developing working code (in an Agile fashion), or as is more use common use [sic] a rapid prototyping tool to create a prototype

which loses me at the comma. Because rapid prototyping tools suck too. And, being harder to use than OmniGraffle / fag packets + pencil, suck worse than wireframes.

If wireframes aren't dead I'd be more than happy to take them outside and put a bullet through their head

It's probably best to consider this list a personal addendum to the six reasons to ditch wireframes list in the original post.

  1. 'UX professionals' seem to be obsessed with defining optimum user experience. Which is fine if you're making a poster or a magazine or a twonkPad app. But mostly we're talking about the web. And if I want / need to overwrite your CSS (because you've not designed for colour blind users (again!)) I will do. And if I want / need to increase your font size (because I'm getting older and my eyesight isn't as good as it was) I will do. And if I care about my privacy and want to ban your cookies I will do. And if I don't trust your javascript and want to stop it running I will do. And if my eyes really give out and I need to use a screen reader...

    On a more basic level if I want to resize my browser I will do. And if your wireframes say the search form is top right what happens? Is your layout absolute so the search form disappears somewhere out of view or is your layout liquid making the search form always visible? CSS makes this behaviour explicit, wireframes don't.

    The point is user experience on the web is up to the user. It's not something you as a designer / publisher can define and control. Wireframes are just one way to fool yourself and / or your 'client'.

  2. Wireframes conflate: the data / content / assets you need to build a resource with one specific representation of that resource with the visual layout of that representation with the document order of that representation. Breaking that apart:

    1. Imagine you want to list some things. Imagine that list is time based with new items being added as some state somewhere changes. Do you want to make RSS of that list available? Or imagine it's a list of events. Do you want to make that list available as an ical / ics feed? From a less data centric point of view do you want to make a mobile friendly version of your page? A smart phone version? An iPad version? Then what is your wireframe supposed to represent? And are you going to make a different wireframe for every possible device that might at some point want to access your content?

      The point is to document the data / services you need to build all representations of your resource, not just some simplified / idealised desktop HTML version. How best to do that? My advice would be over a coffee / beer with a developer on a fag packet. If god had intended us to specify data queries in plain English / French / whatever (s)he wouldn't have invented SQL or SPARQL. The best way to document code is in code. The rest is paperwork and process.

    2. The second problem is more general and more common. Very roughly speaking (and for accessibility and SEO reasons) an HTML document should be: h1, content, navigation. For some 'stickiness' reason or some misguided belief that most visitors arrive through your homepage and have the patience to navigate your navigation, page layouts tend to stick 'global navigation' (for some definition of global) at the top of the page. Which is fine from a page presentation point of view. But a bit rubbish from a document design / SEO / accessibility point of view. Again, title, content, navigation. And use CSS to 'move' your navigation to the top of the 'page'. If god had intended us to specify page layout in pictures (s)he wouldn't have invented CSS.

      Document design has nothing to do with visual layout (except by default with CSS off). Design your document first, then use CSS to add layout. For what it's worth I divide my CSS into two bits: layout.css handles the floats and the positions and the margins and the padding; decor.css handles the decorative stuff (colours, typefaces etc). Like it or loathe it, CSS is the design language of the web. If you want to wireframe stuff edit layout.css. Any other means of codifying layout is a lie and will catch up with you.

If wireframes are dead, rapid prototyping tools should die in their arms

So I don't really like wireframes. But I really dislike rapid prototyping tools. If you've not already read the comments on the original post it's probably worth doing so. They break down roughly into:

  1. wireframes are still useful as a deliverable / conversation tool with clients
  2. suggestions for other rapid prototyping tools to use. From a quick scan through, suggestions are: Axure, Balsamiq, EasyPrototype, iPlotz, InDesign, SketchFlow, Flowella, Justinmind Prototyper, LucidChart, Tiggr, Protoshare, Naview and Cacoo

How in god's name did so many people make so many tools that pretend to make websites when making websites isn't that hard? The problem is all these tools output wireframes with links. Sometimes they do that by outputting HTML and CSS and a few annotations. But even then they do it badly. Which is almost forgivable: even people employed to make HTML and CSS often do it badly; the machines never stood a chance. But they have all the problems of wireframes (conflation of resource and representation, document and layout) without the benefit of being easy to amend and update. Really, stick to pencils and fag packets and working code.

One final rant: Designing how code interacts with idealised data is easy. But you're never going to know how your application works until you combine real code with real data with real users. You can wrap your process in as many "user centric" buzzwords as you like but until you're testing real code with real data with real users the picture will be, at best, fuzzy. Or ship, test, change. And realise UX deliverables are fine as project checkpoints and fine as signifiers to client X that this might be a good point to donate to your bank account but they're really only of benefit to your agency / bank manager / random important person incapable of abstract thought, and they really won't lead to the things you're making being any better / the making of better things.


http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 23 Dec 2010 12:30:00 -0800 Privacy, groups and graphs http://smethur.st/privacy-groups-and-graphs http://smethur.st/privacy-groups-and-graphs

A couple of days ago a friend sent me a link to The Real Life Social Network. It's a presentation by Paul Adams, the Senior User Experience Researcher at Google, touching on offline social groups, online social groups, strong and weak ties, contextual personas and privacy. The privacy bit in particular reminded me of danah boyd's rather wonderful Privacy and Publicity in the Context of Big Data paper, echoing the idea that privacy isn't a matter of how much or how little you share, but of how much you understand (and can control) the context of sharing. So the usual problem that a status update intended for three or four people gets seen by three or four hundred and has the potential to be seen by three or four million.

The presentation illustrates the problem using the real life example of Debbie. Offline she has four distinct groups of "friends", online they're all bundled into one big bucket. Which makes it problematic when she wants to share something with a subset of people in that bucket. It's a common problem but how does it get solved? Slide 84 of the presentation says:

Allow people to create custom names for groups, and allow people to rename the group if it changes over time.

Allowing people to create groups and share only with people inside a group is the obvious way to solve the contextual sharing problem. And the Flickr friend / family split is an obvious example. But online groups have problems. Someone has to decide who's in the group and who's excluded. And if the name of the group can be changed someone has to have control over editing it. It all comes down to who controls the group. I could create an online group of school friends: Alice, Bob and Chris. But what if Alice hates Bob or Bob hates Chris or Chris hates me but is too polite to say?

Which I guess is my problem with Twitter lists. Anyone on Twitter can create a list and add anyone else to it. And give the list a descriptive title to make a claim that this group of people can be described by this label. But the people involved can't make a counter claim and have no control over whether they're on the list / in the group or not. Lots of people add the usual disclaimer to blogs and Twitter profiles that these are my views and don't reflect the views of my employer only to be added to some list bearing the employer's name by someone they may or may not know.

My other problem with Twitter lists is they make the context of sharing more opaque. Someone might follow a list you're on without directly following you and since you never get a following email you tend to forget they're there. And the same goes for retweets. But that's a side issue.

So if user defined groups don't work (and I don't think they do) what other mechanisms are there to share with a subset of people from a big bucket of 'friends'? Rather than categorise people into groups you could try categorising your relationships with people. Which is the XFN approach. But again I find XFN a bit creepy. It's easy to say you've met someone or they're a colleague or sibling or a spouse but then you hit the 'friend' word again and have to decide on friend, acquaintance or contact. Which is rarely an easy decision. It's a bit like making a new friend on Facebook and being faced by a ten drop-down choice of how you know them and where you met. It all just adds friction. And whilst I have no data about this, I'd bet that a lot of people lie. Because who wants Facebook to know you met someone in Greece, shared a house for a while, were engaged for 6 months but parted amicably when all you want to do is post on someone's wall.

I think there's a wider problem here. Both grouping friends and categorising relationships are ways to express your world view onto sets of other people. There's a lot of talk about social graphs but every time you see a social network diagram it has one person in the middle (you, the author?) with links out to some other people and possibly some links out from them to a few more. But it doesn't look like a graph. Or if it does look like a graph it looks like a very egocentric one. So to solve the problem of sharing in context maybe we need to step outside / above the graph and think less about how we connect to other people and more about how other people connect to other people. Because a group isn't defined by how I identify or label it but by the density of the interconnections within it.
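To make "density of the interconnections" a bit less hand-wavy, here's a minimal sketch in Python. It assumes connections are undirected "knows" pairs and measures density as the fraction of possible pairs inside a set of people that are actually connected. The names and the data are made up for illustration, and real services may well measure this differently.

```python
from itertools import combinations


def group_density(members, connections):
    """Fraction of possible pairs within `members` that are actually connected."""
    members = set(members)
    if len(members) < 2:
        return 0.0
    possible = len(members) * (len(members) - 1) // 2
    actual = sum(
        1
        for a, b in combinations(sorted(members), 2)
        if frozenset((a, b)) in connections
    )
    return actual / possible


# Hypothetical, undirected "knows" relationships (illustrative data only).
connections = {
    frozenset(pair)
    for pair in [
        ("alice", "bob"),
        ("alice", "chris"),
        ("bob", "chris"),
        ("alice", "dave"),
    ]
}

tight = group_density({"alice", "bob", "chris"}, connections)
loose = group_density({"alice", "bob", "chris", "dave"}, connections)
```

Alice, Bob and Chris all know each other so that set is as dense as it can be. Add Dave, who only knows Alice, and the density drops. Nobody had to name the group or decide who was in it.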

Chatting with Tom about this today he pointed out that lots of social network sites use the wider graph of connections to recommend new people to connect to. So if I don't know Dave but I do know Alice and Bob and both Alice and Bob know Dave the service will recommend I connect to / follow / friend / whatever Dave. LinkedIn does this to recommend new contacts and Facebook does this to recommend new friends. And I think Twitter did do this to recommend new followees. But since new Twitter came along I can't find it anymore. (That said I can't find anything in new Twitter and it just keeps telling me I'm a bit like Tom which is reassuring but not informative.)
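The recommendation Tom described can be sketched roughly like this. It's a guess at the general idea (count how many of my contacts know someone I don't), not a description of how LinkedIn, Facebook or Twitter actually do it. The `knows` data and the `min_mutual` threshold are assumptions for the example.

```python
from collections import Counter


def recommend(person, knows, min_mutual=2):
    """Suggest people `person` doesn't know who are known by at least
    `min_mutual` of the people they do know."""
    mine = knows.get(person, set())
    counts = Counter()
    for friend in mine:
        for candidate in knows.get(friend, set()):
            if candidate != person and candidate not in mine:
                counts[candidate] += 1
    return sorted(c for c, n in counts.items() if n >= min_mutual)


# Hypothetical data: I know Alice and Bob, and both of them know Dave.
knows = {
    "me": {"alice", "bob"},
    "alice": {"me", "dave", "erin"},
    "bob": {"me", "dave"},
    "dave": {"alice", "bob"},
    "erin": {"alice"},
}

suggestions = recommend("me", knows)
```

With this data Dave gets suggested because two of my contacts know him. Erin doesn't, because only Alice does.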

So this made me wonder if anyone was using the wider graph (beyond who you're linked to) to give a better sense of the context of sharing. When it struck me (somewhat late admittedly) that this is exactly what Twitter did when they changed how replies work. I have no idea why they chose to change this and whether it was about making context a little more explicit or just about managing server load but...

...in the old days when you posted a status update to Twitter anyone who followed you (and you hadn't blocked) could see it no matter what the content. Sometime ?this year? Twitter changed how this worked. If the tweet started with @alice only people that followed you and Alice would see it in their timeline. At the time lots of people (including me) complained but in retrospect it feels like a good way to make sharing contextual to a group. At least if you define a group by its interconnectedness and not by your own definition. It changed the display logic from a line (x follows y) into a triangle (x follows y and z, and y spoke to z). Which isn't exactly a rich graph but is at least not a line.
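The line to triangle change is easy enough to write down. This is a sketch of the rule as described above, not Twitter's actual code: the function name, the `follows` structure and the assumption that usernames are stored lowercase are all mine.

```python
def sees_in_timeline(viewer, author, tweet_text, follows):
    """Would `viewer` see this tweet in their timeline under the reply rule?

    `follows` maps a username to the set of usernames they follow
    (assumed lowercase for this sketch).
    """
    following = follows.get(viewer, set())
    if author not in following:
        return False  # the old "line": x has to follow y
    if tweet_text.startswith("@"):
        # The "triangle": x also has to follow whoever y is replying to.
        first_word = tweet_text.split()[0]
        addressee = first_word[1:].rstrip(":,").lower()
        return addressee in following
    return True


# Hypothetical followers: Xavier follows me and Alice, Yolanda only follows me.
follows = {
    "xavier": {"me", "alice"},
    "yolanda": {"me"},
}
```

So "@alice fancy a pint?" reaches Xavier but not Yolanda, while "Fancy a pint, @alice?" reaches both because the mention isn't at the start.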

So maybe other services already do this and make context as a product of interconnectedness more explicit but I can't think of any. And I wonder if it could be expanded further. At the moment it's only possible to separate out one person (the repliee) from any other people mentioned. If it were possible via better annotations to separate people who were the subject of a tweet from the people who the tweet was (primarily) aimed at you could restrict the context to only people mentioned and people who follow you and all the intended recipients. Would that work? Or just result in lots of #reallyfixreplies?


http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Wed, 22 Dec 2010 13:24:00 -0800 Locomotive numbers, train spotting and QR codes http://smethur.st/locomotive-numbers-train-spotting-and-qr-code http://smethur.st/locomotive-numbers-train-spotting-and-qr-code

This was originally written as a bit of a help page for @locospotr, a Twitter based train spotting site that was supposed to be a sister service to BeerSpotr and the sadly neglected CCTVSpotr. Unfortunately LocoSpotr never went live because I never quite finished the code and couldn't find a friendly designer to make the CSS.

So I forgot about this until Paul Carvill tweeted today:

from the Wall of my Facebook group for people on trains: "how about a QR code scanner that you can use to find out which train you're on?!"

Which made me think how cool QR Code trainspotting would be. Our duly elected (ish) leaders would no longer have to worry about impressionable British teenagers hunkering down over youpron. Instead they'd all be out sporting smart tank tops and twonkPhones on the platforms of Clapham Junction eagerly snapping the QR Codes on the sides of trains.

All it would take is the various train operating companies giving all their locos (and units and classes) cool uris, some stylish Duncan Robertson QR Code stuff and a twonkPhone app. And we could save our nation's youth from death by decadence and wanking. Or maybe not. In the meantime:

Locomotive numbering

In order to identify and organise locomotives, railway companies usually give each one a number. These numbers are usually unique within the confines of the railway system and period. But they are not globally unique and not unique across time. Two locomotives on two different railway systems might share the same number. And a single locomotive might have many numbers over time. The Flying Scotsman, for example, has carried four numbers over its lifetime - 1472, renumbered 4472, renumbered 103, renumbered 60103.

UK locomotive numbering post 1973 - TOPS

In 1973 British Rail adopted the Total Operations Processing System (TOPS). This system gave each locomotive a unique number comprised of 5 (sometimes 6) digits. TOPS survived the breakup of British Rail and is still in use today. If you're in the UK and interested in locomotive spotting this is the number you want.

The trouble with 'multiple units'

Most modern passenger trains are comprised of either DMUs or EMUs. A multiple unit is basically a passenger train without a separate locomotive - every vehicle provides at least some passenger accommodation. They range from 2 coach local units which are little more than buses on rails to fast intercity units like the Virgin Pendolino.

Under TOPS, locomotives, carriages and units all have identifying numbers. Carriage numbering is much more confusing so I won't go into details here. In a locomotive hauled train there's little chance of confusion because the locomotive is obviously separate to the carriages. But in the case of multiple units the unit and all its individual carriages will have identifying numbers. Usually the carriage numbers are shown along the side of each carriage and the unit number is shown on either end. But the position of the unit number can vary between train operators.

For the purposes of LocoSpotr the interesting number is the unit number which is comprised of 6 digits (eg 444040). If you're interested in spotting carriages this probably isn't the site for you but if you spot a carriage that's particularly fascinating you can always add its number to your spot as a Twitter hashtag.

Finally multiple multiple units are often coupled together to form a single train. If you walk down a train and see a driver cab area, that's the start of a new unit. Each unit has a number. The train as a whole doesn't - or not one you can easily find out. You might want to tweet a separate spot for each unit in the train - you might not.

A note on British Rail's HST - aka Intercity 125

Back in the 1970s British Rail introduced the Intercity 125 HST. Plenty are still running today. They're slightly unusual in that they're passenger units (so have carriage and unit numbers) but are hauled by 2 class 43 locomotives (one at each end). For the purposes of LocoSpotr the interesting numbers are the locomotive numbers (43xxx).

Locomotives with names

Some locomotives have names as well as numbers. Back in steam days the names were usually more persistent than the numbers (eg the Flying Scotsman changed its number 3 times but never changed its name). These days numbers tend to be more persistent than names (eg Pendolino 390010 was originally named Commonwealth Games 2002, renamed Chris Green, then A Decade of Progress).

It's probably better to identify the locomotive by its number than its name. You can always add the name as a separate hashtag:

@locospotr loco:390010 class:390 #adecadeofprogress

Spotting preserved locomotives

These days there are lots of preserved locomotives running on heritage railways. The majority predate the TOPS system and many predate British Rail. In the days before BR, locomotive numbering was much more fiddly. Each railway company had its own numbering system and many of these systems overlapped. When UK railways were nationalised the locomotives BR inherited were renumbered (this happened a few times before TOPS was introduced).

Many preserved locomotives were either withdrawn before TOPS happened or have been restored to their pre-TOPS livery / numbering for the purposes of nostalgia. There's a lot of debate amongst blokes called Trevor about whether a locomotive that's been technically altered since it first carried a certain livery / number should carry that livery / number in preservation. Which we'll skip over for this intro.

The obvious question is, having spotted a preserved locomotive, which number should you record it under? For the purposes of LocoSpotr it's probably best to record it under the number it carried when you spotted it.

Non-UK locomotives

There's not much to say here cos:

  • I don't really know all that much about forun trains
  • I'm not sure that train spotting is an obsession that extends much outside of 1950s Britain

Sorry!

Locomotive classes

Having gone to the trouble of designing a locomotive it's very rare for only one example to be built. Usually many locomotives are built to the same design. The set of locomotives built to a single design is called a class. Often there'll be minor variations in design and build between locomotives in a class; these variations are usually referred to as a sub-class. But in general all locomotives belonging to a class will be recognisable as part of the same family. Think of locomotive classes like car models: there might be a diesel Ford Focus, a petrol Ford Focus, a 3-door Ford Focus, a 5-door Ford Focus but they're all recognisable as the same basic model.

Again railway companies usually give each class a number which is usually unique within the confines of the railway system and period. But again they are not globally unique and not unique across time. Two different classes on two different railway systems might share the same number. And occasionally class numbers are reused over time; there have been 2 completely different class 70s under the BR TOPS system for example. Tut.

UK class numbering post 1973 - TOPS

When British Rail adopted TOPS each locomotive class was assigned a unique number comprised of 2 or 3 digits. This system is still in use today. Diesel locomotives fall into classes 01-69, DC electric locomotives 70-79, AC electric locomotives 80-96, departmental locos (those not in revenue-earning use) 97, and steam locomotives 98. Diesel multiple units (DMUs) with mechanical or hydraulic transmission are classified 100-199, with electric transmission 200-299. Electric multiple units (EMUs) are given the subsequent classes; 300-399 are overhead AC units, while Southern Region DC third rail EMUs are 400-499, other DC EMUs 500-599.
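The class ranges above fit in a small lookup table. This is only a sketch built from the ranges listed in that paragraph; the wording of the descriptions and the function name are mine, and it knows nothing about exceptions or reused class numbers.

```python
# (lowest class, highest class, description), taken from the ranges above.
TOPS_CLASS_RANGES = [
    (1, 69, "diesel locomotive"),
    (70, 79, "DC electric locomotive"),
    (80, 96, "AC electric locomotive"),
    (97, 97, "departmental locomotive"),
    (98, 98, "steam locomotive"),
    (100, 199, "DMU, mechanical or hydraulic transmission"),
    (200, 299, "DMU, electric transmission"),
    (300, 399, "EMU, overhead AC"),
    (400, 499, "EMU, Southern Region DC third rail"),
    (500, 599, "EMU, other DC"),
]


def describe_class(class_number):
    """Return a rough description for a TOPS class number, or None if
    the number falls outside the ranges listed above."""
    for low, high, description in TOPS_CLASS_RANGES:
        if low <= class_number <= high:
            return description
    return None
```

So `describe_class(66)` comes back as a diesel locomotive and `describe_class(390)` as an overhead AC EMU.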

Luckily for train spotters the TOPS class number is incorporated into the TOPS locomotive number as the leading 2 or 3 characters. To get the class number just take the locomotive number and remove the final 3 characters. So a locomotive with the number 66713 is a member of class 66, a locomotive with the number 390010 is a member of class 390 and a multiple unit with the number 444040 is a member of class 444.
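In code the "remove the final 3 characters" rule is a one liner plus some sanity checking. A minimal sketch, assuming the number is 5 or 6 digits as described above; the function name is made up, and the class is returned as a string so any leading zero survives.

```python
def tops_class(number):
    """Return the TOPS class (as a string) for a 5 or 6 digit TOPS number."""
    number = str(number).strip()
    if not number.isdigit() or len(number) not in (5, 6):
        raise ValueError(f"doesn't look like a TOPS number: {number!r}")
    # The class is everything except the final three characters.
    return number[:-3]


examples = {n: tops_class(n) for n in ("66713", "390010", "444040")}
```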

To make things simpler still there's a useful pictorial guide to some of the common BR TOPS classes at Wikimedia Commons.

Classes with names and preserved locomotives

Some locomotive classes have names as well as numbers. Sometimes these are officially sanctioned, sometimes less so. Back in steam days names were often used in preference to numbers (e.g. King class, Castle class, Merchant Navy class).

For the purposes of LocoSpotr if the locomotive class has ever had a TOPS number (even if the locomotive hasn't) it's probably better to use that. You can always add the class name as a separate hashtag:

@locospotr loco:d1005 class:52 #western

If the locomotive class pre-dates TOPS and is known better by its name then use that:

@locospotr loco:35005 class:merchantnavy

Many locomotive classes that pre-date TOPS never earned anything but a nickname. Usually these classes were given alphanumeric labels that sometimes reflected their power classification, sometimes reflected their primary use (e.g. a class 9f was a powerful (9) freight (f) locomotive) and sometimes reflected nothing at all. In these cases just use that label:

@locospotr loco:4472 class:a3 #flyingscotsman
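LocoSpotr never shipped so there's no real parser to point at, but pulling the bits out of a spot like the ones above might look something like this. The function name and the shape of the returned dictionary are assumptions made for the example.

```python
import re

# Matches the loco:... and class:... parts of a spot tweet.
SPOT_FIELD = re.compile(r"\b(loco|class):(\S+)")
HASHTAG = re.compile(r"#(\w+)")


def parse_spot(tweet):
    """Extract the loco number, class and hashtags from a spot tweet."""
    text = tweet.lower()
    fields = dict(SPOT_FIELD.findall(text))
    return {
        "loco": fields.get("loco"),
        "class": fields.get("class"),
        "hashtags": HASHTAG.findall(text),
    }


spot = parse_spot("@locospotr loco:390010 class:390 #adecadeofprogress")
```

It treats class names (merchantnavy) and class labels (a3) the same as class numbers, which is roughly the point of the examples above.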


http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Wed, 01 Dec 2010 13:00:00 -0800 For @edsu http://smethur.st/for-edsu http://smethur.st/for-edsu

DBpedia is often described as a Linked Data hub. Its about page even describes it as a Nucleus for the Web of Data. A glance at the LOD cloud diagram shows DBpedia front and centre with lots of datasets pointing in.

The implication is if you have some book data and I have some book data, we can both link to DBpedia as a central identifier hub and it becomes possible to triangulate between our datasets. Which is fine. At least if the identifiers don't wobble.

But I wonder what happens if you have some book data and I have some music data or you have some health data and I have some nutrition data etc. DBpedia has data from lots of different domains. By showing it as a single bubble on the LOD cloud it's implied that it's internally coherent, heavily interlinked and homogenous. I wonder if that's true.

Lots of people have looked at the link density within Wikipedia but DBpedia relationships are only the subset it's possible to auto extract (mainly infoboxes and categories). And most of the links within Wikipedia are inline in the article text. So how interlinked is that subset of Wikipedia links. Would it be possible to follow your nose across domains from punk to the Sex Pistols to Jamie Reid to the Situationists like you can in Wikipedia for example? Or not?

Or if you made a diagram similar to the LOD one but only depicting link density inside DBpedia would it look like an homogenous data space or more like a microcosm of the LOD cloud with lots of sparsely connected islands. Is it a hub or a set of hubs that happens to be extracted from the same source and hosted in the same place? If it's the latter it kinda implies that we don't have one Linked Data cloud but several?!?
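Both questions (can you follow your nose from one thing to another, and how many islands are there) are fairly standard graph questions. Here's a rough sketch of how you might ask them, assuming you've already got the extracted links into a dictionary. The data below is invented to illustrate the idea. It is not real DBpedia data and says nothing about what the answer actually is.

```python
from collections import deque


def find_path(links, start, goal):
    """Breadth-first search for a chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(links.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def islands(links):
    """Group resources into sets joined by links, ignoring link direction."""
    undirected = {}
    for source, targets in links.items():
        undirected.setdefault(source, set())
        for target in targets:
            undirected[source].add(target)
            undirected.setdefault(target, set()).add(source)
    remaining = set(undirected)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in undirected[node]:
                if neighbour in remaining:
                    remaining.discard(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Invented links, purely for illustration.
links = {
    "Punk_rock": {"Sex_Pistols"},
    "Sex_Pistols": {"Jamie_Reid"},
    "Jamie_Reid": {"Situationist_International"},
    "Stoat": {"Mustelidae"},
}

route = find_path(links, "Punk_rock", "Situationist_International")
```

In this toy data there's a route from punk to the Situationists but no route from punk to the stoat, and `islands(links)` finds two separate lumps. Lots of small lumps would suggest a set of hubs rather than one.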


http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Thu, 25 Nov 2010 14:37:00 -0800 Linked Data, website as API and URI fragility http://smethur.st/linked-data-website-as-api-and-uri-fragility http://smethur.st/linked-data-website-as-api-and-uri-fragility

The document web

Back in 1998 Tim Berners-Lee published a W3C style guide with the title Cool URIs don't change. Probably everyone reading this has read that but if not there's no better time than now.

Cool URIs was a plea for web developers to design HTTP URIs for persistence and addressed many of the potential pitfalls regularly encountered en route (site redesigns, URIs mapped to file and folder structures, URIs exposing technology stacks that change over time etc).

By this stage in web history most developers and designers (or most of the ones I know) have experienced what goes wrong when URIs change. Users follow links and get 404s or out of date information from unmaintained pages, people's bookmarks break, inbound links break, search engine equity built up over years from inbound links leaks all over the carpet. Uncool URIs don't make the web die, but they do make it whimper slightly. From a less self-interested perspective broken links also break accountability. If your site contains something people disagree with / object to, they can write their own page and link to / cite your original piece. If your URI changes it's not just a link lost to the web, it's a link lost in the chain of accountability and public discourse.

But, but, but in real life URIs do change. Usually because it's not just the developers and designers who have a say. URIs have become part of the furniture of the real world, like corporate graffiti tags. I'm typing this on a tube train and every poster at this end of the carriage features a URI in some shape. There are URIs read out on radio programmes, plastered across the sides of buses, on mugs, on t-shirts, on beer mats, on screen at the end of TV programmes. URIs are common currency and almost everywhere outside the rarefied sphere of web developers and designers they're seen as labels, not identifiers. And labels change. If the world were run by a benevolent cabal of web devs I'm pretty sure that 95% of URIs would be cool. Unfortunately it's not and important people occasionally demand changes. Which usually results in web devs attempting to wrangle redirect files with the complexity of a less reader friendly Ulysses.

I don't want to go too deep into the ever popular URIs should be human readable arguments because it always ends up going in circles and no-one ever agrees. But in the context of persistence human readable URIs are a problem because labels change. And some things have different labels in different cultural contexts (before you even get to different languages). The current canonical example amongst my circle of friends is the ermine. Or the stoat. People in the UK call some species of animal a stoat. People in the US call the same thing an ermine. Or the other way round. This leads to edit wars in Wikipedia between Brits and Yanks. And because Wikipedia URIs reflect article titles, the URIs change. And change again. Which isn't cool. And because DBpedia URIs reflect Wikipedia URIs if you're using DBpedia URIs as identifiers in your own systems things break. Or at least whimper.

The data web

These days lots of websites (and not just the usual web2.0 suspects) make content available as data for consumption by machines. In 99% of cases this is done under an API separate from the main (document) website. And there are lots of reasons why you might want to separate the two. Machines (or at least those manned by impolite developers) tend to hit sites more aggressively than the average punter. Separating out the API allows organisations to make access dependent on possession of an API key and API keys can be used to track usage and impose rate limits. All of this goes back to one important point: the business value of having web pages is proved, the business value of publishing the same stuff as data isn't (yet).

In all of this there's a very important distinction: web pages are seen as the things punters surf; APIs as platforms for development. And no-one wants a brittle platform. Sales forces and marketeers and business owners often enjoy the bragging rights of having an API. But it's not something they ever see. The shape of the API is entirely in the hands of the developers. No one outside the development team is ever asking for redesigns or marketing URIs or human readability. It's just obvious to all that machines need persistent identifiers.

The classic case is Twitter. If you surf the document web the URI of a tweet looks like:

http://twitter.com/#!/fantasticlife/status/7925332711571456

with the tweet identifier nested under the tweeter's username. But on Twitter users are able to change their username. Not many people do it but it has been known. So in the old style REST API the same tweet lives at:

http://api.twitter.com/1/statuses/show/7925332711571456.xml

(where 1 is just the API version.) The username isn't there because the username can change and changing URIs make APIs brittle. And as ever encoding resource structure into URIs is the enemy of persistence. Or cool URIs stay flat.
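To make the difference concrete, here's a small sketch that pulls the stable tweet identifier out of the document URI and builds the flat API URI from it. The function names are invented for illustration, and the URI shapes are only the two quoted above (the old hashbang document URI and the old version 1 REST API), not a description of how Twitter works today.

```python
from urllib.parse import urlparse


def tweet_id_from_document_uri(uri):
    """Extract the numeric tweet id from a document URI like the one above."""
    parsed = urlparse(uri)
    # Hashbang URIs keep the real path in the fragment: "#!/user/status/123"
    if parsed.fragment.startswith("!"):
        path = parsed.fragment[1:]
    else:
        path = parsed.path
    segments = [s for s in path.split("/") if s]
    # Expect ".../<username>/status/<id>"; the username is ignored on purpose
    if len(segments) < 3 or segments[-2] != "status" or not segments[-1].isdigit():
        raise ValueError("not a tweet document URI: %r" % uri)
    return segments[-1]


def api_uri_for(document_uri, version="1", fmt="xml"):
    """Build the flat API URI, which carries the id but not the username."""
    tweet_id = tweet_id_from_document_uri(document_uri)
    return "http://api.twitter.com/%s/statuses/show/%s.%s" % (version, tweet_id, fmt)


before = "http://twitter.com/#!/fantasticlife/status/7925332711571456"
after = "http://twitter.com/#!/some_new_username/status/7925332711571456"
# Both print the same API URI, because only the id survives the trip
print(api_uri_for(before))
print(api_uri_for(after))
```

The username change breaks the document URI but leaves the API URI untouched, which is the whole point of keeping it flat.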

So it seems there's a clear pattern. The document web benefits from cool-ish URIs (give or take) whereas the API view can never allow the cool mask to slip. But...

The website as API

In the Linked Data world there's a heavy presumption that you use HTTP appropriately (read RESTfully). That the HTML views (desktop, mobile, tablet...) and the data views all get served from the same URI depending on what the HTTP request asks for.

In the original design notes for Linked Data there's no explicit mention of content negotiation. The only instruction is "When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)". It doesn't explicitly say any URI and it doesn't explicitly denounce the separate API / API key pattern but it's all fairly implicit from the "Browsable graphs" section. You can't browse graphs if you have to keep stopping to register another API key.

From my experience of the Linked Data world there's lots of attention paid to how Non-Information Resources relate to Information Resources (303s and hashes and possibly 200s when the bickering dies down). But there's relatively little attention paid to the content negotiation of the IR. And even when there is it tends to be about conneg between RDF and desktop HTML, ignoring mobiles and tablets etc.

Nevertheless, the general presumption seems to be one URI for desktop HTML, mobile HTML, RDF XML, RDF n3, json etc. I think it was Jeni Tennison who coined the phrase "your website is your API". And for what it's worth I think website as API / website as platform is the only sane approach. It makes data views and cross platform support just a matter of some templates and some CSS and some JavaScript. Which means there's less code (almost always a good thing) and it's cheaper to develop. And it's all just using web standards (in this case HTTP) as intended.
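For anyone who hasn't met conneg, here's a deliberately simplified sketch of the one URI, many representations idea: the server reads the HTTP Accept header and picks a representation to send back. The list of available media types and the function names are assumptions for the example, and the matching ignores the finer points of the HTTP spec (specificity ordering, media type parameters), so treat it as a cartoon rather than a conformant implementation.

```python
# Representations this hypothetical resource can serve, in server preference order
AVAILABLE = ["text/html", "application/rdf+xml", "text/n3", "application/json"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client doesn't give one
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the order the client sent
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, available=AVAILABLE):
    """Pick the media type to serve for this request, or None (a 406 in HTTP terms)."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue  # q=0 means "never send me this"
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None


# Same URI, different answers depending on who is asking
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate("application/rdf+xml"))
print(negotiate("application/json;q=0.5, text/n3"))
```

A browser-style header gets the HTML view and an RDF client gets RDF, with no separate API hostname or key involved.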

But the point I've been trying to make is that currently there's a greater (business if not user) tolerance for uncool URIs and lack of persistence on the document web than there is on the data web. And if the two worlds are collapsing (and given RDFa and microformats even without conneg they are) some of the best practice approaches to developing API URIs need to migrate into best practice approaches for developing document URIs. And some people who want to change document URIs for various reasons still need to be persuaded that persistence matters to punters as well as machines. And I don't think we're there yet.


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst
Tue, 09 Nov 2010 13:44:00 -0800 Something I wanted to make, but probably won't http://smethur.st/something-i-wanted-to-make-but-probably-wont http://smethur.st/something-i-wanted-to-make-but-probably-wont

Without wanting to sound terribly old / old fashioned it seems like you can't visit a web page these days without being asked to leave a comment or share your thoughts or speak your brainz. This bothers me for a number of reasons:

  1. Often I would quite like to leave a comment but I don't want to go through the hassle of signing up for an account with the Guardian and the BBC and Fox News and Rapture Ready and...
  2. When you do leave a comment on most of these sites you lose control. Once the submit button is pressed the comment belongs to the publisher; you can't edit it or remove it which is a problem if you're prone to say stupid things when drunk. In some cases, even if you close your account, the comment remains.
  3. On the content licencing front, every time you leave a comment many sites claim the right to "distribute in any fashion over any medium yet invented or yet to be invented" etc. which annoys me.

Obviously you could write a blog post but most of the time it doesn't feel worth it. And obviously there are "social bookmarking" services like delicious where you can leave a note. But both options feel kinda divorced from the content. As in you can't read the comments in the context of the article. Ages ago I thought it would be sweet if you could swivel round your browser window and scribble on the back and see what other people had scribbled. So something like that. Ideally something as low friction as turning down the page of a book or scribbling notes in the margins...

There's also the problem that neither blogs nor social bookmarking services quite integrate with your "social network" so you can't see what your friends have bookmarked / commented on and you can't see their comments on an article you're reading. You can add contacts on delicious, for example, but does anybody bother with that when their social graph is already described on Twitter / Facebook?

So the thing I'd like to make but probably won't would:

  1. Allow you to sign in via Twitter (asking for read access only!) or not via Twitter
  2. Use Twitter sign ins to access your social graph
  3. Not integrate with Facebook in any way as a pointless point of principle
  4. Have a bookmarklet (cos everything has to have a bookmarklet) which would drop a javascript layer over the page you're on
  5. By default show you and your (Twitter) friends' comments and give you the option to see everyone's comments
  6. Provide a simple mechanism for sites that can't afford the dev time to add users / comments to just include this instead
  7. Let you edit / remove your comments
  8. Allow you to set privacy (private / show to my friends / show to the web) on a per account / per comment basis
  9. Allow you to set licencing of web public comments in a similar fashion to Beerspotr
  10. Provide data views for public comments with a bit of SIOC and a dab of FOAF
  11. Allow you to export your comments to bog standard blog type things and leave the service and take your comments with you
  12. Not use any of that no-follow nonsense
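The per account / per comment privacy idea from the list could be modelled roughly like this. Everything here (the class names, the fields, defaulting to private when no account setting exists) is a hypothetical sketch of something that was never built, not a description of any real service.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Privacy(Enum):
    PRIVATE = "private"   # only the author
    FRIENDS = "friends"   # the author's (Twitter) friends
    WEB = "web"           # anyone


@dataclass
class Comment:
    author: str
    page_uri: str
    text: str
    privacy: Optional[Privacy] = None  # None means "fall back to the account setting"


def effective_privacy(comment, account_defaults):
    """A per comment setting wins; otherwise use the account's, else private."""
    if comment.privacy is not None:
        return comment.privacy
    return account_defaults.get(comment.author, Privacy.PRIVATE)


def can_see(comment, viewer, friends_of, account_defaults):
    """Decide whether `viewer` should be shown this comment."""
    if viewer == comment.author:
        return True
    level = effective_privacy(comment, account_defaults)
    if level is Privacy.WEB:
        return True
    if level is Privacy.FRIENDS:
        return viewer in friends_of.get(comment.author, set())
    return False


defaults = {"alice": Privacy.FRIENDS}
friends = {"alice": {"bob"}}
note = Comment("alice", "http://example.org/article", "Nice piece.")
print(can_see(note, "bob", friends, defaults))    # bob is one of alice's friends
print(can_see(note, "carol", friends, defaults))  # carol isn't
```

Keeping the per comment setting optional means changing the account default later also changes every comment that never had its own setting.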

In elevator pitch terms, somewhere between delicious and Lanyrd (for socialness) with lots of javascript overlays and an ethical ownership policy. And no adverts.

Anyway, the closest I ever got was almost registering dogearz.org. Then realised it would involve both OAuth and JavaScript (both of which scare the bejesus out of me) and gave up. But still think it would be nice to comment and turn down pages and stay out of the databases of people who want to sell me stuff.


]]>
http://files.posterous.com/user_profile_pics/826755/lemon.jpeg http://posterous.com/users/YMTiVFSiIz7 Michael Smethurst fantasticlife Michael Smethurst