Last Wednesday was the second BBC Data Day. I didn't manage to make the first one but I did end up chatting afterwards with various BBC, ODI and OU people about the sort of data they'd like to see the BBC release. Shortly afterwards I sketched some wireframes and off the back of that was invited to talk at the second event. Which I also didn't manage to make because I was at home, ill and feeling sorry for myself. In the event Bill stepped in and presented my slides. These are the slides and notes I would have presented if I had managed to be there:

Slide 1


I'd like to talk about open data on the web, what it's for and in particular how it enables transparency to audiences across journalism and programme making.

Slide 2


So why publish open data on the web? Three common reasons are given:

  1. to enable content and service discovery from 3rd parties like Google, Bing, Facebook, Twitter etc. These are things like Open Graph and Twitter Cards, used to describe services so 3rd parties can find your stuff and make (pretty) links to it. This often becomes a very low-level form of automated syndication, because that's how the web works
  2. to outsource innovation and open up the possibilities of improving your service to 3rd parties. The Facebook strategy of encouraging flowers to bloom around their fields. Then picking the best ones and buying them
  3. and finally because... transparency. To show the world your workings in the best interests of serving the public

Today I'm only really talking about transparency.

Slide 3


So sausages. The BBC already publishes some "open" data but that data only describes the end product, the articles and programmes, and not the process.

Slide 4


This is the Programmes Ontology. It shows the kinds of data we publish about all BBC programmes.

There are programme brands and series and individual episodes and versions of those episodes and broadcasts and iPlayer availabilities. The kind of data you'd need to build a Radio Times or an EPG. Or iPlayer.
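As a rough sketch of that shape in Python: the class and field names below are mine, chosen for illustration, and are not the ontology's actual terms. The `schedule` helper is also hypothetical; it just shows how an EPG-style listing might fall out of the model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Broadcast:
    service: str  # e.g. "BBC One"
    start: str    # ISO 8601 timestamp; assumed to share one timezone so strings sort by time


@dataclass
class Availability:
    start: str
    end: str


@dataclass
class Version:
    label: str  # e.g. "Original" or "Signed"
    broadcasts: List[Broadcast] = field(default_factory=list)
    availabilities: List[Availability] = field(default_factory=list)


@dataclass
class Episode:
    title: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class Series:
    title: str
    episodes: List[Episode] = field(default_factory=list)


@dataclass
class Brand:
    title: str
    series: List[Series] = field(default_factory=list)
    episodes: List[Episode] = field(default_factory=list)  # episodes not in any series


def schedule(brand: Brand) -> List[Tuple[str, str, str]]:
    """Flatten a brand into (start, service, episode title) rows, earliest first."""
    episodes = list(brand.episodes)
    for series in brand.series:
        episodes.extend(series.episodes)
    rows = [
        (b.start, b.service, ep.title)
        for ep in episodes
        for v in ep.versions
        for b in v.broadcasts
    ]
    return sorted(rows)
```

Everything in there describes the finished programme and when it went out. Nothing describes how it came to be made.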

Slide 5


And this is the brand page for Panorama. Ask for it as data and you'll get...

Slide 6



Again, a brand with episodes with broadcasts etc.

Slide 7


What's interesting is what isn't there. What goes on in the factory before the sausages make it to the shelves.

Slide 8


Things like:

  1. commissioning decisions. Who? When? Why? What didn't get commissioned?
  2. scheduling decisions
  3. talent decisions
  4. guest decisions
  5. running orders. What things / what order?

Who refused to appear? Who refused to put up a spokesperson? What was the gender split of guests? What was the airtime gender split?

A couple of weeks back there was a George Monbiot piece in the Guardian bemoaning the fact that BBC current affairs programmes often didn't include enough background information about their guests, particularly with respect to connections with lobbyists and lobbying firms.

As a suggestion: every contributor to BBC news programmes should have a page (and data) listing their appearances and detailing their links to political parties, NGOs, campaigning groups, lobbyists, corporations, trade unions etc.
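The data behind a contributor page like that might look something like the sketch below. This is only an assumption about a possible shape: the class names, field names and the `to_json` helper are hypothetical, and the example values are placeholders rather than real people or organisations.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Affiliation:
    organisation: str
    kind: str  # e.g. "political party", "NGO", "lobbyist", "trade union"
    role: str


@dataclass
class Appearance:
    programme: str
    broadcast_date: str  # ISO 8601 date


@dataclass
class Contributor:
    name: str
    affiliations: List[Affiliation] = field(default_factory=list)
    appearances: List[Appearance] = field(default_factory=list)


def to_json(contributor: Contributor) -> str:
    """Serialise a contributor record so it can be published alongside the page."""
    return json.dumps(asdict(contributor), indent=2)


# Placeholder record, purely illustrative.
example = Contributor(
    name="Example Contributor",
    affiliations=[Affiliation("Example Lobbying Ltd", "lobbyist", "director")],
    appearances=[Appearance("Example Programme", "2014-03-10")],
)
print(to_json(example))
```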

Slide 9


Away from programmes, what would transparency look like for online news?

Slide 10


The Guardian is the most obvious example where clarifications and corrections aren't hidden away but given their own home on the website.

Slide 11


And the articles come with a history panel which doesn't show you what changed but at least indicates when a change has happened.

The Guardian's efforts are good but not as linked together as they might be.

Slide 12


Unlike Wikipedia. This is the edit history of the English Wikipedia article on the 2014 Crimean Crisis. Every change is there together with who made it, when and any discussion that happened around it.

Slide 13


And every edit can be compared with what went before, building a picture of how the article formed over time as new facts emerged and old facts were discounted.

Slide 14


I didn't manage to attend last year's data day but I did end up in the pub afterwards with Bill and some folk from the ODI and the Open University.

We talked about the kind of data we'd all like to see the BBC release and it was all about the process and not the products. The sausage factory and not the sausages.

We made a list of the kinds of data that might be published and it fitted well with how the BBC likes to measure its own activities: Reach, Impact and Value.

Slide 15


It also looked a lot like this infographic, which made the rounds of social media last week, detailing the cost per user per hour of the BBC TV channels.

Slide 16


These were the wireframes I made following last year's pub chat.

They were intended to sit on the "back" of BBC programme pages; side 1 would show the end product, side 2 would be the "making of" DVD extra, the data about the process.

Headline stats for every programme would include total cost, environmental impact, number of viewers / listeners across all platforms and the cost per viewer of that episode.
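The cost per viewer figure is simple arithmetic once the other numbers are published. A minimal sketch, assuming audience figures arrive as one count per platform; the function name, platform keys and figures are all made up for illustration and are not real BBC numbers.

```python
from typing import Dict


def cost_per_viewer(total_cost_gbp: float, audience_by_platform: Dict[str, int]) -> float:
    """Divide an episode's total cost by its audience summed across all platforms."""
    total_audience = sum(audience_by_platform.values())
    if total_audience == 0:
        raise ValueError("cost per viewer is undefined with no audience")
    return total_cost_gbp / total_audience


# Illustrative figures only.
audience = {"broadcast": 1_500_000, "iplayer": 400_000, "downloads": 100_000}
print(round(cost_per_viewer(200_000, audience), 2))
```

Note that summing across platforms counts anyone who watched twice as two viewers, so a real version would need to decide how to handle overlap.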

Programmes would be broken down by gender split of contributors and their speaking time.
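Given a running order with a speaker and a duration for each contribution, both splits could be worked out along these lines. This is a sketch under assumptions: the `(name, gender, seconds)` input format and the function name are hypothetical, and the sample rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gender_split(contributions: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Return each gender's share of distinct contributors and of total speaking time.

    Each contribution is (contributor name, gender, seconds spoken).
    """
    people = defaultdict(set)
    seconds = defaultdict(float)
    for name, gender, secs in contributions:
        people[gender].add(name)  # a person who speaks twice is still one contributor
        seconds[gender] += secs

    total_people = sum(len(names) for names in people.values())
    total_seconds = sum(seconds.values())
    return {
        gender: {
            "contributors": len(names) / total_people,
            "airtime": seconds[gender] / total_seconds if total_seconds else 0.0,
        }
        for gender, names in people.items()
    }


# Invented sample rows.
sample = [("A", "female", 120), ("B", "male", 300), ("C", "male", 150), ("C", "male", 30)]
print(gender_split(sample))
```

Keeping the two shares separate matters: a programme can have a balanced guest list and still give most of the airtime to one group.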

Reach would list viewer / listener figures across broadcast, iPlayer, downloads and commercial sales.

Slide 17


Impact would list awards, complaints, clarifications, corrections and feedback from across the web.

And value would list production costs, acquisition costs and marketing spend.

All of this would be available as open data for licence fee payers to take, query, recombine, evaluate and comment on.

Having made the wireframes I chatted with Tony Hirst from the OU about how we might prototype something similar. We came up with a rough data model and Tony attempted to gather some data via FOI requests.

Slide 18


Unfortunately they were all refused under the banner of "purposes of journalism, art or literature", which seems to be a catch-all category for FOI requests marked "no".

Google has 20 million results for the query "foi literature art journalism"; around 10 million of those would seem to relate in some way to the BBC.

The idealist in me would say that, for "the purposes of journalism", in its noblest sense, and the greater good of society, the default position needs to flip from closed to open. The "purposes of journalism", more than any other public service, should not be an escape hatch from open information.

And the public would benefit from "journalism as data" at least as much as from "data journalism".

Photo credits

Packing Carsten's weiner sausages on an assembly line, Tacoma, Washington by Washington University

Sausages at Wurstkuche by Sam Howzit