Privacy, groups and graphs

A couple of days ago a friend sent me a link to The Real Life Social Network. It's a presentation by Paul Adams, the Senior User Experience Researcher at Google, touching on offline social groups, online social groups, strong and weak ties, contextual personas and privacy. The privacy bit in particular reminded me of danah boyd's rather wonderful Privacy and Publicity in the Context of Big Data paper, echoing the idea that privacy isn't a matter of how much or how little you share, but of how much you understand (and can control) the context of sharing. So it's the usual problem: a status update intended for three or four people gets seen by three or four hundred and has the potential to be seen by three or four million.

The presentation illustrates the problem using the real life example of Debbie. Offline she has four distinct groups of "friends", online they're all bundled into one big bucket. Which makes it problematic when she wants to share something with a subset of people in that bucket. It's a common problem but how does it get solved? Slide 84 of the presentation says:

Allow people to create custom names for groups, and allow people to rename the group if it changes over time.

Allowing people to create groups and share only with people inside a group is the obvious way to solve the contextual sharing problem. And the Flickr friend / family split is an obvious example. But online groups have problems. Someone has to decide who's in the group and who's excluded. And if the name of the group can be changed someone has to have control over editing it. It all comes down to who controls the group. I could create an online group of school friends: Alice, Bob and Chris. But what if Alice hates Bob or Bob hates Chris or Chris hates me but is too polite to say?

Which I guess is my problem with Twitter lists. Anyone on Twitter can create a list and add anyone else to it. And give the list a descriptive title to make a claim that this group of people can be described by this label. But the people involved can't make a counter claim and have no control over whether they're on the list / in the group or not. Lots of people add the usual disclaimer to blogs and Twitter profiles that these are my views and don't reflect the views of my employer only to be added to some list bearing the employer's name by someone they may or may not know.

My other problem with Twitter lists is they make the context of sharing more opaque. Someone might follow a list you're on without directly following you and since you never get a following email you tend to forget they're there. And the same goes for retweets. But that's a side issue.

So if user defined groups don't work (and I don't think they do) what other mechanisms are there to share with a subset of people from a big bucket of 'friends'? Rather than categorise people into groups you could try categorising your relationships with people. Which is the XFN approach. But again I find XFN a bit creepy. It's easy to say you've met someone or they're a colleague or sibling or a spouse but then you hit the 'friend' word again and have to decide on friend, acquaintance or contact. Which is rarely an easy decision. It's a bit like making a new friend on Facebook and being faced by a ten drop-down choice of how you know them and where you met. It all just adds friction. And whilst I have no data about this, I'd bet that a lot of people lie. Because who wants Facebook to know you met someone in Greece, shared a house for a while, were engaged for 6 months but parted amicably when all you want to do is post on someone's wall.

I think there's a wider problem here. Both grouping friends and categorising relationships are ways to impose your world view on sets of other people. There's a lot of talk about social graphs but every time you see a social network diagram it has one person in the middle (you, the author?) with links out to some other people and possibly some links out from them to a few more. But it doesn't look like a graph. Or if it does look like a graph it looks like a very egocentric one. So to solve the problem of sharing in context maybe we need to step outside / above the graph and think less about how we connect to other people and more about how other people connect to other people. Because a group isn't defined by how I identify or label it but by the density of the interconnections within it.
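To make "density of the interconnections" a bit more concrete, here's a minimal sketch. It assumes the graph is a plain dictionary of who knows whom (the `knows` data and the `density` function are made up for illustration, not taken from any real service) and scores a set of people by the fraction of possible pairs that are actually connected.

```python
from itertools import combinations


def density(members, knows):
    """Return the fraction of possible pairs in `members` that are connected.

    `knows` is a hypothetical adjacency map: person -> set of people they know.
    A pair counts as connected if either side lists the other.
    """
    pairs = list(combinations(list(members), 2))
    if not pairs:
        return 0.0
    linked = sum(
        1
        for a, b in pairs
        if b in knows.get(a, set()) or a in knows.get(b, set())
    )
    return linked / len(pairs)


# Toy data: the school friends all know each other, Dave only knows Alice.
knows = {
    "alice": {"bob", "chris"},
    "bob": {"alice", "chris"},
    "chris": {"alice", "bob"},
    "dave": {"alice"},
}

school = density(["alice", "bob", "chris"], knows)
mixed = density(["alice", "bob", "dave"], knows)
```

On this toy data the school friends score higher than the mixed set, and nobody had to name or curate either group.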

Chatting with Tom about this today he pointed out that lots of social network sites use the wider graph of connections to recommend new people to connect to. So if I don't know Dave but I do know Alice and Bob and both Alice and Bob know Dave the service will recommend I connect to / follow / friend / whatever Dave. LinkedIn does this to recommend new contacts and Facebook does this to recommend new friends. And I think Twitter did do this to recommend new followees. But since new Twitter came along I can't find it anymore. (That said I can't find anything in new Twitter and it just keeps telling me I'm a bit like Tom which is reassuring but not informative.)
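I have no idea how LinkedIn or Facebook actually implement this, but the basic friend-of-a-friend idea can be sketched in a few lines. Everything here is an assumption for illustration: the `knows` dictionary, the `recommend` function and the `min_mutual` threshold of two shared connections.

```python
from collections import Counter


def recommend(person, knows, min_mutual=2):
    """Suggest people `person` doesn't know but several of their contacts do.

    `knows` is a hypothetical adjacency map: person -> set of people they know.
    """
    mine = knows.get(person, set())
    mutual_counts = Counter()
    for contact in mine:
        for candidate in knows.get(contact, set()):
            # Skip myself and anyone I'm already connected to.
            if candidate != person and candidate not in mine:
                mutual_counts[candidate] += 1
    return sorted(c for c, n in mutual_counts.items() if n >= min_mutual)


# I know Alice and Bob; both of them know Dave; only Bob knows Erin.
knows = {
    "me": {"alice", "bob"},
    "alice": {"dave", "me"},
    "bob": {"dave", "erin"},
}

suggestions = recommend("me", knows)
```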

So this made me wonder if anyone was using the wider graph (beyond who you're linked to) to give a better sense of the context of sharing. When it struck me (somewhat late admittedly) that this is exactly what Twitter did when they changed how replies work. I have no idea why they chose to change this and whether it was about making context a little more explicit or just about managing server load but...

...in the old days when you posted a status update to Twitter anyone who followed you (and whom you hadn't blocked) could see it no matter what the content. Sometime ?this year? Twitter changed how this worked. If the tweet started with @alice only people that followed you and Alice would see it in their timeline. At the time lots of people (including me) complained but in retrospect it feels like a good way to make sharing contextual to a group. At least if you define a group by its interconnectedness and not by your own definition. It changed the display logic from a line (x follows y) into a triangle (x follows y and z, and y spoke to z). Which isn't exactly a rich graph but is at least not a line.
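As a rough sketch of that line-versus-triangle rule (my reading of the behaviour described above, not Twitter's actual code; the function name and the `follows` map are made up):

```python
def shows_in_timeline(viewer, author, text, follows):
    """Decide whether `viewer` sees `author`'s tweet in their timeline.

    `follows` is a hypothetical map: person -> set of people they follow.
    Mentions pages and blocking are ignored in this sketch.
    """
    following = follows.get(viewer, set())
    if author not in following:
        return False  # the line: x has to follow y at all
    if not text.startswith("@"):
        return True  # an ordinary update goes to every follower
    # A reply: the first word names the addressee.
    addressee = text.split()[0][1:].rstrip(",:;.!?").lower()
    return addressee in following  # the triangle: x follows y and z


follows = {
    "x": {"me", "alice"},  # follows both of us
    "w": {"me"},           # follows only me
}

both = shows_in_timeline("x", "me", "@alice fancy a pint?", follows)
only_me = shows_in_timeline("w", "me", "@alice fancy a pint?", follows)
mention = shows_in_timeline("w", "me", "Fancy a pint, @alice?", follows)
```

Note the last case: a tweet that merely mentions @alice somewhere in the middle still goes to everyone who follows me.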

So maybe other services already do this and make context as a product of interconnectedness more explicit but I can't think of any. And I wonder if it could be expanded further. At the moment it's only possible to separate out one person (the repliee) from any other people mentioned. If it were possible via better annotations to separate people who were the subject of a tweet from the people who the tweet was (primarily) aimed at you could restrict the context to only people mentioned and people who follow you and all the intended recipients. Would that work? Or would it just result in lots of #reallyfixreplies?
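Here's a sketch of what that expanded rule might look like. It's entirely hypothetical: no such annotation exists, so the split into `recipients` and `subjects` is simply handed to the function, and the `follows` map is toy data.

```python
def audience(author, recipients, subjects, follows):
    """Work out who would see a tweet under the expanded rule.

    `recipients` are the people the tweet is aimed at, `subjects` the people
    it is merely about. `follows` is a hypothetical map: person -> set of
    people they follow.
    """
    recipients = set(recipients)
    mentioned = recipients | set(subjects)
    # People who follow the author AND every intended recipient.
    shared = {
        viewer
        for viewer, following in follows.items()
        if author in following and recipients <= following
    }
    return mentioned | shared


follows = {
    "x": {"me", "alice", "bob"},  # follows me and both recipients
    "w": {"me", "alice"},         # follows me but not bob
    "v": {"alice", "bob"},        # doesn't follow me
}

seen_by = audience("me", ["alice", "bob"], ["chris"], follows)
```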

Locomotive numbers, train spotting and QR codes

This was originally written as a bit of a help page for @locospotr, a Twitter based train spotting site that was supposed to be a sister service to BeerSpotr and the sadly neglected CCTVSpotr. Unfortunately LocoSpotr never went live because I never quite finished the code and couldn't find a friendly designer to make the CSS.

So I forgot about this until Paul Carvill tweeted today:

from the Wall of my Facebook group for people on trains: "how about a QR code scanner that you can use to find out which train you're on?!"

Which made me think how cool QR Code trainspotting would be. Our duly elected (ish) leaders would no longer have to worry about impressionable British teenagers hunkering down over youpron. Instead they'd all be out sporting smart tank tops and twonkPhones on the platforms of Clapham Junction eagerly snapping the QR Codes on the sides of trains.

All it would take is the various train operating companies giving all their locos (and units and classes) cool URIs, some stylish Duncan Robertson QR Code stuff and a twonkPhone app. And we could save our nation's youth from death by decadence and wanking. Or maybe not. In the meantime:

Locomotive numbering

In order to identify and organise locomotives, railway companies usually give each one a number. These numbers are usually unique within the confines of the railway system and period. But they are not globally unique and not unique across time. Two locomotives on two different railway systems might share the same number. And a single locomotive might have many numbers over time. The Flying Scotsman, for example, has carried four numbers over its lifetime - 1472, renumbered 4472, renumbered 103, renumbered 60103.

UK locomotive numbering post 1973 - TOPS

In 1973 British Rail adopted the Total Operations Processing System (TOPS). This system gave each locomotive a unique number comprised of 5 (sometimes 6) digits. TOPS survived the breakup of British Rail and is still in use today. If you're in the UK and interested in locomotive spotting this is the number you want.

The trouble with 'multiple units'

Most modern passenger trains are made up of either diesel multiple units (DMUs) or electric multiple units (EMUs). A multiple unit is basically a passenger train without a separate locomotive - every vehicle provides at least some passenger accommodation. They range from 2 coach local units which are little more than buses on rails to fast intercity units like the Virgin Pendolino.

Under TOPS, locomotives, carriages and units all have identifying numbers. Carriage numbering is much more confusing so I won't go into details here. In a locomotive hauled train there's little chance of confusion because the locomotive is obviously separate to the carriages. But in the case of multiple units the unit and all its individual carriages will have identifying numbers. Usually the carriage numbers are shown along the side of each carriage and the unit number is shown on either end. But the position of the unit number can vary between train operators.

For the purposes of LocoSpotr the interesting number is the unit number which is comprised of 6 digits (eg 444040). If you're interested in spotting carriages this probably isn't the site for you but if you spot a carriage that's particularly fascinating you can always add its number to your spot as a Twitter hashtag.

Finally, multiple multiple units are often coupled together to form a single train. If you walk down a train and see a driver's cab area, that's the start of a new unit. Each unit has a number. The train as a whole doesn't - or not one you can easily find out. You might want to tweet a separate spot for each unit in the train - you might not.

A note on British Rail's HST - aka Intercity 125

Back in the 1970s British Rail introduced the Intercity 125 HST. Plenty are still running today. They're slightly unusual in that they're passenger units (so have carriage and unit numbers) but are hauled by 2 class 43 locomotives (one at each end). For the purposes of LocoSpotr the interesting numbers are the locomotive numbers (43xxx).

Locomotives with names

Some locomotives have names as well as numbers. Back in steam days the names were usually more persistent than the numbers (eg the Flying Scotsman changed its number 3 times but never changed its name). These days numbers tend to be more persistent than names (eg Pendolino 390010 was originally named Commonwealth Games 2002, renamed Chris Green, then A Decade of Progress).

It's probably better to identify the locomotive by its number than its name. You can always add the name as a separate hashtag:

@locospotr loco:390010 class:390 #adecadeofprogress

Spotting preserved locomotives

These days there are lots of preserved locomotives running on heritage railways. The majority predate the TOPS system and many predate British Rail. In the days before BR, locomotive numbering was much more fiddly. Each railway company had its own numbering system and many of these systems overlapped. When UK railways were nationalised the locomotives BR inherited were renumbered (this happened a few times before TOPS was introduced).

Many preserved locomotives were either withdrawn before TOPS happened or have been restored to their pre-TOPS livery / numbering for the purposes of nostalgia. There's a lot of debate amongst blokes called Trevor about whether a locomotive that's been technically altered since it first carried a certain livery / number should carry that livery / number in preservation. Which we'll skip over for this intro.

The obvious question is, having spotted a preserved locomotive, which number should you record it under? For the purposes of LocoSpotr it's probably best to record it under the number it carried when you spotted it.

Non-UK locomotives

There's not much to say here cos:

  • I don't really know all that much about foreign trains
  • I'm not sure that train spotting is an obsession that extends much outside of 1950s Britain

Sorry!

Locomotive classes

Having gone to the trouble of designing a locomotive it's very rare for only one example to be built. Usually many locomotives are built to the same design. The set of locomotives built to a single design is called a class. Often there'll be minor variations in design and build between locomotives in a class; these variations are usually referred to as a sub-class. But in general all locomotives belonging to a class will be recognisable as part of the same family. Think of locomotive classes like car models: there might be a diesel Ford Focus, a petrol Ford Focus, a 3-door Ford Focus, a 5-door Ford Focus but they're all recognisable as the same basic model.

Again railway companies usually give each class a number which is usually unique within the confines of the railway system and period. But again they are not globally unique and not unique across time. Two different classes on two different railway systems might share the same number. And occasionally class numbers are reused over time; there have been 2 completely different class 70s under the BR TOPS system for example. Tut.

UK class numbering post 1973 - TOPS

When British Rail adopted TOPS each locomotive class was assigned a unique number comprised of 2 or 3 digits. This system is still in use today. Diesel locomotives fall into classes 01-69, DC electric locomotives 70-79, AC electric locomotives 80-96, departmental locos (those not in revenue-earning use) 97, and steam locomotives 98. Diesel multiple units (DMUs) with mechanical or hydraulic transmission are classified 100-199, with electric transmission 200-299. Electric multiple units (EMUs) are given the subsequent classes; 300-399 are overhead AC units, while Southern Region DC third rail EMUs are 400-499, other DC EMUs 500-599.
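Those ranges translate fairly directly into a lookup table. This is just a sketch of the ranges listed above (the function name and the labels are mine); any class number outside them comes back as `None` rather than a guess.

```python
# (first class, last class, description), copied from the ranges above.
TOPS_CLASS_RANGES = [
    (1, 69, "diesel locomotive"),
    (70, 79, "DC electric locomotive"),
    (80, 96, "AC electric locomotive"),
    (97, 97, "departmental locomotive"),
    (98, 98, "steam locomotive"),
    (100, 199, "DMU (mechanical or hydraulic transmission)"),
    (200, 299, "DMU (electric transmission)"),
    (300, 399, "overhead AC EMU"),
    (400, 499, "Southern Region DC third rail EMU"),
    (500, 599, "other DC EMU"),
]


def traction_type(tops_class):
    """Return a description for a TOPS class number, or None if unlisted."""
    number = int(tops_class)
    for first, last, description in TOPS_CLASS_RANGES:
        if first <= number <= last:
            return description
    return None
```

So `traction_type(66)` gives a diesel locomotive and `traction_type("390")` an overhead AC EMU.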

Luckily for train spotters the TOPS class number is incorporated into the TOPS locomotive number as the leading 2 or 3 characters. To get the class number just take the locomotive number and remove the final 3 characters. So a locomotive with the number 66713 is a member of class 66, a locomotive with the number 390010 is a member of class 390 and a multiple unit with the number 444040 is a member of class 444.
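In code, the "remove the final 3 characters" rule is a one-liner. A minimal sketch (the function name is made up, and it takes the number as a string so that leading zeros survive):

```python
def class_from_number(number):
    """Return the TOPS class: the number with its final 3 digits removed."""
    digits = str(number).strip()
    if not digits.isdigit() or len(digits) not in (5, 6):
        raise ValueError(f"not a 5 or 6 digit TOPS number: {number!r}")
    return digits[:-3]


loco_class = class_from_number("66713")
pendolino_class = class_from_number("390010")
unit_class = class_from_number("444040")
```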

To make things simpler still there's a useful pictorial guide to some of the common BR TOPS classes at Wikimedia Commons.

Classes with names and preserved locomotives

Some locomotive classes have names as well as numbers. Sometimes these are officially sanctioned, sometimes less so. Back in steam days names were often used in preference to numbers (e.g. King class, Castle class, Merchant Navy class).

For the purposes of LocoSpotr if the locomotive class has ever had a TOPS number (even if the locomotive hasn't) it's probably better to use that. You can always add the class name as a separate hashtag:

@locospotr loco:d1005 class:52 #western

If the locomotive class pre-dates TOPS and is known better by its name then use that:

@locospotr loco:35005 class:merchantnavy

Many locomotive classes that pre-date TOPS never earned anything but a nickname. Usually these classes were given alphanumeric labels that sometimes reflected their power classification, sometimes reflected their primary use (e.g. a class 9f was a powerful (9) freight (f) locomotive) and sometimes reflected nothing at all. In these cases just use that label:

@locospotr loco:4472 class:a3 #flyingscotsman
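Since LocoSpotr never went live there's no real parser to show, but going by the example spots on this page one might look something like the sketch below. The function name, the dictionary shape and the lower-casing are all assumptions.

```python
import re

# A spot is a tweet addressed to @locospotr followed by some tokens.
SPOT = re.compile(r"^@locospotr\s+(.*)$", re.IGNORECASE)


def parse_spot(tweet):
    """Pull the loco, class and hashtags out of a spot tweet.

    Returns None if the tweet isn't addressed to @locospotr.
    """
    match = SPOT.match(tweet.strip())
    if not match:
        return None
    spot = {"loco": None, "class": None, "tags": []}
    for token in match.group(1).split():
        if token.startswith("#"):
            spot["tags"].append(token[1:].lower())
        elif token.startswith("loco:"):
            spot["loco"] = token[len("loco:"):].lower()
        elif token.startswith("class:"):
            spot["class"] = token[len("class:"):].lower()
        # Anything else is treated as free text and ignored.
    return spot


scotsman = parse_spot("@locospotr loco:4472 class:a3 #flyingscotsman")
```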

For @edsu

DBpedia is often described as a Linked Data hub. Its about page even describes it as a Nucleus for the Web of Data. A glance at the LOD cloud diagram shows DBpedia front and centre with lots of datasets pointing in.

The implication is if you have some book data and I have some book data, we can both link to DBpedia as a central identifier hub and it becomes possible to triangulate between our datasets. Which is fine. At least if the identifiers don't wobble.

But I wonder what happens if you have some book data and I have some music data or you have some health data and I have some nutrition data etc. DBpedia has data from lots of different domains. By showing it as a single bubble on the LOD cloud it's implied that it's internally coherent, heavily interlinked and homogeneous. I wonder if that's true.

Lots of people have looked at the link density within Wikipedia but DBpedia relationships are only the subset it's possible to auto extract (mainly infoboxes and categories). And most of the links within Wikipedia are inline in the article text. So how interlinked is that subset of Wikipedia links? Would it be possible to follow your nose across domains from punk to the Sex Pistols to Jamie Reid to the Situationists like you can in Wikipedia for example? Or not?

Or if you made a diagram similar to the LOD one but only depicting link density inside DBpedia would it look like a homogeneous data space or more like a microcosm of the LOD cloud with lots of sparsely connected islands? Is it a hub or a set of hubs that happens to be extracted from the same source and hosted in the same place? If it's the latter it kinda implies that we don't have one Linked Data cloud but several?!?
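One crude way to test the hub-or-islands question would be to take the extracted links and count the connected components. Here's a minimal sketch, assuming the links come as (subject, object) pairs. The toy links are made up for illustration and say nothing about what DBpedia actually contains.

```python
from collections import defaultdict


def islands(links):
    """Group resources into connected components, ignoring link direction."""
    neighbours = defaultdict(set)
    for subject, obj in links:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk from an unvisited resource.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        components.append(component)
    return components


# Made-up links: one music/art chain and one unrelated health pair.
toy_links = [
    ("Punk", "Sex Pistols"),
    ("Sex Pistols", "Jamie Reid"),
    ("Aspirin", "Headache"),
]

found = islands(toy_links)
```

One big component would suggest a hub. Lots of small ones would look more like the islands. A fuller answer would also want the link density inside each component, not just the count.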