Late last year we published a first draft of a domain model for the UK Parliament. There were two common reactions:
- Shit, that looks complicated
- Why are you biting all this off in one go? (see tweet from Frankie)
In fairness the whole thing did look like a misguided foray into accidental enterprise architecture. And probably gave the impression we were embarking on a journey to map 300 years of parliamentary procedure, before anyone even thought about business systems and data flows and whatever users might want of any of this. Which, honestly, was never what we were trying to do.
We'd spent the previous few months trying to reverse engineer some version of sanity from data.parliament.uk by looking at the instance data, and trying to work out what it was trying to describe. And we found that the data Parliament produced didn't really describe Parliament in a way that made sense to anyone (including those inside Parliament).
The resulting pictures looked pretty much like the pictures of most organisations with any degree of history and complexity. Individual office functions had been digitised without any overarching plan for how the whole thing clipped together. If you squint long enough at the picture of data.parliament, you'll probably see blobs that look a bit like the table office or the journal office or the committee corridors but no real sense of how it hangs together.
So the draft domain model wasn't intended to be a plan of everything with an implied instruction to build. It was a first attempt at trying to explain (perhaps just to ourselves) how in theory the whole thing should hang together. Both Silver and I were fairly new to parliamentary stuff (though Anya had served some time) so getting a rough map of the territory before picking off details felt a reasonable thing to do. It was never supposed to be complete or completable. Just an exercise in orientation.
Since then we've been attempting to pick off tiny bits of the model and zoom in to some of the detail. The big picture (forgive me) has proved useful in giving a sense of where we are and which surrounding things we need to consider. So far we've tackled membership of houses, some library indexing things, a general sense of agency (people and groups of people), large scale parliamentary time periods, parliamentary elections and a few bits and bobs around online petitions. We're currently diving into select committees and hoping that the Commons and the Lords might possibly be able to agree what they mean by that. All this is probably less than five percent of everything.
We like to think it's not slow progress but progress at the right pace (slow agile should be a thing). All of this needs feedback loops and if we get too far ahead of actual development and actual code we'll definitely descend into enterprise architecture vapour model hell. At the risk of repeating myself, all design starts when real data meets real content meets real users with real software and real connections on real devices. Until then, all design is vanity.
Progress so far is published on GitHub. Hopefully the models are fairly self-explanatory, but to understand how they stitch together you probably need to know a little about RDF. So...
A little about RDF (or skip)
I know RDF and triple stores are not everyone's cup of tea (I'm never entirely sure they're mine). But they do have some properties that make designing small, self-contained data models easier to work with and combine.
The basic building block of RDF is the triple. As the name might suggest, each triple is made of three parts: subject, predicate, object. Like:
Jacob Rees-Mogg > whippedTo > Conservative Party
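If it helps to see that as code, here's a minimal sketch in plain Python rather than a real triple store. The identifiers are made up for illustration (real RDF would use full URIs), and `objects_of` is just a hypothetical helper:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# A graph is just a set of triples. These identifiers are illustrative only.
graph = {
    Triple("JacobReesMogg", "whippedTo", "ConservativeParty"),
    Triple("JacobReesMogg", "type", "MemberOfParliament"),
}


def objects_of(graph, subject, predicate):
    """Return everything `subject` points to via `predicate`."""
    return {
        t.object
        for t in graph
        if t.subject == subject and t.predicate == predicate
    }


parties = objects_of(graph, "JacobReesMogg", "whippedTo")
```

Everything else in RDF is built out of statements of this shape, including the ontology itself.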
The possible structures described by the RDF are defined in a thing called an ontology. Subjects and objects are defined in the usual class structure fashion: a Member of Parliament is a subclass of person. Predicates are defined by the set of classes of things they can point from (the domain of the predicate) and the set of classes of things they can point to (the range of the predicate). So the domain of the predicate whippedTo might be a Member of Parliament (or just a person) and the range might be a political party.
RDF ontologies are a little like relational database schemas inasmuch as they're almost absolutely nothing like database schemas. The ontology does define the bounds of the possible, but unlike a database schema, the ontology doesn't act as a constraint on the data. It acts instead as another set of claims about the data. If the ontology were to define the whippedTo predicate as having a range of political party, and you used that predicate to point to a thing that was not a political party, you wouldn't get a validation error. Instead the thing that was not a political party would be inferred to be a political party because that's what the ontology says the range of the predicate is.
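A rough sketch of that behaviour, again in plain Python with tuples standing in for triples. The `ONTOLOGY` dictionary, the `infer_types` function and the identifiers are all assumptions made for illustration. A real reasoner does far more than this, but the domain and range rules work along these lines:

```python
# Hypothetical ontology: whippedTo points from a Person to a PoliticalParty.
ONTOLOGY = {
    "whippedTo": {"domain": "Person", "range": "PoliticalParty"},
}


def infer_types(triples, ontology):
    """Apply domain and range claims, returning the inferred type triples.

    Nothing is rejected here: the ontology only ever adds claims.
    """
    inferred = set()
    for subject, predicate, obj in triples:
        rule = ontology.get(predicate)
        if rule is None:
            continue  # the ontology says nothing about this predicate
        inferred.add((subject, "type", rule["domain"]))
        inferred.add((obj, "type", rule["range"]))
    return inferred


# A mistake in the instance data: a constituency is not a political party.
data = {("JacobReesMogg", "whippedTo", "SomeConstituency")}

inferred = infer_types(data, ONTOLOGY)
```

No error is raised. The inferred set simply contains the claim that `SomeConstituency` is a `PoliticalParty`, alongside the claim that `JacobReesMogg` is a `Person`.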
No matter how well you model your ontology, if the instance data isn't carefully loaded then the whole business of RDF data gathering claims from the ontology can have all kinds of unexpected and fairly disastrous side effects. One misused predicate and all your towns become administrative areas, all your administrative areas become towns; all your dogs become cats, all your cats become dogs; all your living people become dead, and all your dead people come back to life.
But it does have one advantage in that you don't have to specify all the potential relationships between classes in the ontology. Just by using a predicate with a domain of X, the thing you use the predicate on automatically gets typed as being of class X (and the thing it points to gets typed by the range in the same way). In the Parliament work to date we have a tiny model for contact points. It defines phone numbers, fax numbers, email addresses, contact form URLs, postal addresses etc. It also defines a predicate called hasContactPoint, with a domain of ContactableThing and a range of ContactPoint. We can choose to use this predicate on anything, so we might have instance data representing Jacob Rees-Mogg, who's declared to be in the class Person. We don't have to define Person as being a subclass of ContactableThing, we can just use the predicate hasContactPoint and Jacob immediately becomes both a Person and a ContactableThing. So class structure is emergent from use (which would also make a good T-shirt if anyone's printing).
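Sketched in plain Python, that looks something like the following. The predicate and class names come from the contact point model described above. The instance identifiers and the `types_of` helper are hypothetical, and this is an illustration rather than how a real triple store does it:

```python
# The one ontology claim we need: hasContactPoint's domain and range.
ONTOLOGY = {
    "hasContactPoint": {"domain": "ContactableThing", "range": "ContactPoint"},
}


def types_of(node, triples, ontology):
    """Collect a node's declared types plus those implied by domain and range."""
    types = set()
    for subject, predicate, obj in triples:
        if predicate == "type" and subject == node:
            types.add(obj)  # explicitly declared type
        rule = ontology.get(predicate)
        if rule is not None:
            if subject == node:
                types.add(rule["domain"])  # implied by using the predicate
            if obj == node:
                types.add(rule["range"])  # implied by being pointed to
    return types


# Hypothetical instance data: Jacob is only ever declared to be a Person.
data = {
    ("JacobReesMogg", "type", "Person"),
    ("JacobReesMogg", "hasContactPoint", "contactPoint1"),
}

jacob_types = types_of("JacobReesMogg", data, ONTOLOGY)
contact_types = types_of("contactPoint1", data, ONTOLOGY)
```

Nothing in the ontology links Person to ContactableThing, yet `jacob_types` ends up holding both, and `contactPoint1` is typed as a `ContactPoint` without anyone declaring it.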
Some advantages of fag packet data models
There are a whole bunch of reasons you might want to keep data models small and loosely coupled:
- Proper domain driven design people would talk about bounded contexts. There's better writing than this by Martin Fowler on why you need bounded contexts particularly in complex organisations, particularly where language is slippery outside of context (looking at you, Parliament).
- Chunked up data models make concentrating on the problem area easier. You can close your mind to the bits that aren't pertinent whilst still maintaining a bigger picture.
- Tiny data models are easier to work with. Or at least easier to switch between contexts. You can be working in one area and suddenly realise you have to add something to a separate model. Opening up a massive ERD takes time to orientate. Opening up something the size of a napkin means you can digest almost at a glance, make the change you need to make and get out.
- Tiny data models are easier to share because they're easier for the people you're sharing with to understand. Recently we did some work on a parliamentary time period ontology which we pushed to GitHub and tweeted. Which got retweeted by Tony and picked up by Chris who spotted almost immediately that we'd defined the reign of a monarch to be a time period (when it's probably more of an event). Chris replied and we fixed it. Similarly with Gefion's comments on our election model. She pointed out that it wouldn't quite work outside the UK, which wouldn't really help with our standardisation efforts. We took a second look and realised that it wouldn't actually work for the UK either, for many of the same reasons. So that was good to fix. Would these things have been spotted and corrected as easily if they were fragments of models floating in a soup of everything else? Possibly, but probably not so quickly.
- Tiny data models are easy to draw and redraw and talk over. Whether on a whiteboard or a napkin or an actual fag packet. You can quite quickly sketch the bits you're interested in, introduce new people to it, talk it through and correct.
- Tiny, agile, loosely coupled data models make it easier for multiple small, agile, loosely coupled teams to work in roughly the same area without treading on each other's toes.
- Unproved for now but at least in theory, tiny data models help with standardisation. It's unlikely all legislatures will agree on all things, but splitting the data model into bounded contexts makes it easier to pick off bits you can agree on and bits that are specific to Parliament.
- As someone once said (possibly me), if it won't fit on a fag packet, it probably can't be built.
Anyway, the data models so far are here. We still need to write some notes to explain a few of the design decisions we've made. Hopefully we'll do that soon. Comments, pull requests, conversations welcome as ever.