The Linked Data movement was kicked off in mid-2006 when Tim Berners-Lee published his now-famous Linked Data Design Issues document. Many had been promoting the approach of using W3C Semantic Web standards to achieve the same effects and benefits, but it was his document, and his use of the term Linked Data, that crystallised it and gave it focus and a label.
In 2010 Tim updated his document to include the Linked Open Data 5 Star Scheme, to “encourage people — especially government data owners — along the road to good linked data”. The key message was Open Data: you may have the best RDF-encoded and modelled data on the planet, but if it is not associated with an open licence, you don’t get even a single star. That emphasis on government data owners is unsurprising, as he was at the time, and still is, working with the UK and other governments as they come to terms with transparency.
Once you have cleared the hurdle of being openly licensed (more on this later), your data climbs the steps of Linked Open Data stardom based on how available, and therefore useful, it is:
★ Available on the web (in whatever format), but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. an Excel spreadsheet instead of an image scan of a table)
★★★ As ★★, plus in a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from the W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
By usefulness I mean how low the barrier is for people to use your data for their own purposes. The usefulness of one-star data does not spread much beyond looking at it on a web page. Three-star data can at least be downloaded and programmatically worked with, using non-proprietary tools, to deliver analysis or drive specific applications. Five-star data, by contrast, is consumable in a standard form, RDF, and contains links to other (four- or five-star) data out on the web in the same standard consumable form. It is at the five-star level that the real benefits of Linked Open Data kick in, which is why the scheme encourages publishers to strive for the highest rating.
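To make the five-star level concrete, here is a minimal sketch in Turtle of what such data might look like: a resource identified by an HTTP URI (four stars) that also links out to other people’s data for context (five stars). The `ex:` namespace and the specific resource URIs are hypothetical, purely for illustration:

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://data.example.gov.uk/id/> .

# Our resource, identified by an HTTP URI so people can point at it
ex:school-42
    dct:title   "Example Primary School" ;
    # Links to other people's data on the web provide the context
    owl:sameAs  <http://dbpedia.org/resource/Example_Primary_School> ;
    dct:spatial <http://sws.geonames.org/2643743/> .
```

Anyone consuming this can follow the DBpedia and GeoNames links to pull in descriptions and geography they did not have to publish themselves, which is precisely the pay-off of the fifth star.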
Tim’s scheme is not the only open data star rating scheme in town. There is another that emerged from the LOD-LAM Summit in San Francisco last summer; fortunately it is complementary to his rather than competing. The draft four-star classification scheme for linked open cultural metadata approaches the usefulness issue from a licensing point of view: if you cannot use someone’s data because of onerous licensing conditions, it is obviously not useful to you. Conditions like the following, drawn from the scheme’s lower ratings, are exactly what gets in the way:
permission to use the metadata is contingent on providing attribution in a way specified by the provider
metadata can only be combined with data that allows re-distributions under the terms of this license
So when you are addressing opening up your data, you should be asking yourself how useful it will be to those who want to consume and use it. Obviously you would expect me to encourage you to publish your data as ★★★★★ on Tim’s scale and ★★★★ on the LOD-LAM scale, making it as technically useful as possible with as few licensing constraints as possible. Many focus only on Tim’s stars; however, if you put yourself in the place of an app or application developer, a one-LOD-LAM-star dataset is almost unusable whilst still complying with the licence.
So think before you open – put yourself in the consumers’ shoes – publish your data with the stars.
One final thought: when you do publish your data, tell your potential viewers, consumers, and users in very simple terms what you are publishing and under what terms, as the UK Government does through data.gov.uk using the Open Government Licence, which I believe is a ★★★.
The Web has been around for getting on for two decades now, and massive industries have grown up around the magic of making it work for you and your organisation. Some of that magic, it has to be said, can be considered snake oil; much of it is the output of some of the best brains on the planet. Where the Web sits on the hit parade of technological revolutions to influence mankind is oft disputed, but it is definitely up there with fire, steam, electricity, computing, and of course the wheel. Similar debates rage, and will continue to rage, around the hit parade of web features that will in retrospect have been most influential. Pick your favourites: HTTP, XML, REST, Flash, RSS, SVG, the URL, the href, CSS, RDF; the list is a long one.
I have observed a pattern as each successful new enhancement to the web has been introduced and then generally adopted. First there is a disconnect between the proponents of the new approach, technology, or feature and the rest of us. The former split their passions between focusing on the detailed application, rules, and syntax of its use, and broadcasting its worth to the world, not quite understanding why the web masses do not ‘get it’ and adopt it immediately. This phase is followed by one of post-hype disillusionment among the creators, especially when others start suggesting simplifications to their baby. Around the same time, back-room adoption starts among those who find it interesting but are not evangelistic about it. The real kick for the web comes from those back-room folks, who just use the new thing to deliver stuff and solve problems in a better way. It is the results of their work that the wider world starts to emulate, to keep up with the pack and remain competitive. Soon the new feature is adopted by the majority, because all the big boys are using it, and it becomes just part of the tool kit.
A great example of this was RSS: not a technological leap, but a pragmatic mix of existing techniques and technologies combined with some lateral thinking, a group of people agreeing to do it in ‘this way’, and then sharing it with the world. As you will see from the Wikipedia page on RSS, the syntax wars raged in the early days – I remember them well: 0.9, 0.91, 1.0, 1.1, 2.0, 2.01, etc. I also remember trying, not always with success, to convince people around me to use it, because it was so simple. Looking back it is difficult to say exactly when it became mainstream, but this line from Wikipedia gives me a clue: “In December 2005, the Microsoft Internet Explorer team and Microsoft Outlook team announced on their blogs that they were adopting the feed icon first used in the Mozilla Firefox browser. In February 2006, Opera Software followed suit.” From then on, the majority of consumers of RSS were not aware of what they were using, and it became just one of the web technologies you use to get stuff done.
I am now seeing the pattern starting to repeat itself, this time with structured and linked data. Many, including me, have been evangelising the benefits of web-friendly, structured, linked data for some time now – preaching to a crowd that has been slow in growing, but growing it is. Serious benefit is now being gained by organisations adopting these techniques and technologies, as our selection of case studies demonstrates. They are getting on with it, often with our help, using it to deliver stuff. But we haven’t hit the mainstream yet; for instance, the SEO folks still need to get their heads around the difference between content and data.
Something is stirring around the edge of the Semantic Web/Linked Data community that has the potential to give structured, web-enabled data the kick towards the mainstream that RSS got when Microsoft adopted the RSS logo and all that came with it. That something is schema.org, an initiative backed by the heavyweights of the search engine world: Google, Yahoo, and Bing. For the SEO and web developer folks, schema.org offers a simple, attractive proposition: embed some structured data in your HTML and, via things like Google’s Rich Snippets, we will give you a value-added display in our search results. The result: happy web developers whose sites get an improved listing display, and lots of structured data starting to be published by people you would otherwise have had an impossible task convincing that publishing structured data on the web was a good idea.
I was at SemTech in San Francisco in June, just after schema.org was launched and caused a bit of a stir: they’ve over-simplified the standards we have been working on for years, they’re dumbing down RDF, diluting the capability, offering too small a set of attributes, and so on. Yet when you get under the skin of schema.org, and see its support for RDFa via RDFa 1.1 Lite, you realise they are not that far from the RDF/Linked Data community.
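For the curious, here is a minimal sketch of what embedding schema.org structured data with RDFa 1.1 Lite looks like in practice. The markup pattern is standard RDFa Lite (`vocab`, `typeof`, `property`), but the page content and values are purely illustrative, not taken from any real site:

```html
<!-- vocab sets the default vocabulary; typeof names the schema.org type -->
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Example</span>,
  <span property="jobTitle">Data Architect</span>, maintains
  <a property="url" href="http://www.example.org/jane">a home page</a>.
</div>
```

A search engine crawling this page can lift out a machine-readable description of a Person from those `property` attributes, which is the raw material for things like Rich Snippets.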
Schema.org should be welcomed as an enabler for getting loads more structured and linked data on the web. Is their approach perfect? No. Will it influence the development of Linked Data? Yes. Will the introduction be messy? Yes. Is it about more than just rich snippets? Oh yes. Do the webmasters care at the moment? No.
If you want a friendly insight into what schema.org is about, I suggest a listen to this month’s Semantic Link podcast, with their guest from Google/schema.org, Ramanathan V. Guha.
Now where have I seen that name before? Oh yes, back on the Wikipedia RSS page: “The basic idea of restructuring information about websites goes back to as early as 1995, when Ramanathan V. Guha and others in Apple Computer’s Advanced Technology Group developed the Meta Content Framework.” So it probably isn’t just me getting a feeling of déjà vu.
This post was also published on the Talis Consulting Blog