Well I did for a start!  I chose this auspicious day to move the Data Liberate web site from one hosting provider to another.  The reasons why are a whole other messy story, but I did need some help on the WordPress side of things and [quite rightly in my opinion] they had ‘gone dark’ in support of the SOPA protests.  Frustration, but in a good cause.

Looking at the press coverage from my side of the Atlantic, such as from BBC News, it seems that some in Congress have also started to take notice.  Most of the fuss seemed to be around Wikipedia going dark, demonstrating what the world would be like without the free and easy access to information we have become used to.  All in all, I believe the campaign was surprisingly effective on the visible web.

However, what prompted this post was trying to ascertain how effective it was on the Data Web, which is almost by definition the invisible web.  Ahead of the dark day, a move started on the Semantic Web and Linked Open Data mailing lists to replicate what Wikipedia was doing by going dark on DBpedia – the Linked Data version of Wikipedia’s structured information.  The discussion centred on the fact that SOPA would not discriminate between human-readable web pages and machine-to-machine data transfer and linking, and that therefore we [who care about the free web] should be concerned.  Of that there was little argument.

The main issue was that systems consuming data which suddenly goes away would simply fail.  This was countered by the assertion that, regardless of the machines in the data pipeline, there will always be a human at the end.  Responsible system providers should be aware of the issue and report the error, and its reason, to their consuming humans.

Some suggested that, instead of delivering the expected data, systems [operated by those protesting] should provide data explaining the issue.  How many application developers have taken this circumstance into account in their design, I wonder.  If you, as a human, access a SPARQL endpoint and are presented with a ‘dark’ page, you can understand and come back to query tomorrow.  If you are a system getting back different types of data, or none at all, you will simply see an error.
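As a sketch of what such a responsible consumer might look like, here is a minimal, hypothetical Python example (the function name and the response shape are my own invention, not from any particular library): it inspects what a SPARQL endpoint returns and, rather than failing silently, surfaces a human-readable reason to the person at the end of the pipeline.

```python
import json

def interpret_sparql_response(status, content_type, body):
    """Decide what a Linked Data consumer should surface to the human
    at the end of the pipeline, rather than failing silently.

    `status`, `content_type` and `body` stand in for an HTTP response
    from a SPARQL endpoint; this is an illustrative sketch, not a
    real client library.
    """
    if status != 200:
        return {"ok": False,
                "message": "Endpoint returned %d; it may have gone dark "
                           "in protest - try again later." % status}
    if "json" not in content_type:
        # We asked for SPARQL JSON results but got something else,
        # e.g. an HTML protest page served in place of data.
        return {"ok": False,
                "message": "Unexpected response format (%s); the endpoint "
                           "may be dark." % content_type}
    return {"ok": True, "data": json.loads(body)}
```

A client wired this way would have turned the blackout into a visible, explained event for its users rather than an anonymous failure.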

The question I have is: who, using systems built on Linked Data [that went dark], noticed either a problem or, preferably, an effect of the protest?

I suspect the answer is very few, but I would like to hear the experiences of others on this.


The Linked Data movement was kicked off in mid-2006 when Tim Berners-Lee published his now famous Linked Data Design Issues document.  Many had been promoting the use of W3C Semantic Web standards to achieve the same effects and benefits, but it was his document, and the term Linked Data itself, that crystallised the approach, gave it focus, and gave it a label.

In 2010 Tim updated his document to include the Linked Open Data 5 Star Scheme to “encourage people — especially government data owners — along the road to good linked data”. The key message was to open the data.  You may have the best RDF-encoded and modelled data on the planet, but if it is not associated with an open licence you don’t get even a single star.  The emphasis on government data owners is unsurprising, as he was at the time, and still is, working with the UK and other governments as they come to terms with transparency.

Once you have cleared the hurdle of being openly licensed (more on this later), your data climbs the steps of Linked Open Data stardom based on how available, and therefore useful, it is. So:

★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. excel instead of image scan of a table)
★★★ as (2) plus non-proprietary format (e.g. CSV instead of excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: Link your data to other people’s data to provide context

By usefulness I mean how low the barrier is to people using your data for their own purposes.  The usefulness of 1-star data does not spread much beyond looking at it on a web page.  3-star data can at least be downloaded and programmatically worked with, using non-proprietary tools, to deliver analysis or power specific applications.  5-star data, by contrast, is consumable in a standard form, RDF, and contains links to other (4- or 5-star) data out on the web in the same standard consumable form.  It is at the 5-star level that the real benefits of Linked Open Data kick in, which is why the scheme encourages publishers to strive for the highest rating.
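The cumulative nature of the scheme is easy to capture in code.  This is a minimal sketch, with invented function and parameter names, that rates a dataset by climbing the steps until a criterion fails:

```python
def lod_stars(open_licence=False, structured=False,
              non_proprietary=False, uses_rdf=False, links_out=False):
    """Rate a dataset against Tim Berners-Lee's 5-star scheme.

    Each star builds on the previous one, and without an open
    licence a dataset earns no stars at all.
    """
    stars = 0
    for earned in (open_licence, structured, non_proprietary,
                   uses_rdf, links_out):
        if not earned:
            break  # the steps are cumulative - stop at the first gap
        stars += 1
    return stars
```

So openly licensed, structured data in CSV but without RDF identifiers rates three stars, while the best-modelled RDF with no open licence rates none.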

Tim’s scheme is not the only open data star rating scheme in town.  Another emerged from the LOD-LAM Summit in San Francisco last summer – fortunately it complements his rather than competing with it.  The draft 4-star classification scheme for linked open cultural metadata approaches the usefulness issue from a licensing point of view.  If you cannot use someone’s data because of onerous licensing conditions, it is obviously not useful to you.

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is not contingent on anything
  • metadata can be combined with any other metadata set (including closed metadata sets)
★★★ Attribution License (CC-BY / ODC-BY) when the licensor considers linkbacks to meet the attribution requirement

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution by linkback to the data source
  • metadata can be combined with any other metadata set, including closed metadata sets, as long as the attribution link is retained
★★ Attribution License (CC-BY / ODC-BY) with another form of attribution

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can be combined with any other metadata set (including closed metadata sets)
★ Attribution Share-Alike License (CC-BY-SA/ODC-ODbL)

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can only be combined with data that allows re-distributions under the terms of this license
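Read as rules, the four levels collapse to a small decision procedure.  A hedged sketch (the function and parameter names are mine; `linkback_ok` means the licensor accepts a simple linkback as sufficient attribution):

```python
def lodlam_stars(public_domain=False, attribution=False,
                 linkback_ok=False, share_alike=False):
    """Return the LOD-LAM star rating implied by a licence's terms.

    Illustrative only: real licences need reading, not just flags.
    """
    if public_domain:   # CC0 / ODC PDDL / Public Domain Mark
        return 4
    if share_alike:     # CC-BY-SA / ODC-ODbL
        return 1
    if attribution:     # CC-BY / ODC-BY
        return 3 if linkback_ok else 2
    return 0            # no open licence at all
```

Note that share-alike is checked before plain attribution: CC-BY-SA requires attribution too, but its restriction on what the data may be combined with is what pins it at one star.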

So when you are addressing opening up your data, you should ask yourself how useful it will be to those who want to consume and use it.  Obviously you would expect me to encourage you to publish your data as ★★★★★ ★★★★ [five technical stars plus four licensing stars] to make it as technically useful, with as few licensing constraints, as possible.  Many focus only on Tim’s stars; however, if you put yourself in the place of an app or application developer, a one-LOD-LAM-star dataset is almost unusable whilst still complying with the licence.

So think before you open – put yourself in the consumers’ shoes – publish your data with the stars.

One final thought: when you do publish your data, tell your potential viewers, consumers, and users in very simple terms what you are publishing and under what terms, as the UK Government does through data.gov.uk using the Open Government Licence, which I believe is a ★★★.


Although there has been a half-year lag between the workshop held at Stanford University at the end of June 2011 and the Stanford Linked Data Workshop Technology Plan [pdf], published on December 31st, the folks behind it obviously have not been twiddling their thumbs.  The 44 pages constitute a significant, well-thought-through proposal.  There is always benefit in shooting high when making a plan – from the introduction:

This is a plan for a multi-national, multi-institutional discovery environment built on Linked Open Data principles. If instantiated at several institutions, [it] will demonstrate to end users the value of the Linked Data approach to recording machine operable facts about the products of teaching, learning, and research.

…The resulting discovery environments will demonstrate the dramatic change that is possible in the academic information resource discovery environment when organizations move beyond closed and rule-bound metadata creation and utilization.

…This model also postulates dramatic changes to the creation, adoption, editing, and maintenance of metadata records for bibliographic holdings as well as scholarly information resources licensed for use in research institutions;

Refreshingly for an academic report on proposed academic processes, the authors seem to shy away from the traditional institutionally focused, or unwieldy and elaborate, coordination mechanisms.  Their basic premise is to deliver a Linked Data model that is adopted by schema.org, building on the role schema.org already plays with the schemas it supports.  Such a model would not only be easily referenced by those in the worlds of libraries and academia, but also more generally across the data web and by users of other schema.org schemas.  An obvious example that immediately springs to mind: an academic publisher wishing to intermix globally recognised metadata about their products with equally globally recognised sales offer information.
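To make that example concrete, here is a hypothetical sketch of such intermixed data expressed as JSON-LD (schema.org’s Book, Person, and Offer types are real; the title, author, and price are invented):

```python
import json

# A hypothetical publisher record mixing bibliographic metadata
# (schema.org Book) with commercial data (schema.org Offer) in one
# JSON-LD description.  All the literal values here are made up.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Monograph",
    "author": {"@type": "Person", "name": "A. N. Author"},
    "offers": {
        "@type": "Offer",
        "price": "35.00",
        "priceCurrency": "GBP",
    },
}

print(json.dumps(book, indent=2))
```

The bibliographic description and the sales offer live in one machine-readable graph, each using vocabulary already recognised across the data web.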

Moving on beyond the introduction, the report starts by setting some goals:

  • Implement an information ecosystem that exploits Linked Data’s ability to record and make discoverable an ongoing, richly detailed history of the intellectual activity embodied in all of a research university’s academic endeavors and its use of library resources and programs.
  • Design and implement data models, processes, workflows, applications, and delivery
    services….
  • Construct an ecosystem based on linked-data principles that draws on the intellectual
    activity and resources found throughout a research university’s programs and its libraries. Use structured, curated representations of these activities and resources to populate a graph of named links.

The scope of the model “comprises the pursuits of a research university’s faculty and students. Included in that scope are the knowledge and information resources that a research university creates, acquires, and uses in the course of its scholarship, research, and teaching programs.” – which kind of includes most everything we do – so they are not playing at this.

For many in the world of libraries and associated domains, Linked Data may seem to be just the latest brand of technological snake-oil.  A brand that not only promises to add value, but to radically disrupt the way they do things.  I obviously agree with that (except the snake-oil bit) but know from experience it is not an easy sell to the sceptical.  The authors of the report approach this difficulty by referencing several examples and initiatives.

One of the core things they reference is close to my heart, having been closely involved with it with former colleagues at Talis Consulting – The British Library data model, which they used to openly publish the British National Bibliography as Linked Data.  They intend to use this model as a starting point for their work.

Doing so will ensure that the resulting model retains the BL’s high-level focus and its web-derived, transparent structure for representing facts about people, organizations, places, events, and topics. Such focus represents a marked contrast to efforts based on all-inclusive models that enforce highly structured, deeply detailed and therefore exceedingly brittle representations of physical and digital objects.

I could go on picking out excellent examples and references from the report, such as LinkSailor, the recent proposal from the Library of Congress to transition to A Bibliographic Framework for the Digital Age, the vote by European libraries to support an open data policy for their bibliographic records, Talis’ Kasabi.com Linked Data powered data marketplace and Linked Data scholarly resource system Talis Aspire, Drupal’s use of RDF & Linked Data techniques, aligning with Schema.org, Google’s Freebase, etc., but I would recommend reading the report yourself as they place these things in context.

Reading it through a couple of times has left me with a couple of strongly held hopes.

Hope 1.  This report gains traction and attracts funding.  Implementation of an exemplar ecosystem for publishing and linking intellectual information, such as this, would be a massive boost towards the realisation [both intellectual and operational] of the benefits of applying Linked Data techniques and technologies in the scholarly and research domains.

Hope 2.  They remain true to the ambition to “retain the BL’s high-level focus and its web-derived, transparent structure for representing facts about people, organizations, places, events, and topics”.  It would be so easy to fall back into the over-engineered, edge-case-obsessed, internally focused approach to data publication that has characterised the bibliographic world for the last few decades.

Linked Data, and the way this report approaches its adoption, has the potential to make the world’s information accessible to all who can benefit.  Getting there requires honest evangelism and demonstrations of practical benefit, but mostly staying true to your goals for implementing it.  I welcome this report and hope to see its proposals become a reality.


A New Beginning

It is great to launch a new venture, and I am looking forward to launching this one – Data Liberate.

Having said that, there is much continuity in this step.  Those who know me from the conference circuit, my work with the Talis Group and, more recently, with Talis Consulting, will recognise much in the core of what Data Liberate has to offer.

What will I be doing at Data Liberate?  The simple answer is much of the same, but with a wider audience and client base, and less restricted to specifically Linked Data and Semantic Web techniques and technologies.  Extracting value from data, financial or otherwise, is at the core of the next big wave of innovation on the web.  The Web has become central to what we all do commercially, corporately, socially and as individuals.  This data-driven next wave of innovation will therefore influence, and potentially benefit, us all.

As with all ‘new waves of innovation’ there is much technological gobbledegook, buzzwords and marketing hype surrounding it.  My driving focus for many years has been to make new stuff understandable to those who can benefit from it.  A focus I intend to continue and increase over the coming months and years.  This is not just a one-way process.  Experience has shown me that those with the technology are often not very adept at promoting its value to others in terms they can connect with.  I therefore intend to spend some of my efforts with these technology providers, helping them get their message across for the benefit of all.

Trawling the blogosphere you will find proponents of Open Data, Big Data, Linked Data, Linked Open Data, Enterprise Data and Cloud Computing, all trying to convince us that their core technology is the key to the future.  As with anything, the future of extracting and benefiting from the value within data is a mixture of most, if not all, of the above.  Hence Data Liberate’s focus is on Data in all its forms.

Contact me if you want to talk through what Data Liberate will be doing and how we can help you.
