A New Revolution

A colleague sharing their experience of visiting Ironbridge, promoted as “The Birthplace of the Industrial Revolution” helped clarify some thoughts I have been brewing to help convey where the current Linked Data enthusiasms and initiatives may lead us.

Iron Bridge The famous Iron Bridge, opened in 1781, spans the River Severn in Shropshire, England.  To quote the WikipediaIt was the first arch bridge in the world to be made out of cast iron, a material which was previously far too expensive to use for large structures. However, a new blast furnace nearby lowered the cost and so encouraged local engineers and architects to solve a long-standing problem of a crossing over the river.”  The raw materials of iron ore and coal had been known for a long time, but it took the building of a nearby furnace, using the innovation of coke as a fuel, that enabled the local community to invest in the construction.  The outcome was not only to stimulate the local commercial and administrative economy, but it also became an 18th century tourist attraction, which it continues to be today.

linkeddata_blue All very interesting, but what has this to do with Linked Data and it’s future?

The impact of Linked Data and the Web of Data it enables, on the way we interact and do business, will be greater than that of the World Wide Web that it builds upon.

When one makes statements like that one, you are often asked to justify yourself.  As you may know, I like to use analogies to help clarify things and I believe the Industrial Revolution is a good one in the case of the future for Linked Data and associated techniques.  I am also very aware that analogies tend to fall apart if you pick at the detail too much, so please bear with me on this one.

Like the Industrial Revolution, Linked Data is building on what went before.  Before the Iron Bridge, there were other bridges, roads, and uses of iron.  Before Linked data there was/is the Web – a globally distributed web of linked human-readable web pages, upon which are surfaced words and images for our information, entertainment and commercial desires.  Data of course plays it’s part, often powering the websites that we all consume.

So what is so special about a Web of Data? – The data comes out from behind  those websites to be linked with other data across the web, or maybe an intranet.  Using the same techniques for linking pages together [the URL], data identifiers are given URIs.  This means that a piece of data is given an identifier that is addressable across the web and therefore linkable with other data identified in a similar way.

Curled Wildlife FinderSo where does the Industrial Revolution analogy kick in?  Well, once data are identifiable in a globally distributed context, they can be linked, mixed, mashed, and generally used to add value to each other.  Your data can become the raw material for someone else’s process – your Wikipedia comment about an animal can become the description on a, data powered, BBC page about that species.  As with coal, which after some refinement can become coke to be used to add value to the iron smelting process, any published data can be the raw material for value adding/combining processes.  The processor, utilising their knowledge, skills, and experience to produce an alloy of data, the combination of which is greater than the sum of it’s parts.

blast In the same way that some freely available elements, such as the air pumped in to that blast furnace, were needed to get the process going; freely and openly available data, such as governments and the media are publishing, are priming the pumps of a data revolution.

Whenever there is value to be added in a process there is both community and commercial opportunity.  Once people start using their skill and understanding of a facet of knowledge, to link data from one free, or commercial, source with more free or commercial data they can produce either a saleable result, and/or an enhancement to their own services.  The output of one value-add process can then become one of the sources for yet another, and so on.

To finally stretch my analogy just a little further – looking back to those early days in the Severn Valley, it is possible to identify the building-blocks that led to commercial steel production, the age of steam, the automobile industry, and space flight.  Most of which would have been unthinkable by those early pioneers.  Pre-1994, could we have predicted the growth of Google, YouTube, Wikipeadia, and Twitter?  In 2010 can we identify the building-blocks of a data revolution?  – I think maybe we can.

So how will such a revolution, underpinned by Linked Data, change the way we interact and do business, more fundamentally than the Web has? –  By creating whole new communities and industries to connect, supply, trade, enhance, distribute, interpret, and build services and applications upon a supporting web of globally available data elements and alloys.

This post was also published on the Nodalities Blog

Linked Spending Data – How and Why Bother Pt1

linkedlocalgovNational Government instructing the 300+ UK Local Authorities to publish “New items of local government spending over £500 to be published on a council-by-council basis from January 2011” has had the proponents of both open, and closed, data excited over the last few months.  For this mini series of posts I am working on the assumption that publishing this data is a good thing, because I want to move on and assert that [when publishing] one format/method to make this data available should be Linked Data.

This immediately brings me to the Why Bother? bit. This itself breaks in to two connected questions – Why bother publishing any local authority data as Linked Data? and Why bother using the, unexciting simplistic, spending data as a a place to start?

I believe that spending data is a great place to start, both for publishing local government data and for making such data linked.  Someone at national level was quite astute choosing spending as a starting point.  To comply with the instruction all an authority has to do is produce a file containing five basic elements for each payment transaction: An Id, a date, a category,  a payee, and an amount.  At a very basic level it is very easy to measure if an authority has done that or not.

Guidance from data.gov.uk expands on this a little by mandating the following:

Body This should be the URI that represents (or more properly ‘identifies’ – see below) the local authority at statistics.data.gov.uk.
eg. http://statistics.data.gov.uk/id/local-authority-district/00CN
Date Should ideally be the payment date as recorded in purchase or general ledger
Transaction number To identify within authority’s system, for future reference
Amount In Sterling recorded in finance system
Supplier Details Name and individual authority id for supplier plus where possible Companies House, Charity Registration, or other recognised identifier
Expense Area The part of the authority that spent the amount
Service Categorization

Depending on the accounts system this may be easy or quite difficult. There are two candidates for categorization – CIPFA’s BVACOP classification and the Proclass procurement classification system.

… a little more onerous, possibly around the areas of identifying company numbers and Service Categorization, but not much room for discussion/interpretation.

As to the file formats to publish data, the same advice mandates: The files are to be published in CSV file format – supplemented by – Authorities may wish to publish the data in additional formats as well as the CSV files (e.g. linked data, XML, or PDFs for casual browsers). There is no reason why they should not do this, but this is not a substitute for the CSV files.

So fairy clear, and measurable, then. You either have published your required basic elements of data in a CSV format file, or you have not.  Couple this with the political ambitions and drive behind the Government’s Transparency Agenda, and local authorities will have difficulty in not delivering this.  Although some are being a bit tardy and others seem reticent to publish in formats other than pdf.

OK so why bother with applying Linked Data techniques to this [boring] spending data?  Well, precisely because it is simple data, it is comparatively easy to do, and because everybody is publishing this data the benefits of linking should soon become apparent.   Linked Data is all about identifying things and concepts, giving them a globally addressable identifiers (URIs) and then describing the relationships between them.

For those new to Linked Data, the use of URIs as identifiers often causes confusion.   A URI, such as  http://statistics.data.gov.uk/id/local-authority-district/00CN, is a string of characters that is as much an identifier as the payroll number on your pay-check, or a barcode on a can of beans.  It has couple of attributes that make it different from traditional identifiers.  Firstly, the first part of it is created from the Internet domain name of the organisation that publish the identifier.  This means that it can be globally unique. Theoretically you could have the same payroll number as the the barcode number on my can of beans – adding the domain avoids any possibility of confusion.  Secondly, because the domain is prefixed by http:// it gives the publisher the ability to provide information about the thing identified, using well established web technologies.  In this particular example, http://statistics.data.gov.uk/id/local-authority-district/00CN is the identifier for Birmingham City Council, if you click on it [using it as an internet address] data.gov.uk will supply you information about it – name, location, type of authority etc.

Following this approach, creating URI identifiers for suppliers, categories, and individual payments and defining the relationships between them using the Payments Ontology (more on this when I come on to the How)  leads to a Linked Data representation of the data.  In technical terms a comparatively easy step using scripts etc.

By publishing Linked Spending Data and loading it in to a Linked Data store, as Lichfield DC have done, it becomes possible to query it, to identifies things like all payments for a supplier; or suppliers for a category, etc.

If you then load data for several authorities in to an aggregate store, as we are doing in partnership with LGID, those queries can identify patterns or comparisons across authorities.  Which brings me to ….

linkeddata_blue Why bother publishing any local authority data as Linked Data?  Publishing as Linked Data enables an authority’s data to be meshed with data from other authorities and other sources such as national government.  For example, the data held at statistics.data.gov.uk includes which county an authority is located within.  By using that data as part of a query, it would for instance be possible to identify the total spend, by category, for all authorities in a county such as the West Midlands.

As more authority data sets are published, sharing the same identifiers for authority category etc., they will naturally link together, enabling the natural navigation of the information between council departments, services, costs, suppliers, etc.  Once this step has been taken and the dust settles a bit, this foundation of linked data should become an open data  platform for innovating development and the publishing of other data that will link in with this basic but important financial data.

There are however some more technical issues, URI naming, aggregation, etc.,  to be overcome or at least addressed in the short term to get us to that foundation.  I will cover these in part 2 of this series.

This post was also published on the Nodalities Blog