Linked Spending Data – How and Why Bother Pt1

linkedlocalgovNational Government instructing the 300+ UK Local Authorities to publish “New items of local government spending over £500 to be published on a council-by-council basis from January 2011” has had the proponents of both open, and closed, data excited over the last few months.  For this mini series of posts I am working on the assumption that publishing this data is a good thing, because I want to move on and assert that [when publishing] one format/method to make this data available should be Linked Data.

This immediately brings me to the Why Bother? bit. This itself breaks in to two connected questions – Why bother publishing any local authority data as Linked Data? and Why bother using the, unexciting simplistic, spending data as a a place to start?

I believe that spending data is a great place to start, both for publishing local government data and for making such data linked.  Someone at national level was quite astute choosing spending as a starting point.  To comply with the instruction all an authority has to do is produce a file containing five basic elements for each payment transaction: An Id, a date, a category,  a payee, and an amount.  At a very basic level it is very easy to measure if an authority has done that or not.

Guidance from data.gov.uk expands on this a little by mandating the following:

Body This should be the URI that represents (or more properly ‘identifies’ – see below) the local authority at statistics.data.gov.uk.
eg. http://statistics.data.gov.uk/id/local-authority-district/00CN
Date Should ideally be the payment date as recorded in purchase or general ledger
Transaction number To identify within authority’s system, for future reference
Amount In Sterling recorded in finance system
Supplier Details Name and individual authority id for supplier plus where possible Companies House, Charity Registration, or other recognised identifier
Expense Area The part of the authority that spent the amount
Service Categorization

Depending on the accounts system this may be easy or quite difficult. There are two candidates for categorization – CIPFA’s BVACOP classification and the Proclass procurement classification system.

… a little more onerous, possibly around the areas of identifying company numbers and Service Categorization, but not much room for discussion/interpretation.

As to the file formats to publish data, the same advice mandates: The files are to be published in CSV file format – supplemented by – Authorities may wish to publish the data in additional formats as well as the CSV files (e.g. linked data, XML, or PDFs for casual browsers). There is no reason why they should not do this, but this is not a substitute for the CSV files.

So fairy clear, and measurable, then. You either have published your required basic elements of data in a CSV format file, or you have not.  Couple this with the political ambitions and drive behind the Government’s Transparency Agenda, and local authorities will have difficulty in not delivering this.  Although some are being a bit tardy and others seem reticent to publish in formats other than pdf.

OK so why bother with applying Linked Data techniques to this [boring] spending data?  Well, precisely because it is simple data, it is comparatively easy to do, and because everybody is publishing this data the benefits of linking should soon become apparent.   Linked Data is all about identifying things and concepts, giving them a globally addressable identifiers (URIs) and then describing the relationships between them.

For those new to Linked Data, the use of URIs as identifiers often causes confusion.   A URI, such as  http://statistics.data.gov.uk/id/local-authority-district/00CN, is a string of characters that is as much an identifier as the payroll number on your pay-check, or a barcode on a can of beans.  It has couple of attributes that make it different from traditional identifiers.  Firstly, the first part of it is created from the Internet domain name of the organisation that publish the identifier.  This means that it can be globally unique. Theoretically you could have the same payroll number as the the barcode number on my can of beans – adding the domain avoids any possibility of confusion.  Secondly, because the domain is prefixed by http:// it gives the publisher the ability to provide information about the thing identified, using well established web technologies.  In this particular example, http://statistics.data.gov.uk/id/local-authority-district/00CN is the identifier for Birmingham City Council, if you click on it [using it as an internet address] data.gov.uk will supply you information about it – name, location, type of authority etc.

Following this approach, creating URI identifiers for suppliers, categories, and individual payments and defining the relationships between them using the Payments Ontology (more on this when I come on to the How)  leads to a Linked Data representation of the data.  In technical terms a comparatively easy step using scripts etc.

By publishing Linked Spending Data and loading it in to a Linked Data store, as Lichfield DC have done, it becomes possible to query it, to identifies things like all payments for a supplier; or suppliers for a category, etc.

If you then load data for several authorities in to an aggregate store, as we are doing in partnership with LGID, those queries can identify patterns or comparisons across authorities.  Which brings me to ….

linkeddata_blue Why bother publishing any local authority data as Linked Data?  Publishing as Linked Data enables an authority’s data to be meshed with data from other authorities and other sources such as national government.  For example, the data held at statistics.data.gov.uk includes which county an authority is located within.  By using that data as part of a query, it would for instance be possible to identify the total spend, by category, for all authorities in a county such as the West Midlands.

As more authority data sets are published, sharing the same identifiers for authority category etc., they will naturally link together, enabling the natural navigation of the information between council departments, services, costs, suppliers, etc.  Once this step has been taken and the dust settles a bit, this foundation of linked data should become an open data  platform for innovating development and the publishing of other data that will link in with this basic but important financial data.

There are however some more technical issues, URI naming, aggregation, etc.,  to be overcome or at least addressed in the short term to get us to that foundation.  I will cover these in part 2 of this series.

This post was also published on the Nodalities Blog
Comment below or Contact us