A couple of weeks back, UK Prime Minister David Cameron announced a broadening of publicly available government data, with the publication of key data on the National Health Service, schools, criminal courts and transport.
The background to the announcement was a celebration of the preceding year of activity in the areas of transparency and open data, with many core government datasets being published. Too many to list here, but the 7,200+ listed on data.gov.uk give you an insight. The political angle to this is undeniable, as Mr Cameron makes clear in his YouTube speech for the announcement: “Information is power because it allows people to hold the powerful to account.”
His statement that “I believe it will also drive economic growth as companies can use this new data to build web sites or apps that allow people to access this information in creative ways” also gives an indication of the drivers for the way forward.
To be successful in either of these ambitions, people and companies have to have access to information in an easy and reliable way that gives them the confidence to build their opinions and their business models upon it. What do we measure that ease and reliability against? Is it the world of audited business practice, where the legal eagles and armchair auditors strive towards perfection, or is it the web world, where a lack of perfection is accepted and “good enough is good enough” is the norm? I believe that with government data on the web we should still accept that it will not be perfect, but the good-enough hurdle should be set higher than we would expect from the likes of Wikipedia and some other oft-used data sources.
There are two mentions, in the words that accompany the announcement, that appear to recognise this. Firstly, in the announcement itself on the Number 10 website: “All of the new datasets will be published in an open standardised format so they can be freely re-used under the Open Government Licence by third parties.” What ‘open standardised format’ actually means is something we need to delve into, but previous data.gov.uk work towards Linked Data and shared reliable identifiers for things [such as postcodes, schools, and stations] bodes well. Secondly, in Mr Cameron’s letter to his Cabinet we get a section on improving data quality, including plain English descriptions of scope and purpose, the introduction of unique identifiers to help track interactions with companies, and an action plan for improving the quality and comparability of data.
So where are we now? Some of the new data is not perfect, as this thread on the UK Government Data Developers Google Group shows. William Waites identifies that the [government] reporting of transactions with the Open Knowledge Foundation does not match the transactions in the OKF’s own books, calling into question how reliable those [government] figures are. In my opinion, this is an example where we should applaud the release of such new data but, through conversations such as the one William started, help those who are publishing the data to improve the quality, reliability and comparability of their output. Of course, by definition this means that the publishers must be prepared and ready to listen – and are listening.
What we shouldn’t do is throw our hands in the air in despair because the first publishing of data by some departments is not up to what we would expect, or decry the move towards shared [URI-based] identifiers because they look confusing in a CSV file. Data publishers will get better at it with helpful criticism. I am also convinced that sharing well-known, reliable identifiers for things across disparate government and non-government data will, in the medium term, have a far greater benefit than most [including enthusiasts for Linked Data like me] can currently envisage.
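The value of those shared URI-based identifiers is easiest to see with a toy example. The CSV extracts and school URIs below are invented for illustration and are not drawn from any real data.gov.uk release; the point is only that, because both files name each school by the same URI rather than by a free-text name, a developer can join them with a simple lookup instead of error-prone fuzzy matching.

```python
import csv
import io

# Two hypothetical CSV extracts from different publishers. Both identify
# a school by the same URI, so rows can be matched unambiguously.
spending_csv = """school,amount
http://education.data.gov.uk/id/school/100001,25000
http://education.data.gov.uk/id/school/100002,18000
"""

results_csv = """school,pass_rate
http://education.data.gov.uk/id/school/100001,0.82
http://education.data.gov.uk/id/school/100002,0.74
"""

def rows(text):
    """Parse a CSV string into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))

# Index each dataset by the shared URI identifier.
spending = {r["school"]: float(r["amount"]) for r in rows(spending_csv)}
results = {r["school"]: float(r["pass_rate"]) for r in rows(results_csv)}

# Join on the URIs common to both datasets.
combined = {
    uri: {"amount": spending[uri], "pass_rate": results[uri]}
    for uri in spending.keys() & results.keys()
}

for uri, data in sorted(combined.items()):
    print(uri, data["amount"], data["pass_rate"])
```

The URIs may look clumsy in a spreadsheet, but they are what makes cross-dataset comparison mechanical rather than a matter of guesswork.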
This post was also published on the Talis Consulting Blog.