I have been reading with interest the recently published discussion paper from Harvard University’s Joan Shorenstein Center on the Press, Politics and Public Policy by former U.S. Chief Information Officer Vivek Kundra, entitled Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect [pdf]. It is well worth a read for the way it places the current [Digital] Revolution, which we are somewhere in the middle of, in relation to preceding revolutions and the ages they begat: the agricultural age, lasting thousands of years; the industrial age, lasting hundreds of years; and the digital revolution/age, which has already had massive impacts on individuals, society, governments and commerce in just a few short decades.

Paraphrasing his introduction: the microprocessor, “the new steam engine” powering the Information Economy, is being fuelled by open data. Stepping onto dangerous mixed-metaphor territory here, but I see him implying that network effects, both technological and social, are turning that basic open data fuel into a high-octane brew driving massive change in our world.

Vivek goes on to catalogue some of the effects of this digital revolution. With his background in county, state and federal US government, it is unsurprising that his examples are around the effects of opening up public data, but that does not make them any less valid. He talks about four shifts in power that are emerging and/or need to occur:

  • Fighting government corruption, improving accountability and enhancing government services – open [democratised] data driving the public’s ability to hold the public sector to account, exposing hidden, or unknown, facts and trends.
  • Changing the default setting of government to open, transparent and participatory – changing the attitude of those within government to openly publish their data by default so that it can be used to inform their populations, challenge their actions and services, and stimulate innovation.
  • Create new models of journalism to separate signal from noise to provide meaningful insights – innovative analysis of publicly available data can surface issues and stories that would otherwise be buried in the noise of general government output.
  • Launch multi-billion dollar businesses based upon public sector data – businesses applying their specific expertise to the analysis, collation and interpretation of open public data.

All good stuff, and a great overview for those looking at this digital revolution as impacted by public open data.  As to what sort of age it will lead to, I think we need to look at a couple of steps further on in the revolution.

The agricultural revolution was based upon the move away from a nomadic existence, the planting and harvesting of crops and the creation of settlements. The age that followed, I would argue, was based upon the outputs of those efforts enabling the creation of businesses and the trading of surpluses. A new layer of commerce emerged, built upon the basic outputs of the revolutionary activities.

The industrial revolution introduced powered machines, replacing manual labour and massively increasing efficiency and productivity. The age that followed was characterised by manufacturing: a new layer of added value, taking the basic raw materials produced or mined by these machines and combining them into new complex products.

Which brings me to what I would prefer to call the data revolution, where today we are seeing data as a fuel consumed to drive our information steam engines. I would argue that soon we will recognise that data is not just a fuel but also a raw material. Data from many sources (public, private and personal), in many forms (open, commercially licensed and closed), will be combined with entrepreneurial innovation and refined to produce new complex products and services. In the same way that whole new industries emerged in the industrial era, I believe we will look back at today and see the foundations of new and future industries. I published some thoughts on this in a previous post a year or so ago which I believe are still relevant.

Today, unless you are willing to expend significant effort on understanding each individual data source, it is difficult to deliver an information service or application that depends on more than a couple of data sources. This is because we are still trying to establish the de facto standards for presenting, communicating and consuming data. We have mostly succeeded for web pages, with HTML and the gradual demise of pragmatic, moment-in-time diversionary solutions such as Flash. On the data front, however, we are still where the automobile industry was before it agreed where, and in what order, to place the foot pedals in a car.

The answer, I believe, will turn out to be the adoption of data packaging and linking techniques and standards: Linked Data. I say this not just because I am an evangelist for the benefits of Linked Data, but because it exhibits the same distributed, open and generic features that characterise what has been successful on the Web. It also builds upon those Web standards. Much is talked, and hyped, about Big Data, another moment-in-time term. Once we start linking, consuming and building, it will be on a foundation of data that could only be described as big. What we label Big today will soon appear to be normal.

What of the Semantic Web, I am asked? I believe the Semantic Web is a slightly out-of-focus vision of how the Information Age may look once it is established, expressed only in terms of what we understand today. So this is what I am predicting will arrive, but I am also predicting that we will eventually call it something else.


The National Archives announced today that the UK government licensing policy has been extended to make more public sector information available:

Building on the success of the Open Government Licence, The National Archives has extended the scope of its licensing policy, encouraging and enabling even easier re-use of a wider range of public sector information.

The UK Government Licensing Framework (UKGLF), the policy and legal framework for the re-use of public sector information, now offers a growing portfolio of licences and guidance to meet the diverse needs and requirements of both public sector information providers and re-user communities.

On the surface this move is to be welcomed. It provides, amongst other things, licensing choices and guidance for re-using information free of charge for non-commercial purposes (the Non-Commercial Government Licence), guidance on licensing where charges apply, and guidance on the licensing of software and source code.

All this is available from the UK Government Licensing Framework area of the National Archives site, along with FAQs and other useful supporting information, including machine-readable licences.

As the press release says, the extensions build on the success of the Open Government Licence (OGL) and are designed to cover what the OGL cannot.

So the [data publisher’s] thought process should be to try to publish under the OGL and then, only if ownership, licensing or cost of production provides an overwhelming case for being more restrictive, to utilise these extensions and/or guidance.

My concern, having listened to many questions at conferences from what I would characterise as government conservative traditionalists, is that many will start at the charge-for/non-commercial-use end of this licensing spectrum because of the fear/danger of opening up data too widely. I do hope my concerns are unfounded and that the use of these extensions will be the exception, with the OGL being the de facto licence of choice for all public sector data.

This post was also published on the Talis Consulting Blog

Friday night, nothing on the TV. I know! I’ll browse through the Protection of Freedoms Bill, currently passing through the UK Parliament. Sad, I know, but interesting.

Let’s scroll back in time a bit to November 19th 2010 and a government press conference introduced by a video from Prime Minister David Cameron. The headline story was about the publishing of government spending and contract data, but towards the end of this 109-second video he said the following:

… the most exciting is a new right to data. Which will let people request streams of government information and use it for social or commercial purposes. Take all this together and we really can make this one of the most open, accountable and transparent governments there is. Let me end by saying this. You are going to have so much information about what we do, how much of your money we spend doing it, and what the outcome is. So use it, exploit it, hold us to account. Together we can set a great example of what a modern democracy ought to look like. (my emphasis)

Obviously, to realise this Right to Data there needs to be some legislation, which brings me to the Protection of Freedoms Bill. This is one of those bills that covers all sorts of issues, from rules for the destruction of fingerprints and DNA profiles, CCTV camera regulations and the detention of terrorist suspects, to freedom of information and data protection. Zooming in on the parts dealing with the release and publication of datasets held by public authorities, we find a set of clauses that amend the Freedom of Information Act 2000.

Re-use

After some amendments which allow for datasets and for provision in electronic form, we get this: “the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.” Unfortunately there is no definition of the term re-use. It could be argued that a PDF of some tables in an MS Word document could be re-used, whereas I believe the spirit of the legislation should be made more explicit by identifying non-proprietary data formats. I know this would be a tricky job for the parliamentary draftsmen: we would not want to restrict it to specific formats, such as XML and CSV, which could age and be replaced by something better that then could not be used because it had not been mentioned in the legislation. Even so, I believe that just using the term ‘re-use’ is far too woolly and open to [mis]interpretation.

What is [not] a dataset

This is one of the areas that raises most concern for me. Check out the Bill’s wording defining a dataset. I am OK with (a), data collected as part of an authority doing its job, and (c), don’t change the data you have collected; publishing that raw data is important. However, (b) specifically excludes data that is the product of analysis. Presumably analysis of collected data is one significant way that an authority measures the outcomes of its efforts. Understanding that analysis will help us understand the decisions and actions the authority subsequently makes and takes. I assume that there may be some specific reasons underpinning this blanket exclusion of analysis data. If there are, they should be identified, instead of generally throttling the output of useful data that would go a long way towards helping with Mr Cameron’s stated ambition for us to be able to see “what the outcome is” of the spending of public money.

Release of datasets for re-use

This is a whole new section (11A) to be added to the 2000 Act to cover the release of datasets. It covers ownership, copyright and/or database right in the information to be published, and states that it should be published under “the licence specified by the Secretary of State in a code of practice issued under section 45”. Section 45 basically puts into the hands of the Secretary of State the definition of the licence(s) data should be published under. As of today, the Open Government Licence for public sector information is what is wanted to keep the publishing of information open. However, what is there to stop a future Secretary of State with a less open outlook from replacing it with far more restrictive licences? Do we not need some form of presumption of openness attached to the Secretary of State’s powers as part of this change in legislation?

On the topic of presumptions of openness, the wording of this Bill contains phrases such as “unless the authority is satisfied that it is not appropriate for the dataset to be published” and “where reasonably practicable”. It is clear that many in the public sector are not as enthusiastic about publishing data as the current government, and vague phrases such as these may well be used unreasonably by some to justify throttling the stream of information. They could easily be used to build in a bureaucratic hurdle that each dataset has to clear, proving its appropriateness and practicality, before publication. I am sure it would not be beyond a parliamentary draftsman’s skill to produce wording that means all will be published unless a specific objection is raised for an individual dataset, on grounds of excessive effort or data protection.

Up-dated data

Data published by an authority should be published under a publication scheme, and the following wording from the Protection of Freedoms Bill (HC Bill 146) applies here. How should we interpret “any up-dated version held by the authority of such a dataset”? My interpretation is that once a dataset has been published it shall continue to be published as it changes. The precedent for this is spending data: having published its spending for January 2011, an authority should automatically publish it for February and the following months. But what if, in response to a request, an authority publishes the contents of a spreadsheet used to track the amount of salt applied to roads in its area during winter 2010-11, and then uses a different spreadsheet for the following winter? Does the output of that new spreadsheet constitute a new dataset, or an up-date to its predecessor? From the wording in the Bill it is not clear.

Who does it cover?

I probably need a bit of help here from those who understand the public sector better than I do, but I suspect that the references to the organisations listed in Schedule 1 and to “the wider public sector” do not cast the net wide enough to cover some of the data that is relevant to our daily lives but is delivered on behalf of some authorities by third parties. For example, I am aware that a large city was recently unable to inform citizens of their rubbish collection schedules because that data was considered commercially restricted by its service provider.


So, in summary, I welcome the commitment to a right to data, realised through streams of government information about what government does, how much of our money is spent doing it, and what the outcomes are. However, I am sceptical about how effective the measures in the current Protection of Freedoms Bill will be in delivering it, especially in the light of very recent comments by the Prime Minister highlighting the “enemies of enterprise” in Whitehall and town halls across the country, and attacking what he called the “mad” bureaucracy that holds back entrepreneurs. Those enemies are exactly the people who might take the wording of this Bill as ammunition for their cause.

Whilst being concerned about this topic, I have been wondering why so few are commenting on it. Are the majority just taking the press conference statements by David Cameron and his fellow Ministers as indications of a battle won, or am I missing something? I promote Sir Tim Berners-Lee’s 5 Star Data as the steps towards a Web of Linked Data. If we don’t get the publishing of public sector data to at least the 3-star standard (available as machine-readable structured data in a non-proprietary format), many of the current ambitions may remain just that: ambitions. That would be a massive missed opportunity.

So are we getting a right to data? – or just some provisions to extend the Freedom of Information Act a bit further in the dataset direction?  I’m not sure.

Personal note: As you may tell from the above, I am no expert on the interpretation of parliamentary legislation, and I have left several unanswered questions hanging in this post.  Any help in clarifying my thinking, confirming or disproving my assumptions, or answering some of those questions, will be gratefully received in comments to this post or your own posted thoughts.

This post was also published on the Nodalities Blog