Open Data: Digital Fuel or Raw Material?

I have been reading with interest the recently published discussion paper from Harvard University’s Joan Shorenstein Center on the Press, Politics and Public Policy by former U.S. Chief Information Officer, Vivek Kundra, entitled Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect [pdf]. It is well worth a read to place the current [Digital] Revolution, which we are somewhere in the middle of, in relation to the preceding revolutions and the ages that they begat – the agricultural age, lasting thousands of years – the industrial age, lasting hundreds of years – and the digital revolution/age, which has already had massive impacts on individuals, society, governments and commerce in just a few short decades.

Paraphrasing his introduction: the microprocessor, “the new steam engine” powering the Information Economy, is being fuelled by open data. I am stepping on to dangerous mixed metaphor territory here, but I see him implying that the network effects, both technological and social, are turning that basic open data fuel into a high-octane brew driving massive change to our world.

Vivek goes on to catalogue some of the effects of this digital revolution. With his background in county, state, and federal US government it is unsurprising that his examples are around the effects of opening up public data, but that does not make them less valid. He talks about four shifts in power that are emerging and/or need to occur:

Fighting government corruption, improving accountability and enhancing government services – open [democratised] data driving the public’s ability to hold the public sector to account, exposing hidden, or unknown, facts and trends.
Changing the default setting of government to open, transparent and participatory – changing the attitude of those within government to openly publish their data by default so that it can be used to inform their populations, challenge their actions and services, and stimulate innovation.
Create new models of journalism to separate signal from noise to provide meaningful insights – innovative analysis of publicly available data can surface issues and stories that would otherwise be buried in the noise of general government output.
Launch multi-billion dollar businesses based upon public sector data – by applying their specific expertise to the analysis, collation, and interpretation of open public data.

All good stuff, and a great overview for those looking at this digital revolution as impacted by public open data. As to what sort of age it will lead to, I think we need to look at a couple of steps further on in the revolution.

The agricultural revolution was based upon the move away from a nomadic existence, the planting and harvesting of crops and the creation of settlements. The age that followed, I would argue, was based upon the outputs of those efforts enabling the creation of business and the trading of surpluses. A new layer of commerce emerged, built upon the basic outputs of the revolutionary activities.

The industrial revolution introduced powered machines, replacing manual labour, massively increasing efficiency and productivity. The age that followed was characterised by manufacturing – a new layer of added value, taking the basic raw materials produced or mined by these machines and combining them into new complex products.

Which brings me to what I would prefer to call the data revolution, where today we are seeing data as a fuel consumed to drive our information steam engines. I would argue that soon we will recognise that data is not just a fuel but also a raw material. Data from many sources (public, private and personal) in many forms (open, commercially licensed and closed) will be combined with entrepreneurial innovation and refined to produce new complex products and services. In the same way that whole new industries emerged in the industrial era, I believe we will look back at today and see the foundations of new and future industries. I published some thoughts on this in a previous post a year or so ago which I believe are still relevant.

Today, unless you want to expend significant effort on understanding individual data sets, it is difficult to deliver an information service or application that depends on more than a couple of data sources. This is because we are still trying to establish the de facto standards for presenting, communicating and consuming data. We have mostly succeeded for web pages, with HTML and the gradual demise of pragmatic moment-in-time diversionary solutions such as Flash. However, on the data front, we are still where the automobile industry was before agreeing the order and position of the foot pedals in a car.

The answer, I believe, will emerge to be the adoption of data packaging and linking techniques and standards – Linked Data. I say this, not just because I am an evangelist for the benefits of Linked Data, but because it exhibits the same distributed, open and generic features that exemplify what has been successful for the Web. It also builds upon those Web standards. Much is talked, and hyped, about Big Data – another moment-in-time term. Once we start linking, consuming, and building, it will be on a foundation of data that could only be described as big. What we label Big today will soon appear to be normal.
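
To give a feel for why shared identifiers and a common packaging make combining sources easier, here is a minimal sketch in plain Python. It is not a real Linked Data toolkit, only an illustration of the idea: every statement is a (subject, property, value) triple, and subjects and properties are named with Web addresses. All of the URIs, values and function names below are invented for the example.

```python
# Illustrative sketch only: every URI and value here is hypothetical.
# A statement is a (subject, property, value) triple of strings.
Triple = tuple[str, str, str]

# A pretend open public data source.
public_source: set[Triple] = {
    ("http://example.org/place/1", "http://example.org/vocab/name", "Exampleville"),
    ("http://example.org/place/1", "http://example.org/vocab/population", "1000"),
}

# A pretend commercial source that happens to use the same place URI.
private_source: set[Triple] = {
    ("http://example.org/place/1", "http://example.org/vocab/storeCount", "3"),
}


def merge(*sources: set[Triple]) -> set[Triple]:
    """Combine sources; with one common shape, merging is a plain set union."""
    merged: set[Triple] = set()
    for source in sources:
        merged |= source
    return merged


def describe(graph: set[Triple], subject: str) -> dict[str, str]:
    """Collect every property recorded for one subject URI."""
    return {prop: value for subj, prop, value in graph if subj == subject}


combined = merge(public_source, private_source)
profile = describe(combined, "http://example.org/place/1")
```

Because both sources name the place with the same URI, `profile` ends up holding the name, population and store count together, with no source-specific mapping code. Real-world data is rarely this tidy, but this is the kind of effort that common linking standards aim to remove.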

What of the Semantic Web, I am asked. I believe the Semantic Web is a slightly out of focus vision of how the Information Age may look when it is established, expressed only in the terms of what we understand today. So this is what I am predicting will arrive, but I am also predicting that we will eventually call it something else.

Picture of Vivek Kundra from Wikipedia.
