Some in the surfing community will tell you that every seventh wave is a big one. I am getting the feeling, in the world of the Web, that number seven is up next and this one is all about data. The last seventh wave was the Web itself. Because of that, it is a little constraining to talk about this next one only affecting the world of the Web. This one has the potential to shift some significant rocks around on all our beaches and change the way we all interact with and think about the world around us.
Sticking with the seashore metaphor for a short while longer: waves from the technology ocean have the potential to wash into the bays and coves of interest on the coast of human endeavour and rearrange the pebbles on our beaches. Some do not reach every cove, or only have minor impact, however some really big waves reach in everywhere to churn up the sand and rocks, significantly changing the way we do things and ultimately think about the world around us. The post-Web technology waves have brought smaller yet important influences such as ecommerce, social networking, and streaming.
I believe Data, or more precisely changes in how we create, consume, and interact with data, has the potential to deliver a seventh wave impact. Enough of the grandiose metaphors and down to business.
Data has been around for centuries, from clay tablets to little cataloguing tags on the end of scrolls in ancient libraries, and on into the computerised databases that we have been accumulating since the 1960s. Until very recently these [digital] data have been closed – constrained by the systems that used them, only exposed to the wider world via user interfaces and possibly a task- or product-specific API. With the advent of many data-associated advances, variously labelled Big Data, Social Networking, Open Data, Cloud Services, Linked Data, Microformats, Microdata, Semantic Web, and Enterprise Data, data is now venturing beyond those closed systems into the wider world.
Well, this is nothing new, you might say; these trends have been around for a while – why do they constitute the seventh wave of which you foretell?
It is precisely because these trends have been around for a while, and are starting to mature and influence each other, that they are building into something really significant. Take Open Data, for instance, where governments have been at the forefront – I have reported before on the almost daily announcements of open government data initiatives. The announcement from the Dutch City of Enschede this week talks not only about their data but also about the open sourcing of the platform they use to manage and publish it, so that others can share in the way they do it.
I might find some of the activities in the Cloud Computing world short-sighted and depressing, yet already the concept of housing your data somewhere other than a local datacenter is becoming accepted in most industries.
Enterprise use of Linked Data by leading organisations such as the BBC, which is underpinning its online Olympics coverage with it, is showing that it is more than a research tool, or the province only of open data enthusiasts.
Data Marketplaces are emerging to provide platforms to share, and possibly monetise, your data. An example that takes this one step further is Kasabi.com from the leading Semantic Web technology company, Talis. Kasabi introduces the mixing, merging, and standardised querying of Linked Data into the data publishing concept. This potentially provides a platform for refining and mixing raw data into new data alloys and products more valuable and useful than their component parts – an approach that should stimulate innovation both in the enterprise and in the data enthusiast community.
The Big Data community is demonstrating that there are solutions for handling the vast volumes of data we are producing, solutions that require us to move out of the silos of relational databases towards a mixed economy. Programs need to move to the data, not the data to the programs. NoSQL databases, Hadoop, map/reduce – these are all things that are starting to move out of the labs and the hacker communities into the mainstream.
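The map/reduce idea mentioned above can be sketched in a few lines. This is a toy, single-machine illustration only – real Hadoop jobs distribute the map and reduce phases across a cluster so the computation runs where the data lives:

```python
from collections import defaultdict

# Map phase: each document is processed independently, emitting
# (word, 1) pairs -- the part that can run wherever each chunk of data lives.
def map_phase(document):
    return [(word, 1) for word in document.split()]

# Shuffle: group the intermediate pairs by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

documents = ["the web of data", "data about the web"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(intermediate))
print(counts["data"])  # 2
```

The point is the shape of the computation, not the word counting: because each map call sees only its own document, the work parallelises trivially across machines.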
The Social Networking industry, which produces vast quantities of data, is a rich field for things like sentiment analysis, trend spotting, targeted advertising, and even short-term predictions. Innovation in this field has been rapid, but I would suggest a little hampered by the delivery of closed, individual solutions that as yet do not interact with the wider world that could place them in context.
I wrote about Schema.org a while back – an initiative from the search engine big three to encourage the SEO industry to embed simple structured data in their html. The carrot they are offering for this effort is enhanced display in results listings – Google calls these Rich Snippets. When first announced, the Schema.org folks concentrated on Microdata as the embedding format – something that wouldn’t frighten the SEO community horses too much. However they did [over a background of loud complaining from the Semantic Web / Linked Data enthusiasts that RDFa was the only way] also indicate that RDFa would eventually be supported. By engaging with SEO folks on terms that they understand, this move from Schema.org has the potential to get far more structured data published on the Web than any TED Talk from Sir Tim Berners-Lee, preaching from people like me, or guidelines from governments could ever do.
The above short list of pebble-stirring waves is both impressive in its breadth and encouraging in its potential, yet none of them are the stuff of a seventh wave.
So what caused me to open up my MacBook and start writing this? It was a post from Manu Sporny, indicating that Google is not waiting for RDFa 1.1 Lite (the RDFa flavour that Schema.org will support) to be ratified. They are already harvesting, and using, structured information from web pages that has been encoded using RDFa. The use of this structured data has resulted in enhanced display on the Google pages, with items such as event date and location information, and recipe preparation timings.
Manu references sites that seem to be running Drupal, the open source CMS software, and specifically a Drupal plug-in for rendering Schema.org data encoded as RDFa. This approach answers some of the critics of embedding Schema.org data into a site’s html, especially as RDFa, who say it is ugly and difficult to understand. It is not there for humans to parse or understand and, with modules such as the Drupal one, humans will not need to get their hands dirty down at code level. Currently Schema.org supports a small but important number of ‘things’ in its recognised vocabularies. These, currently supplemented by GoodRelations and Recipes, will hopefully be joined by others to broaden the scope of descriptive opportunities.
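To illustrate what a crawler gains from this markup, here is a toy sketch (the page URL and values are entirely hypothetical) of the kind of subject–predicate–object facts that Schema.org terms, embedded as RDFa, make machine-readable:

```python
# A toy model of the structured data a crawler might extract from a page
# marked up with Schema.org terms as RDFa. Every fact is a
# (subject, predicate, object) triple; URL and values are hypothetical.
page = "http://example.org/events/opening-ceremony"

triples = {
    (page, "rdf:type", "schema:Event"),
    (page, "schema:name", "Opening Ceremony"),
    (page, "schema:startDate", "2012-07-27"),
    (page, "schema:location", "Olympic Stadium, London"),
}

# With facts in this shape, a search engine can answer "what kind of
# thing does this page describe?" without natural-language guesswork.
def describe(subject, facts):
    return {predicate: obj for subj, predicate, obj in facts if subj == subject}

print(describe(page, triples)["schema:startDate"])  # 2012-07-27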
So roll the clock forward, not too far, to a landscape where a large number of sites (incentivised by the prospect of listings as enriched as their competitors’ results) are embedding structured data in their pages as normal practice. By then most, if not all, web site delivery tools should be able to embed the Schema.org RDFa data automatically. Google and the other web crawling organisations will rapidly build up a global graph of the things on the web, their types, their relationships, and the pages that describe them. A nifty example of providing a very specific, easily understood benefit in return for a change in the way web sites are delivered – one that results in a global shift in the amount of structured data accessible for the benefit of all. Google Fellow and SVP Amit Singhal recently gave insight into this Knowledge Graph idea.
The Semantic Web / Linked Data proponents have been trying to convince everyone else of the great good that will follow once we have a web interlinked at the data level with meaning attached to those links. So far this evangelism has had little success. However, this shift may give them what they want via an unexpected route.
Once such a web emerges, and most importantly is understood by the commercial world, innovations that will influence the way we interact will naturally follow. A Google TV, with access to such a rich resource, should have no problem delivering an enhanced viewing experience by following structured links embedded in a programme page to information about the cast, the book of the film, the statistics that underpin the topic, or other programmes from the same production company. The next-but-one version of our iPhones could be a personal node in a global data network, providing access to relevant information about our location, activities, social network, and tasks.
These slightly futuristic predictions will only become possible on top of a structured network of data, which I believe is what could very well emerge if you follow through on the signs that Manu is pointing out. Reinforced by, and combining with, the other developments I referenced earlier in this post, I believe we may well have a seventh wave approaching. Perhaps I should look at the beach again in five years’ time to see if I was right.
Wave photo from Nathan Gibbs on Flickr
Declarations – I am a Kasabi Partner and shareholder in Kasabi parent company Talis.
The BBC have been at the forefront of the real application of Linked Data techniques and technologies for some time. It has been great to see them evolve from early experiments by BBC Backstage working with Talis to publish music and programmes data as RDF – to see what would happen.
Their Wildlife Finder that drives the stunning BBC Nature site has been at the centre of many of my presentations promoting Linked Data over the last couple of years. It not only looks great, but it also demonstrates wonderfully the follow-your-nose navigation around a site that naturally occurs if you let the underlying data model show you the way.
The BBC team have been evolving their approach to delivering agile, effective websites in an efficient way by building on Linked Data foundations sector by sector – wildlife, news, music, World Cup 2010, and now, in readiness for London 2012, the whole sport experience. Since the launch a few days ago, the main comment seems to be that it is ‘very yellow’, which it is. There has been little reference to the innovative approach under the hood – which is as it should be. If you can see the technology, you have got it wrong.
In an interesting post on the launch, Ben Gallop shares some history about the site and background on the new version. With around 15 million unique visitors a week, they have a huge online audience to serve. Cait O’Riordan, in a more technical post, talks about the efficiency gains of taking the semantic web technologies approach:
Doing more with less

One of the reasons why we are able to cover such a wide range of sports is that we have invested in technology which allows our journalists to spend more time creating great content and less time managing that content.
In the past when a journalist wrote a story they would have to place that story on every relevant section of the website.
A story about Arsenal playing Manchester United, for example, would have to be placed manually on the home page, the Football page, the premier league page, the Arsenal page and the Manchester United page – a very time consuming and labour intensive process.
Now the journalists tell the system what the story is about and that story is automatically placed on all the relevant parts of the site.
We are using semantic web technologies to do this, an exciting evolution of a project begun with the Vancouver Winter Games and extended with the BBC’s 2010 World Cup website. It will really come into its own during the Olympics this summer.
It is that automatic placement, and linking, of stories that leads to the natural follow-your-nose navigation around the site. If previous incarnations of the BBC’s use of this approach are anything to go by, there will be SEO benefits as well – as I have discussed previously.
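The automatic placement described in the quote can be sketched as follows. All names here are hypothetical, and the BBC's real system works over an RDF triple store rather than Python dictionaries, but the principle is the same: a journalist tags a story with the concepts it is about, and each page of the site is simply a query over those tags:

```python
# Hypothetical sketch of tag-driven story placement: stories carry the
# concepts they are about, and pages assemble themselves by query --
# no manual placement on each section required.
stories = [
    {"headline": "Arsenal beat Manchester United",
     "about": {"Arsenal", "Manchester United", "Premier League", "Football"}},
    {"headline": "England name World Cup squad",
     "about": {"England", "Football"}},
]

def stories_for(concept):
    """Return the headlines that belong on the page for this concept."""
    return [s["headline"] for s in stories if concept in s["about"]]

# The Arsenal page, the Football page, and so on each build themselves:
print(stories_for("Arsenal"))  # ['Arsenal beat Manchester United']
```

One tagging action by the journalist places the story on the home page, the Football page, the Premier League page, and both club pages at once.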
The data model used under the hood of the Sports site is based upon the Sport Ontology openly published by them. Check out the vocabulary diagram to see how they have mapped out and modelled the elements of a sporting event, competition, and associated broadcast elements. A great piece of work from the BBC teams.
In addition to the visual, navigation, and efficiency benefits this launch highlights, it also settles the concern that Linked Data / Semantic Web technologies cannot perform at scale. This site is supporting 15 million unique visitors a week and will probably be supporting a heck of a lot more during the Olympics. That is real web scale!
Yesterday found me in the National Hall of London’s Olympia, checking out Cloud Expo Europe. Much, in Linked Data circles, is implied about the mutual benefit of adopting the Cloud and Linked Open Data. Many of the technology and service providers in the Linked Data / Semantic Web space benefit from the scalability and flexibility of delivering their services from the Cloud as Software and/or Data as a Service (SaaS / DaaS). Many employ raw services from the cloud themselves, helping them accrue, and pass on, those benefits.
A prime example of this is Kasabi – a Linked Data powered data marketplace, built on the latest generation of the Talis SaaS data platform that already provides services for organisations such as Ordnance Survey and the British Library.
I know from experience that the Kasabi operation realises many of the benefits put forward as reasons to reach for the cloud by proponents of the technology – near-zero in-house infrastructure costs, the ability to rapidly scale up or down in reaction to demand, availability everywhere on anything, lower costs, etc.
So I was interested to see if the cloud vendors were as data [and the value within it] aware as the Linked Data vendors are cloud aware.
Unfortunately I can only report back a massive sense of disappointment at the lack of vision, and the siloed, parochial views, of practically everyone I met. The visit to the show took me back a decade or so, to the equivalent events that then extolled the virtues of the latest stack of servers that would grace your datacenter. Same people, same type of sales pitch, but now slightly fewer scantily clad or silly-costumed sales people, and an iPad to win on almost every stand.
How can a cloud solution salesperson be siloed and parochial in their views, you may ask. Isn’t the cloud all about opening up access to your business processes across locations and devices, taking your data out of your datacenters into hosted services, and saving money whilst gaining flexibility? Yes it is, and if I were running any organisation, from a tiny one like Data Liberate to a massive corporation or government, I would expect to be shot for not looking to the cloud for any new or refreshed service.
But I would also expect to be severely criticised for not also looking to see what other value could be delivered by capitalising on the distributed nature of the cloud and the Web that delivers it. The basic pitch from many I spoke to boiled down to “let us take what you do, unchanged, and put it in our cloud”. One line I must share with you was “back your data up to our cloud and we can save you all that hassle of mucking about with tapes”.
Perhaps I am being a bit harsh. There is potentially significant ROI to be gained from moving processes, as is, into the cloud, and I would recommend all organisations to consider it. I expected a significant number of exhibitors to be doing exactly that. My disappointment comes from finding not a single one who could see beyond the simple replacement of locally hosted hardware (and staff) with cloud services.
Perhaps I am getting a bit too visionary in my old age.
There was a glimmer of light during the day – I read Paul Miller’s write-up, and scanned the Twitter stream for #cloudcamp, which took place in London the evening before. Maybe I should have just attended that instead; unfortunately I couldn’t. Then I might be less downbeat about the ‘Cloud’ future being just the same old implementations of the past, hosted elsewhere – an opportunity being missed.
If you know different, let me know and raise my mood a bit.
Disclosure: I am a Kasabi Partner and shareholder in Kasabi’s parent company, Talis.
Clouds from picture by Martin Sojka on Flickr
I have been reading with interest the recently published discussion paper from Harvard University’s Joan Shorenstein Center on the Press, Politics and Public Policy, by former U.S. Chief Information Officer Vivek Kundra, entitled Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect [pdf]. It is well worth a read to place the current [Digital] Revolution, which we are somewhere in the middle of, in relation to preceding revolutions and the ages that they begat: the agricultural age, lasting thousands of years; the industrial age, lasting hundreds of years; and the digital revolution/age, which has already had massive impacts on individuals, society, governments, and commerce in just a few short decades.
Paraphrasing his introduction: the microprocessor, “the new steam engine” powering the Information Economy, is being fuelled by open data. Stepping onto dangerous mixed-metaphor territory here, but I see him implying that the network effects, both technological and social, are turning that basic open data fuel into a high-octane brew driving massive change to our world.
Vivek goes on to catalogue some of the effects of this digital revolution. With his background in county, state, and federal US government it is unsurprising that his examples are around the effects of opening up public data, but that does not make them less valid. He talks about four shifts in power that are emerging and/or need to occur:
Fighting government corruption, improving accountability and enhancing government services – open [democratised] data driving the public’s ability to hold the public sector to account, exposing hidden, or unknown, facts and trends.
Changing the default setting of government to open, transparent and participatory – changing the attitude of those within government to openly publish their data by default so that it can be used to inform their populations, challenge their actions and services, and stimulate innovation.
Creating new models of journalism to separate signal from noise and provide meaningful insights – innovative analysis of publicly available data can surface issues and stories that would otherwise be buried in the noise of general government output.
Launching multi-billion dollar businesses based upon public sector data – by applying their specific expertise to the analysis, collation, and interpretation of open public data.
All good stuff, and a great overview for those looking at this digital revolution as impacted by public open data. As to what sort of age it will lead to, I think we need to look at a couple of steps further on in the revolution.
The agricultural revolution was based upon the move away from a nomadic existence, the planting and harvesting of crops and the creation of settlements. The age that follows, I would argue, was based upon the outputs of those efforts enabling the creation of business and the trading of surpluses. A new layer of commerce emerged, built upon the basic outputs of the revolutionary activities.
The industrial revolution introduced powered machines, replacing manual labour and massively increasing efficiency and productivity. The age that followed was characterised by manufacturing – a new layer of added value, taking the basic raw materials produced or mined by these machines and combining them into new, complex products.
Which brings me to what I would prefer to call the data revolution, where today we are seeing data as a fuel consumed to drive our information steam engines. I would argue that soon we will recognise that data is not just a fuel but also a raw material. Data from many sources (public, private, and personal) in many forms (open, commercially licensed, and closed) will be combined with entrepreneurial innovation and refined to produce new, complex products and services. In the same way that whole new industries emerged in the industrial era, I believe we will look back at today and see the foundations of new and future industries. I published some thoughts on this in a previous post a year or so ago which I believe are still relevant.
Today, unless you want to expend significant effort on understanding individual datasets, it is difficult to deliver an information service or application that depends on more than a couple of data sources. This is because we are still trying to establish the de facto standards for presenting, communicating, and consuming data. We have mostly succeeded for web pages, with html and the gradual demise of pragmatic, moment-in-time diversionary solutions such as Flash. However, on the data front we are still where the automobile industry was before agreeing on what order, and where, to place the pedals in a car.
The answer, I believe, will emerge to be the adoption of data packaging and linking techniques and standards – Linked Data. I say this not just because I am an evangelist for the benefits of Linked Data, but because it exhibits the same distributed, open, and generic features that exemplify what has been successful for the Web. It also builds upon those Web standards. Much is talked, and hyped, about Big Data – another moment-in-time term. Once we start linking, consuming, and building, it will be on a foundation of data that could only be described as big. What we label Big today will soon appear to be normal.
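A tiny sketch of why the linking matters (the URIs and datasets here are hypothetical): once two independently published datasets identify a thing with the same URI, combining them becomes a mechanical join rather than a screen-scraping exercise:

```python
# Two independent, hypothetical datasets that happen to use the same URI
# to identify London. Because the identifier is shared, merging them is
# a simple lookup -- follow the link, no fuzzy matching needed.
places = {
    "http://example.org/id/london": {"name": "London", "country": "UK"},
}
events = [
    {"name": "Olympics 2012", "venue": "http://example.org/id/london"},
]

for event in events:
    place = places[event["venue"]]  # the join is just following the URI
    print(f"{event['name']} takes place in {place['name']}, {place['country']}")
```

Scale that idea up from two dictionaries to the whole Web, and you have the foundation-of-big-data point above: applications built on many sources without bespoke integration effort for each one.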
What of the Semantic Web, I am asked. I believe the Semantic Web is a slightly out-of-focus vision of how the Information Age may look once it is established, expressed only in the terms of what we understand today. So this is what I am predicting will arrive, but I am also predicting that we will eventually call it something else.