
Some in the surfing community will tell you that every seventh wave is a big one. I am getting the feeling, in the world of the Web, that a number seven is up next and this one is all about data. The last seventh wave was the Web itself. Because of that, it is a little constraining to talk about this next one only affecting the world of the Web. This one has the potential to shift some significant rocks around on all our beaches and change the way we all interact and think about the world around us.
Sticking with the seashore metaphor for a short while longer: waves from the technology ocean have the potential to wash into the bays and coves of interest on the coast of human endeavour and rearrange the pebbles on our beaches. Some do not reach every cove, or have only a minor impact; however, some really big waves reach in everywhere to churn up the sand and rocks, significantly changing the way we do things and ultimately think about the world around us. The post-Web technology waves have brought smaller yet important influences such as ecommerce, social networking, and streaming.
I believe Data, or more precisely changes in how we create, consume, and interact with data, has the potential to deliver a seventh wave impact. Enough of the grandiose metaphors and down to business.
Data has been around for centuries, from clay tablets to little cataloguing tags on the end of scrolls in ancient libraries, and on into the computerised databases that we have been accumulating since the 1960s. Up until very recently these [digital] data have been closed – constrained by the systems that used them, only exposed to the wider world via user interfaces and possibly a task/product specific API. With the advent of many data-associated advances, variously labelled Big Data, Social Networking, Open Data, Cloud Services, Linked Data, Microformats, Microdata, Semantic Web, Enterprise Data, they are now venturing beyond those closed systems into the wider world.
Well this is nothing new, you might say, these trends have been around for a while – why does this constitute the seventh wave of which you foretell?
It is precisely because these trends have been around for a while, and are starting to mature and influence each other, that they are building to form something really significant. Take Open Data for instance where governments have been at the forefront – I have reported before about the almost daily announcements of open government data initiatives. The announcement from the Dutch City of Enschede this week not only talks about their data but also about the open sourcing of the platform they use to manage and publish it, so that others can share in the way they do it.
In the world of libraries, the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid are providing a contribution of linked bibliographic data to the gathering mass, alongside the British and Germans, with 2.4 million bibliographic records from the Spanish National Library. This adds weight to the arguments for a Linked Data future for libraries proposed by the Library of Congress and Stanford University.
I might find some of the activities in the Cloud Computing short-sighted and depressing, yet already the concept of housing your data somewhere other than in a local datacenter is becoming accepted in most industries.
Enterprise use of Linked Data by leading organisations such as the BBC, who are underpinning their online Olympics coverage with it, is showing that it is more than a research tool, or the province only of the open data enthusiasts.
Data Marketplaces are emerging to provide platforms to share and possibly monetise your data. An example that takes this one step further is Kasabi.com from the leading Semantic Web technology company, Talis. Kasabi introduces the data mixing, merging, and standardised querying of Linked Data into the data publishing concept. This potentially provides a platform for refining and mixing raw data into new data alloys and products more valuable and useful than their component parts. It is an approach that should stimulate innovation both in the enterprise and in the data enthusiast community.
The Big Data community is demonstrating that there are solutions for handling the vast volumes of data we are producing, and that they require us to move out of the silos of relational databases towards a mixed economy. Moving the programs rather than the data, NoSQL databases, Hadoop, map/reduce – these are all things that are starting to move out of the labs and the hacker communities into the mainstream.
The Social Networking industry, which produces tons of data, is a rich field for things like sentiment analysis, trend spotting, targeted advertising, and even short term predictions – innovation in this field has been rapid but, I would suggest, a little hampered by delivering closed individual solutions that as yet do not interact with the wider world which could place them in context.
I wrote about Schema.org a while back: an initiative from the search engine big three to encourage the SEO industry to embed simple structured data in their HTML. The carrot they are offering for this effort is enhanced display in results listings – Google calls these Rich Snippets. When first announced, the schema.org folks concentrated on Microdata as the embedding format – something that wouldn’t frighten the SEO community horses too much. However they did [over a background of loud complaining from the Semantic Web / Linked Data enthusiasts that RDFa was the only way] also indicate that RDFa would eventually be supported. By engaging with SEO folks on terms that they understand, this move from Schema.org had the potential to get far more structured data published on the Web than any TED Talk from Sir Tim Berners-Lee, preaching from people like me, or guidelines from governments could ever do.
The above short list of pebble stirring waves is both impressive in its breadth and encouraging in its potential, yet none of them are the stuff of a seventh wave.
So what caused me to open up my MacBook and start writing this? It was a post from Manu Sporny, indicating that Google were not waiting for RDFa 1.1 Lite (the RDF version that schema.org will support) to be ratified. They are already harvesting, and using, structured information from web pages that has been encoded using RDF. The use of this structured data has resulted in enhanced display on the Google pages with items such as event date & location information, and recipe preparation timings.
Manu references sites that seem to be running Drupal, the open source CMS software, and specifically a Drupal plug-in for rendering Schema.org data encoded as RDFa. This approach answers some of the critics of embedding Schema.org data into a site’s HTML, especially as RDF, who say it is ugly and difficult to understand. It is not there for humans to parse or understand and, with modules such as the Drupal one, humans will not need to get their hands dirty down at code level. Currently Schema.org supports a small but important number of ‘things’ in its recognised vocabularies. These, currently supplemented by GoodRelations and Recipes, will hopefully be joined by others to broaden the scope of descriptive opportunities.
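To give a feel for what a machine, rather than a human, sees in such a page, here is a minimal sketch in Python. The HTML fragment, the event it describes, and the names PropertyCollector and extract are all made up for illustration. The parsing is deliberately simplistic: it reads only the typeof, property and content attributes, and it is not a conformant RDFa processor, nor a picture of how Google or the Drupal module actually work.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: an event described with Schema.org terms
# using RDFa-style attributes. The event itself is invented.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Event">
  <span property="name">Example Beach Gathering</span>
  <span property="startDate" content="2012-07-01T18:00">1 July, 6pm</span>
  <span property="location">Example Beach</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects property/value pairs from `property` attributes.

    Assumption: a single, flat item per fragment. Nesting, prefixes,
    `resource` and `href` are all ignored in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending = None  # property name still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs and self.item_type is None:
            self.item_type = attrs["typeof"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # A machine-readable value takes precedence over the visible text.
            self.properties[prop] = attrs["content"]
        else:
            self._pending = prop

    def handle_data(self, data):
        # The first non-blank text after a property tag becomes its value.
        if self._pending is not None and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


def extract(html):
    """Return (item type, {property: value}) for a fragment of HTML."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.item_type, collector.properties


if __name__ == "__main__":
    item_type, props = extract(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"  {name}: {value}")
```

The visitor to the page sees “1 July, 6pm”; the crawler picks up the unambiguous date from the content attribute, which is the kind of detail that can then surface in an enriched results listing.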
So roll the clock forward, not too far, to a landscape where a large number of sites (incentivised by the prospect of listings as enriched as their competitors’ results) are embedding structured data in their pages as normal practice. By then most if not all web site delivery tools should be able to embed the Schema.org RDF data automatically. Google and the other web crawling organisations will rapidly build up a global graph of the things on the web, their types, relationships and the pages that describe them. This is a nifty example of providing a very specific, easily understood benefit in return for a change in the way web sites are delivered, one that results in a global shift in the amount of structured data accessible for the benefit of all. Google Fellow and SVP Amit Singhal recently gave insight into this Knowledge Graph idea.
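As a rough illustration of what “a global graph of the things on the web” means in practice, the sketch below merges statements harvested from two pages into one structure and then follows a link between them. Everything in it is hypothetical: the example.org identifiers, the property names, and the functions build_graph and names_linked_from. It says nothing about how any search engine really stores its graph.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements harvested from
# two different pages. All identifiers and values are invented.
PAGE_A = [
    ("http://example.org/film/1", "type", "Movie"),
    ("http://example.org/film/1", "name", "An Example Film"),
    ("http://example.org/film/1", "actor", "http://example.org/person/7"),
]
PAGE_B = [
    ("http://example.org/person/7", "type", "Person"),
    ("http://example.org/person/7", "name", "A. N. Actor"),
]


def build_graph(*pages):
    """Merge statements from any number of pages into one graph.

    The graph maps subject -> predicate -> set of objects, so the same
    thing described on several pages ends up as a single node.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for statements in pages:
        for subject, predicate, obj in statements:
            graph[subject][predicate].add(obj)
    return graph


def names_linked_from(graph, subject, predicate):
    """Follow `predicate` links from `subject`; return the targets' names."""
    names = set()
    for target in graph.get(subject, {}).get(predicate, ()):
        names.update(graph.get(target, {}).get("name", ()))
    return sorted(names)


if __name__ == "__main__":
    graph = build_graph(PAGE_A, PAGE_B)
    print(names_linked_from(graph, "http://example.org/film/1", "actor"))
```

Neither page alone can answer “who appears in this film, by name?”; the answer only exists once the two descriptions share an identifier and are merged, which is the whole point of linking at the data level.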
The Semantic Web / Linked Data proponents have been trying to convince everyone else of the great good that will follow once we have a web interlinked at the data level with meaning attached to those links. So far this evangelism has had little success. However, this shift may give them what they want via an unexpected route.
Once such a web emerges, and most importantly is understood by the commercial world, innovations that will influence the way we interact will naturally follow. A Google TV, with access to such a rich resource, should have no problem delivering an enhanced viewing experience by following structured links embedded in a programme page to information about the cast, the book of the film, the statistics that underpin the topic, or other programmes from the same production company. Our iPhone version next-but-one could be a personal node in a global data network, providing access to relevant information about our location, activities, social network, and tasks.
These slightly futuristic predictions will only become possible on top of a structured network of data, which I believe is what could very well emerge if you follow through on the signs that Manu is pointing out. Reinforced by, and combining with, the other developments I referenced earlier in this post, I believe we may well have a seventh wave approaching. Perhaps I should look at the beach again in five years’ time to see if I was right.
Wave photo from Nathan Gibbs on Flickr.
Declarations – I am a Kasabi Partner and shareholder in Kasabi parent company Talis.
Hi Richard, great post!
Two clarifications, one small and one large:
First, the small clarification. You said: “… Google were not waiting for RDFa 1.1 Lite (the RDF version that schema.org will support) to be ratified. They are already harvesting, and using, structured information from web pages that has been encoded using RDF.”
The text is a bit misleading, or could be confusing for someone that is not familiar with the nuance between RDF (the data model), RDFa (the syntax), and when Google started indexing RDFa. Really, what happened is that Google isn’t waiting for RDFa 1.1 to be ratified /for the purposes of schema.org/. They’re just going ahead and accepting RDFa 1.0 today for schema.org. They’ve been accepting RDFa 1.0 in their Rich Snippets stuff since 2009: http://radar.oreilly.com/2009/05/google-announces-support-for-m.html .
That is, Google has been indexing RDFa 1.0 since 2009. Google launched schema.org with no RDFa support. Google now supports RDFa 1.0 in schema.org. Google has announced that it intends to support RDFa 1.1 Lite in schema.org. This means that every version of RDFa, both in XHTML1 and HTML5, will be supported by schema.org in time.
The second large clarification has to do with this statement:
“The Semantic Web / Linked Data proponents have been trying to convince everyone else of the great good that will follow once we have a web interlinked at the data level with meaning attached to those links. So far this evangelism has had little success. However, this shift may give them what they want via an unexpected route.”
I wouldn’t lump everyone that is working on the Semantic Web and Linked Data into the same bunch. There are wildly differing opinions on technology and strategy there. There are people, such as myself, that feel that RDF has traditionally been presented in a way that is inaccessible to most Web developers. This frustration is part of where RDFa and JSON-LD ( http://json-ld.org/ ) came from. So, while some of the goals of the Semantic Web and Linked Data communities may overlap, the dissemination strategy, tools, and teaching philosophies vary wildly at times. At no point can anyone point to a single group and say “those are the Semantic Web folks” or “those are the Linked Data folks” just as we can’t point to a single group and say “those are the REST folks” or “those are Web Developer folks”. Reality tends to be far more diverse and nuanced than our human minds find comfortable categorizing.
As for the “unexpected route” that RDFa has taken – that’s simply not true. We have been very deliberate in making sure that we listen, respond, and build technologies that work for companies like Facebook, Google, Yahoo, Best Buy, Microsoft, Yandex and many others. RDFa Lite 1.1 was a direct result of talks with the search companies… we are being extremely deliberate and sensitive to the needs of those that are showing an interest in adopting and deploying RDFa.
Overall, I liked the post – keep up the good work.
Hi Manu,
Thanks for comments and clarifications.
I did try to go into a bit more detail around the RDF/RDFa/1.0/1.1/Lite issue, but ended up editing it back as the detail was obscuring my point somewhat – and the post was getting a bit overlong. Your clarification will be a great help to those that want a better understanding of what is at play here.
On your other point – yes I totally agree there is a broad spectrum of overlapping opinion about the theoretical and practical benefits and opportunities when applying Linked Data techniques and technologies and/or striving towards the Semantic Web vision.
Unfortunately many, who are only distant spectators of the development of structured data on the web, do lump the communities together as one.
However in retrospect, maybe I should have started that paragraph as “Some Semantic Web / Linked Data proponents..”
Thanks again.
~Richard.
[...] amount of library catalogue pages that will mach a search for a book title. As referred to previously, Google are assembling a graph of related things. In this context the thing is the concept of the [...]
[...] stick at – all and none of which could be considered to be the killer.Back to my domain , data. As I have postulated previously, I believe we are nearing a point where data, it’s use, our access to it, and the attention and [...]
[...] Yet it is only a symptom of something much bigger and game-changing as I postulated last month A Data 7th Wave is Approaching. [...]
[...] back to my original question – is this debate fundamental? With the approaching wave of data on the web, yes I believe it is – or at least ending it in a satisfactory way is. Those [...]
[...] to Phil’s question – I think this in retrospect may seem a King Canute style proclamation. If my predictions are correct, it won’t be too long before we are up to our ears in structured data on the web, [...]
[...] Wallis believes that big things are just around the corner in the world of data. He writes, “Some in the surfing community will tell you that every seventh wave is a big [...]
[...] Richard Wallis (late of Talis, now OCLC) recently summarized these trends in terms of web-wide factors in his post A data 7th wave approaching: [...]