Schema.org – Extending Benefits

I find myself in New York for the day on my way back from the excellent Smart Data 2015 Conference in San Jose. It’s a long story about red-eye flights and significant weekend savings which I won’t bore you with, but it did result in some great chill-out time in Central Park to reflect on the week.

In its long auspicious history the SemTech, Semantic Tech & Business, and now Smart Data Conference has always attracted a good cross section of the best and brightest in Semantic Web, Linked Data, Web, and associated worlds. This year was no different for me in my new role as an independent working with OCLC and at Google.

I was there on behalf of OCLC to review significant developments with Schema.org in general –  now with 640 Types (Classes) & 988 properties – used on over 10 Million web sites.  Plus the pioneering efforts OCLC are engaged with, publishing Schema.org data in volume from WorldCat.org and via APIs in their products.  Check out my slides:

By mining the 300+ million records in WorldCat to identify, describe, and publish approx. 200 million Work entity descriptions, and [soon to be shared] 90+ million Person entity descriptions, this pioneering continues.

These are not only significant steps forward for the bibliographic sector, but a great example of a pattern to be followed by most sectors:

  • Identify the entities in your data
  • Describe them well using Schema.org
  • Publish embedded in html
  • Work with, do not try to replace, the domain specific vocabularies – Bibframe in the library world
  • Work with the community to extend an enhance Schema.org to enable better representation of your resources
  • If Schema.org is still not broad enough for you, build an extension to it that solves your problems whilst still maintaining the significant benefits of sharing using Schema.org – in the library world’s case this was BiblioGraph.net

This has not been an overnight operation for OCLC. If you would like to read more about it, I can recommend the recently published Library Linked Data in the Cloud – Godby, Wang, Mixter.

Extending Schema.org
schemaorg1.jpgThrough OCLC and now Google I have been working with and around Schema.org since 2012. The presentation at Smart Data arrived at an opportune time to introduce and share some major developments with the vocabulary and the communities that surround it.

A couple of months back, Version 2.0 of Schema.org introduced the potential for extensions to the vocabulary. With Version 2.1, released the week before the conference, this potential became a reality with the introduction of bib.schema.org and auto.schema.org.

On a personal note the launch of these extensions, bib.schema.org in particular, is the culmination of a bit of a journey that started a couple of years ago with forming of the Schema Bib Extend W3C Community Group (SchemaBibEx) which had great success in proposing additions and changes to the core vocabulary.

A journey that then took in the formation of the BiblioGraph.net extension vocabulary which demonstrated both how to build a domain focused vocabulary on top of Schema.org as well as how the open source software, that powers the Schema.org site, could be forked for such an effort. These two laying the ground work for defining how hosted and external extensions will operate, and for SchemaBibex to be one of the first groups to propose a hosted extension.

Finally this last month working at Google with Dan Brickley on Schema.org, has been a bit of a blur as I brushed up my Python skills to turn the potential in version 2.0 in to the reality of fully integrated and operational extensions in version 2.1. And to get it all done in time to talk about at Smart Data was the icing on the cake.

Of course things are not stoping there. On the not too distant horizon are:

  • The final acceptance of bib.schema.org & auto.schema.org – currently they are in final review.
  • SchemaBibEx can now follow up this initial version of bib.schema.org with items from its backlog.
  • New extension proposals are already in the works such as: health.schema.org, archives.schema.org, fibo.schema.org.
  • More work on the software to improve the navigation and helpfulness of the site for those looking to understand and adopt Schema.org and/or the extensions.
  • The checking of the capability for the software to host external extensions without too much effort.
  • And of course the continuing list of proposals and fixes for the core vocabulary and the site itself.

I believe we are on the cusp of a significant step forward for Schema.org as it becomes ubiquitous across the web; more organisations, encouraged by extensions, prepare to publish their data; and the SEO community recognise  proof of it actually working – but more of that in the next post.

Comment below or Contact us