Schema.org – Extending Benefits






I find myself in New York for the day on my way back from the excellent Smart Data 2015 Conference in San Jose. It’s a long story about red-eye flights and significant weekend savings which I won’t bore you with, but it did result in some great chill-out time in Central Park to reflect on the week.

In its long auspicious history the SemTech, Semantic Tech & Business, and now Smart Data Conference has always attracted a good cross section of the best and brightest in Semantic Web, Linked Data, Web, and associated worlds. This year was no different. In my new role as an independent working with OCLC and at Google.

I was there on behalf of OCLC to review significant developments with Schema.org in general – now with 640 Types (Classes) & 988 properties – used on over 10 Million web sites.

I find myself in New York for the day on my way back from the excellent Smart Data 2015 Conference in San Jose. It’s a long story about red-eye flights and significant weekend savings which I won’t bore you with, but it did result in some great chill-out time in Central Park to reflect on the week.

In its long auspicious history the SemTech, Semantic Tech & Business, and now Smart Data Conference has always attracted a good cross section of the best and brightest in Semantic Web, Linked Data, Web, and associated worlds. This year was no different for me in my new role as an independent working with OCLC and at Google.

I was there on behalf of OCLC to review significant developments with Schema.org in general –  now with 640 Types (Classes) & 988 properties – used on over 10 Million web sites.  Plus the pioneering efforts OCLC are engaged with, publishing Schema.org data in volume from WorldCat.org and via APIs in their products.  Check out my slides:

By mining the 300+ million records in WorldCat to identify, describe, and publish approx. 200 million Work entity descriptions, and [soon to be shared] 90+ million Person entity descriptions, this pioneering continues.

These are not only significant steps forward for the bibliographic sector, but a great example of a pattern to be followed by most sectors:

  • Identify the entities in your data
  • Describe them well using Schema.org
  • Publish embedded in html
  • Work with, do not try to replace, the domain specific vocabularies – Bibframe in the library world
  • Work with the community to extend an enhance Schema.org to enable better representation of your resources
  • If Schema.org is still not broad enough for you, build an extension to it that solves your problems whilst still maintaining the significant benefits of sharing using Schema.org – in the library world’s case this was BiblioGraph.net

This has not been an overnight operation for OCLC. If you would like to read more about it, I can recommend the recently published Library Linked Data in the Cloud – Godby, Wang, Mixter.

Extending Schema.org
schemaorg1.jpgThrough OCLC and now Google I have been working with and around Schema.org since 2012. The presentation at Smart Data arrived at an opportune time to introduce and share some major developments with the vocabulary and the communities that surround it.

A couple of months back, Version 2.0 of Schema.org introduced the potential for extensions to the vocabulary. With Version 2.1, released the week before the conference, this potential became a reality with the introduction of bib.schema.org and auto.schema.org.

On a personal note the launch of these extensions, bib.schema.org in particular, is the culmination of a bit of a journey that started a couple of years ago with forming of the Schema Bib Extend W3C Community Group (SchemaBibEx) which had great success in proposing additions and changes to the core vocabulary.

A journey that then took in the formation of the BiblioGraph.net extension vocabulary which demonstrated both how to build a domain focused vocabulary on top of Schema.org as well as how the open source software, that powers the Schema.org site, could be forked for such an effort. These two laying the ground work for defining how hosted and external extensions will operate, and for SchemaBibex to be one of the first groups to propose a hosted extension.

Finally this last month working at Google with Dan Brickley on Schema.org, has been a bit of a blur as I brushed up my Python skills to turn the potential in version 2.0 in to the reality of fully integrated and operational extensions in version 2.1. And to get it all done in time to talk about at Smart Data was the icing on the cake.

Of course things are not stoping there. On the not too distant horizon are:

  • The final acceptance of bib.schema.org & auto.schema.org – currently they are in final review.
  • SchemaBibEx can now follow up this initial version of bib.schema.org with items from its backlog.
  • New extension proposals are already in the works such as: health.schema.org, archives.schema.org, fibo.schema.org.
  • More work on the software to improve the navigation and helpfulness of the site for those looking to understand and adopt Schema.org and/or the extensions.
  • The checking of the capability for the software to host external extensions without too much effort.
  • And of course the continuing list of proposals and fixes for the core vocabulary and the site itself.

I believe we are on the cusp of a significant step forward for Schema.org as it becomes ubiquitous across the web; more organisations, encouraged by extensions, prepare to publish their data; and the SEO community recognise  proof of it actually working – but more of that in the next post.

Schema.org 2.0






About a month ago Version 2.0 of the Schema.org vocabulary hit the streets. But does this warrant the version number clicking over from 1.xx to 2.0?






schema-org1 About a month ago Version 2.0 of the Schema.org vocabulary hit the streets.

This update includes loads of tweaks, additions and fixes that can be found in the release information.  The automotive folks have got new vocabulary for describing Cars including useful properties such as numberofAirbags, fuelEfficiency, and knownVehicleDamages. New property mainEntityOfPage (and its inverse, mainEntity) provide the ability to tell the search engine crawlers which thing a web page is really about.  With new type ScreeningEvent to support movie/video screenings, and a gtin12 property for Product, amongst others there is much useful stuff in there.

But does this warrant the version number clicking over from 1.xx to 2.0?

These new types and properties are only the tip of the 2.0 iceberg.  There is a heck of a lot of other stuff going on in this release that apart from these additions.  Some of it in the vocabulary itself, some of it in the potential, documentation, supporting software, and organisational processes around it.

Sticking with the vocabulary for the moment, there has been a bit of cleanup around property names. As the vocabulary has grown organically since its release in 2011, inconsistencies and conflicts between different proposals have been introduced.  So part of the 2.0 effort has included some rationalisation.  For instance the Code type is being superseded by SoftwareSourceCode – the term code has many different meanings many of which have nothing to do with software; surface has been superseded by artworkSurface and area is being superseded by serviceArea, for similar reasons. Check out the release information for full details.  If you are using any of the superseded terms there is no need to panic as the original terms are still valid but with updated descriptions to indicate that they have been superseded.  However you are encouraged to moved towards the updated terminology as convenient.  The question of what is in which version brings me to an enhancement to the supporting documentation.  Starting with Version 2.0 there will be published a snapshot view of the full vocabulary – here is http://schema.org/version/2.0.  So if you want to refer to a term at a particular version you now can.

CreativeWork_usage How often is Schema being used? – is a question often asked. A new feature has been introduced to give you some indication.  Checkout the description of one of the newly introduced properties mainEntityOfPage and you will see the following: ‘Usage: Fewer than 10 domains‘.  Unsurprisingly for a newly introduced property, there is virtually no usage of it yet.  If you look at the description for the type this term is used with, CreativeWork, you will see ‘Usage: Between 250,000 and 500,000 domains‘.  Not a direct answer to the question, but a good and useful indication of the popularity of particular term across the web.

Extensions
In the release information you will find the following cryptic reference: ‘Fix to #429: Implementation of new extension system.’

This refers to the introduction of the functionality, on the Schema.org site, to host extensions to the core vocabulary.  The motivation for this new approach to extending is explained thus:

Schema.org provides a core, basic vocabulary for describing the kind of entities the most common web applications need. There is often a need for more specialized and/or deeper vocabularies, that build upon the core. The extension mechanisms facilitate the creation of such additional vocabularies.
With most extensions, we expect that some small frequently used set of terms will be in core schema.org, with a long tail of more specialized terms in the extension.

As yet there are no extensions published.  However, there are some on the way.

As Chair of the Schema Bib Extend W3C Community Group I have been closely involved with a proposal by the group for an initial bibliographic extension (bib.schema.org) to Schema.org.  The proposal includes new Types for Chapter, Collection, Agent, Atlas, Newspaper & Thesis, CreativeWork properties to describe the relationship between translations, plus types & properties to describe comics.  I am also following the proposal’s progress through the system – a bit of a learning exercise for everyone.  Hopefully I can share the news in the none too distant future that bib will be one of the first released extensions.

W3C Community Group for Schema.org
A subtle change in the way the vocabulary, it’s proposals, extensions and direction can be followed and contributed to has also taken place.  The creation of the Schema.org Community Group has now provided an open forum for this.

So is 2.0 a bit of a milestone?  Yes taking all things together I believe it is. I get the feeling that Schema.org is maturing into the kind of vocabulary supported by a professional community that will add confidence to those using it and recommending that others should.

The role of Role in Schema.org






This post is about an unusual, but very useful, aspect of the Schema.org vocabulary — the Role type.






schema-org1 Schema.org is basically a simple vocabulary for describing stuff, on the web.  Embed it in your html and the search engines will pick it up as they crawl, and add it to their structured data knowledge graphs.  They even give you three formats to chose from — Microdata, RDFa, and JSON-LD — when doing the embedding.  I’m assuming, for this post, that the benefits of being part of the Knowledge Graphs that underpin so called Semantic Search, and hopefully triggering some Rich Snippet enhanced results display as a side benefit, are self evident.

The vocabulary itself is comparatively easy to apply once you get your head around it — find the appropriate Type (Person, CreativeWork, Place, Organization, etc.) for the thing you are describing, check out the properties in the documentation and code up the ones you have values for.  Ideally provide a URI (URL in Schema.org) for a property that references another thing, but if you don’t have one a simple string will do.

There are a few strangenesses, that hit you when you first delve into using the vocabulary.  For example, there is no problem in describing something that is of multiple types — a LocalBussiness is both an Organisation and a Place.  This post is about another unusual, but very useful, aspect of the vocabulary — the Role type.

At first look at the documentation, Role looks like a very simple type with a handful of properties.  On closer inspection, however, it doesn’t seem to fit in with the rest of the vocabulary.  That is because it is capable of fitting almost anywhere.  Anywhere there is a relationship between one type and another, that is.  It is a special case type that allows a relationship, say between a Person and an Organization, to be given extra attributes.  Some might term this as a form of annotation.

So what need is this satisfying you may ask.  It must be a significant need to cause the creation of a special case in the vocabulary.  Let me walk through a case, that is used in a Schema.org Blog post, to explain a need scenario and how Role satisfies that need.

Starting With American Football

Say you are describing members of an American Football Team.  Firstly you would describe the team using the SportsOrganization type, giving it a name, sport, etc. Using RDFa:

Then describe a player using a Person type, providing name, gender, etc.:

Now lets relate them together by adding an athlete relationship to the Person description:

1

Let’s take a look of the data structure we have created using Turtle – not a html markup syntax but an excellent way to visualise the data structures isolated from the html:

So we now have Chucker Roberts described as an athlete on the Touchline Gods team.  The obvious question then is how do we describe the position he plays in the team.  We could have extended the SportsOrganization type with a property for every position, but scaling that across every position for every team sport type would have soon ended up with far more properties than would have been sensible, and beyond the maintenance scope of a generic vocabulary such as Schema.org.

This is where Role comes in handy.  Regardless of the range defined for any property in Schema.org, it is acceptable to provide a Role as a value.  The convention then is to use a property with the same property name, that the Role is a value for, to then remake the connection to the referenced thing (in this case the Person).  In simple terms we have have just inserted a Role type between the original two descriptions.

2

This indirection has not added much you might initially think, but Role has some properties of its own (startDate, endDate, roleName) that can help us qualify the relationship between the SportsOrganization and the athlete (Person).  For the field of organizations there is a subtype of Role (OrganizationRole) which allows the relationship to be qualified slightly more.

3

RDFa:

and in Turtle:

Beyond American Football

So far I have just been stepping through the example provided in the Schema.org blog post on this.  Let’s take a look at an example from another domain – the one I spend my life immersed in – libraries.

There are many relationships between creative works that libraries curate and describe (books, articles, theses, manuscripts, etc.) and people & organisations that are not covered adequately by the properties available (author, illustrator,  contributor, publisher, character, etc.) in CreativeWork and its subtypes.  By using Role, in the same way as in the sports example above,  we have the flexibility to describe what is needed.

Take a book (How to be Orange: an alternative Dutch assimilation course) authored by Gregory Scott Shapiro, that has a preface written by Floor de Goede. As there is no writerOfPreface property we can use, the best we could do is to is to put Floor de Goede in as a contributor.  However by using Role can qualify the contribution role that he played to be that of the writer of preface.

4

In Turtle:

and RDFa:

You will note in this example I have made use of URLs, to external resources – VIAF for defining the Persons and the Library of Congress relator codes – instead of defining them myself as strings.  I have also linked the book to it’s Work definition so that someone exploring the data can discover other editions of the same work.

Do I always use Role?
In the above example I relate a book to two people, the author and the writer of preface.  I could have linked to the author via another role with the roleName being ‘Author’ or <http://id.loc.gov/vocabulary/relators/aut>.  Although possible, it is not a recommended approach.  Wherever possible use the properties defined for a type.  This is what data consumers such as search engines are going to be initially looking for.

One last example

To demonstrate the flexibility of using the Role type here is the markup that shows a small diversion in my early career:

This demonstrates the ability of Role to be used to provide added information about most relationships between entities, in this case the employee relationship. Often Role itself is sufficient, with the ability for the vocabulary to be extended with subtypes of Role to provide further use-case specific properties added.

Whenever possible use URLs for roleName
In the above example, it is exceedingly unlikely that there is a citeable definition on the web, I could link to for the roleName. So it is perfectly acceptable to just use the string “Keyboards Roadie”.  However to help the search engines understand unambiguously what role you are describing, it is always better to use a URL.  If you can’t find one, for example in the Library of Congress Relater Codes, or in Wikidata, consider creating one yourself in Wikipedia or Wikidata for others to share. Another spin-off benefit for using URIs (URLs) is that they are language independent, regardless of the language of the labels in the data  the URI always means the same thing.  Sources like Wikidata often have names and descriptions for things defined in multiple languages, which can be useful in itself.

Final advice
This very flexible mechanism has many potential uses when describing your resources in Schema.org. There is always a danger in over using useful techniques such as this. Be sure that there is not already a way within Schema, or worth proposing to those that look after the vocabulary, before using it.

Good luck in your role in describing your resources and the relationships between them using Schema.org