Schema.org 2.0

About a month ago, Version 2.0 of the Schema.org vocabulary hit the streets.

This update includes loads of tweaks, additions and fixes, all listed in the release information.  The automotive folks have got new vocabulary for describing cars, including useful properties such as numberOfAirbags, fuelEfficiency, and knownVehicleDamages. The new property mainEntityOfPage (and its inverse, mainEntity) provides a way to tell search engine crawlers which thing a web page is really about.  Add the new type ScreeningEvent to support movie/video screenings and a gtin12 property for Product, amongst others, and there is much useful stuff in there.
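To make a couple of those additions concrete, here is a rough sketch of how a page describing a single car might use them. It is only an illustration: the page URL, the car details and the to_jsonld_script helper are all made up for the example, and JSON-LD is just one of the syntaxes you could use to embed Schema.org in a page.

```python
import json

# Hypothetical page URL, used only for illustration.
PAGE_URL = "http://example.com/cars/1234"

# A made-up car description using some of the properties new in 2.0.
car = {
    "@context": "http://schema.org",
    "@type": "Car",
    "name": "Example hatchback",
    "numberOfAirbags": 6,
    "knownVehicleDamages": "Small scratch on the rear bumper",
    "fuelEfficiency": {
        "@type": "QuantitativeValue",
        "value": 5.2,
        "unitText": "L/100 km",
    },
    # States that the page at PAGE_URL is primarily about this car.
    "mainEntityOfPage": PAGE_URL,
}


def to_jsonld_script(data):
    """Wrap a dict in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = to_jsonld_script(car)
print(markup)
```

The printed script element could be pasted into the head of the page at PAGE_URL. Going the other way, a description of the page itself could point at the car using mainEntity.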

But does this warrant the version number clicking over from 1.xx to 2.0?

These new types and properties are only the tip of the 2.0 iceberg.  There is a heck of a lot of other stuff going on in this release apart from these additions: some of it in the vocabulary itself, some of it in the potential, documentation, supporting software, and organisational processes around it.

Sticking with the vocabulary for the moment, there has been a bit of cleanup around property names. As the vocabulary has grown organically since its release in 2011, inconsistencies and conflicts between different proposals have been introduced.  So part of the 2.0 effort has included some rationalisation.  For instance, the Code type is being superseded by SoftwareSourceCode, because the term code has many different meanings, many of which have nothing to do with software. For similar reasons, surface has been superseded by artworkSurface and area is being superseded by serviceArea. Check out the release information for full details.  If you are using any of the superseded terms there is no need to panic: the original terms are still valid, but their descriptions have been updated to indicate that they have been superseded.  However, you are encouraged to move towards the updated terminology as convenient.

The question of what is in which version brings me to an enhancement to the supporting documentation.  Starting with Version 2.0, a snapshot view of the full vocabulary will be published – here is the one for this release: http://schema.org/version/2.0.  So if you want to refer to a term at a particular version you now can.
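If you do decide to move to the updated terminology, the renaming is mechanical enough to script. The following is a minimal sketch, assuming your markup is held as JSON-LD-style Python dicts; the modernise function and the sample painting data are hypothetical, and the lookup tables cover only the three superseded terms mentioned above, not the full list in the release information.

```python
# Superseded terms named in this post only; see the release information
# for the complete list.
SUPERSEDED_TYPES = {"Code": "SoftwareSourceCode"}
SUPERSEDED_PROPERTIES = {"surface": "artworkSurface", "area": "serviceArea"}


def rename_type(value):
    """Rename a type name, or each name in a list of type names."""
    if isinstance(value, list):
        return [SUPERSEDED_TYPES.get(v, v) for v in value]
    return SUPERSEDED_TYPES.get(value, value)


def modernise(node):
    """Return a copy of a JSON-LD-style structure with superseded terms renamed."""
    if isinstance(node, list):
        return [modernise(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (strings, numbers) are left alone
    updated = {}
    for key, value in node.items():
        if key == "@type":
            updated[key] = rename_type(value)
        else:
            new_key = SUPERSEDED_PROPERTIES.get(key, key)
            updated[new_key] = modernise(value)
    return updated


# Made-up sample data for illustration.
painting = {"@type": "VisualArtwork", "name": "Example", "surface": "canvas"}
print(modernise(painting))
```

Types and properties are kept in separate tables so that a property is never renamed just because it happens to share a name with a superseded type. The input is left untouched, so the old and new markup can be compared before anything is published.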

How often is Schema.org being used? It is a question often asked, and a new feature has been introduced to give you some indication.  Check out the description of one of the newly introduced properties, mainEntityOfPage, and you will see the following: ‘Usage: Fewer than 10 domains’.  Unsurprisingly for a newly introduced property, there is virtually no usage of it yet.  If you look at the description for the type this term is used with, CreativeWork, you will see ‘Usage: Between 250,000 and 500,000 domains’.  Not a direct answer to the question, but a good and useful indication of the popularity of a particular term across the web.

Extensions
In the release information you will find the following cryptic reference: ‘Fix to #429: Implementation of new extension system.’

This refers to the introduction of the functionality, on the Schema.org site, to host extensions to the core vocabulary.  The motivation for this new approach to extending is explained thus:

Schema.org provides a core, basic vocabulary for describing the kind of entities the most common web applications need. There is often a need for more specialized and/or deeper vocabularies, that build upon the core. The extension mechanisms facilitate the creation of such additional vocabularies.
With most extensions, we expect that some small frequently used set of terms will be in core schema.org, with a long tail of more specialized terms in the extension.

As yet there are no extensions published.  However, there are some on the way.

As Chair of the Schema Bib Extend W3C Community Group I have been closely involved with a proposal by the group for an initial bibliographic extension (bib.schema.org) to Schema.org.  The proposal includes new Types for Chapter, Collection, Agent, Atlas, Newspaper & Thesis, CreativeWork properties to describe the relationship between translations, plus types & properties to describe comics.  I am also following the proposal’s progress through the system – a bit of a learning exercise for everyone.  Hopefully I can share the news in the not too distant future that bib will be one of the first released extensions.

W3C Community Group for Schema.org
A subtle change has also taken place in the way the vocabulary, its proposals, extensions and direction can be followed and contributed to.  The creation of the Schema.org Community Group has now provided an open forum for this.

So is 2.0 a bit of a milestone?  Yes, taking all things together, I believe it is. I get the feeling that Schema.org is maturing into the kind of vocabulary, supported by a professional community, that will add confidence to those using it and recommending that others should.

W3C Library Linked Data Final Report Published

The W3C Library Linked Data Incubator Group has published its Final Report after a year of deliberation.

The mission of the Library Linked Data Incubator Group was to help
increase the global interoperability of library data on the Web by
focusing on the potential role of Linked Data technologies.

This report contains several messages that are interesting and relevant not just for the Linked Data enthusiast in the library community. It contains some home truths for those in libraries who think that a slight tweak to the status quo, such as adopting RDA, will be sufficient to keep library data relevant in the rapidly evolving world of the web.

On the NGC4LIB mailing list, Eric Lease Morgan picked out some useful quotes from the report:

  • Linked Data is not about creating a different Web, but rather about enhancing the Web through the addition of structured data.
  • By promoting a bottom-up approach to publishing data, Linked Data creates an opportunity for libraries to improve the value proposition of describing their assets.
  • Linked Data may be a first step toward a “cloud-based” approach to managing cultural information, which could be more cost-effective than stand-alone systems in institutions.
  • With Linked Open Data, libraries can increase their presence on the Web, where most information seekers can be found.
  • The use of the Web and Web-based identifiers will make up-to-date resource descriptions directly citable by catalogers.
  • History shows that all technologies are transitory, and the history of information technology suggests that specific data formats are especially short-lived.
  • Library developers and vendors will directly benefit from not being tied to library-specific data formats.
  • Most information in library data is encoded as display-oriented, natural-language text.
  • Work on library Linked Data can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community.
  • Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data.
  • A major advantage of Linked Data technology is realized with the establishment of connections between and across datasets.
  • Libraries should embrace the web of information, both by making their data available for use as Linked Data and by using the web of data in library services. Ideally, library data should integrate fully with other resources on the Web, creating greater visibility for libraries and bringing library services to information seekers.

Also, from the report:

Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data — information which could be put to effective use in environments where data is integrated seamlessly across contexts. Pioneering initiatives such as the release of the British National Bibliography reveal the effort required to address challenges such as licensing, data modeling, the handling of legacy data, and collaboration with multiple user communities. However, these also demonstrate the considerable benefits of releasing bibliographic databases as Linked Data. As the community’s experience increases, the number of datasets released as Linked Data is growing rapidly.

Talis Consulting has been closely and actively involved in the modelling, data transformation, publishing, and hosting of the British National Bibliography (BnB) as Linked Data.  A great overview of the approach taken to modelling bibliographic data in a way that makes it easily compatible with the wider Web of Data is provided by Tim Hodson in his post – British Library Data Model: Overview.  As can be seen from their work, the modelling used for the BnB differs from the approach taken by many attempting to publish bibliographic data as Linked Data: it describes the resources (the books, authors, publishers, etc.) as people, places, events, and things, rather than attempting to represent the records that libraries keep about their stock of resources.

With intentions to release open library data now specifically mentioning Linked Data, the sentiments from this report are already influencing the wider forward-thinking library community.  I will leave the last word to the report’s final paragraph, which some in the traditional record-based cataloguing community may have difficulty getting their heads around.  I encourage them to look at libraries from the point of view of the wider [non-library] web consumers, and read it again.

One final caveat: data consumers should bear in mind that, in contrast to traditional, closed IT systems, Linked Data follows an open-world assumption: the assumption that data cannot generally be assumed to be complete and that, in principle, more data may become available for any given entity. We hope that more “data linking” will happen in the library domain in line with the projects mentioned here.
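For those who find the open-world assumption easier to grasp in code, here is a toy sketch of the difference. It is purely illustrative: the identifiers and the two functions are made up, and real Linked Data tools work with RDF graphs rather than Python sets.

```python
# Toy statements as (subject, property, value) tuples; identifiers are made up.
known = {
    ("book:1", "title", "An Example Title"),
    ("book:1", "author", "person:1"),
}


def closed_world(statements, triple):
    """Closed-world reading: anything not recorded is treated as false."""
    return triple in statements


def open_world(statements, triple):
    """Open-world reading: True if recorded, otherwise None (unknown)."""
    return True if triple in statements else None


question = ("book:1", "author", "person:2")

print(closed_world(known, question))  # False: the record is taken as complete
print(open_world(known, question))    # None: nothing has been said either way

# More data about the same entity may turn up later from another source.
known.add(question)
print(open_world(known, question))    # True
```

A traditional catalogue record tends to be read the first way. A consumer of Linked Data should read it the second way, since a missing statement only means that nobody has published it yet.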

This post was also published on the Talis Consulting Blog