Firstly let me credit the source that alerted me to the subject of this post.

Jennifer Slegg of TheSEMPost wrote an article last week entitled Adding Structured Data Helps Google Understand & Rank Webpages Better. It was her report of a conversation with Google’s Webmaster Trends Analyst Gary Illyes at PUBCON Las Vegas 2017.

Jennifer’s report included some quotes from Gary which go a long way towards clarifying the power, relevance, and importance for Google for embedding Schema.org Structured Data in web pages.

Those that have encountered my evangelism for doing just that, will know that there have been many assumptions about the potential effects of adding Schema.org Structured Data to your HTML, but this is the first confirmation of those assumptions by Google folks that I am aware of.

To my mind there are two key quotes from Garry, firstly:

But more importantly, add structure data to your pages because during indexing, we will be able to better understand what your site is about.

In the early [only useful for Rich Snippets] days of Schema.org, Google representatives went out of their way to assert that adding Schema.org to a page would NOT influence its position in search results.  More recently, ‘non-commital’ could be described as the answer to questions about Schema.org and indexing.    Gary’s phrasing is a little interesting “during indexing, we will be able to better understand“, but you can really only drawn certain conclutions from them.

So, this answers one of the main questions I am asked by those looking to me for help in understanding and applying Schema.org

If I go to the trouble of adding Schema.org to my pages, will Google [and others] take any notice?”  To paraphrase Mr Illyes — Yes.

The second key quote:

And don’t just think about the structured data that we documented on developers.google.com. Think about any schema.org schema that you could use on your pages. It will help us understand your pages better, and indirectly, it leads to better ranks in some sense, because we can rank easier.

So structured data is important, take any schema from schema.org and implement it, as it will help.

This answers directly another common challenge I get when recommending the use of the whole Schema.org vocabulary, and its extensions, as a source of potential Structured Data types for marking up your pages.

The challenge being “where is the evidence that any schema, not documented in the Google Developers Structured Data Pages, will be taken notice of?

So thanks Gary, you have just made my job, and the understanding of those that I help, a heck of a lot easier.

Appart from those two key points there are some other interesting takeaways from his session as reported by Jennifer.

Their recent increased emphasis on things Structured Data:

We launched a bunch of search features that are based on structured data. It was badges on image search, jobs was another thing, job search, recipes, movies, local restaurants, courses and a bunch of other things that rely solely on structure data, schema.org annotations.

It is almost like we started building lots of new features that rely on structured data, kind of like we started caring more and more and more about structured data. That is an important hint for you if you want your sites to appear in search features, implement structured data.

Google’s further increased Structured Data emphasis in the near future:

Next year, there will be two things we want to focus on. The first is structured data. You can expect more applications for structured data, more stuff like jobs, like recipes, like products, etc.

For those who have been sceptical as to the current commitment of Google and others to Schema.org and Structured Data, this should go some way towards settling your concerns.

It is at his point I add in my usual warning against rushing off and liberally sprinkling Schema.org terms across your web pages.  It is not like keywords.

The search engines are looking for structured descriptions (the clue is in the name) of the Things (entities) that your pages are describing; the properties of those things; and the relationships between those things and other entities.

Behind Schema.org and Structured Data are some well established Linked Data principles, and to get the most effect from your efforts, it is worth recognising them.  

Applying Structured Data to your site is not rocket science, but it does need a little thought and planning to get it right.   With organisatons such as Google taking notice, like most things in life, it is worth doing right if you are going to do it at all.

 

Comment   or   Contact us

The latest release of Schema.org (v3.3 August 2017) included enhancements proposed by The Tourism Structured Web Data Community Group.

Although fairly small, these enhancements have significantly improved the capability for describing Tourist Attractions and hopefully enabling more tourist discoveries like the one pictured here.

The TouristAttraction Type

The TouristAttraction type has been around, as a subtype of Place, from the earliest days back in 2011.  However it did not have any specific properties of its own or examples for its use.  For those interested in promoting and sharing data to help with tourism and the discovery of tourist attractions, the situation was a bit limiting.

As a result of the efforts by the group, TouristAttraction now has two properties — availableLanguage (A language someone may use with or at the item, service or place.) and touristType (Attraction suitable for type(s) of tourist. eg. Children, visitors from a particular country, etc.).   So now we can say, in data, for example that an attraction is suitable for Spanish & French speaking tourists who are interested in wine.

Application and Examples

At initial view the addition of a couple of general purpose properties does not seem much of an advancement.  However the set of examples, that the Group has provided, demonstrate the power and flexibility of this enhanced type for use with tourism.

The principle behind their approach, as demonstrated in the examples, is that most anything can be of interest to a tourist — a beach, mountain, costal highway, ancient building, work of art, war cemetery, winery, amusement park — the list is endless.  It was soon clear that it would not be practical to add tourist relevant properties to all such relevant types within Schema.org.

Multi-Typed Entities (MTEs)
The Multi-Type Entity is a powerful feature of Schema.org which has often caused confusion and has a bit of a reputation for being complex.  When describing a thing (an entity in data speak) Schema.org has the capability to indicate that it is of more than one type.

For example you could be describing a Book, with an author, subject, isbn, etc., and in order to represent a physical example of that book you can also say it is a Product with weight, width, purchaseDate, itemCondition, etc.  In Schema.org you simply achieve that by indicating in your mark-up that the thing you are describing is both a Book and a Product.

Utilising the MTE principle for tourism, you would firstly describe the thing itself, using the appropriate Schema.org Types (Mountain, Church, Beach, AmusementPark, etc.).  Having done that you then add the TouristAttraction type. 

In Microdata it would look like this:
  <div itemtype=“http://schema.org/Cemetery” itemscope>
    <link itemprop=“additionalType” href=“http://schema.org/TouristAttraction” />
    <meta itemprop=“name” content=“Villers–Bretonneux Australian National Memorial />

In JSON-LD:
<script type=“application/ld+json”>
{
“@context”: “http://schema.org/”,
 “@type”: [“Cemetery”,“TouristAttraction”],
 “name”: “Villers–Bretonneux Australian National Memorial”,

If there is no specific Schema.org type for the thing you are describing, you would just identify it as being a TouristAttraction, using all the properties inherited from Place to describe it as best as you can.

Following the ‘anything can be a tourist attraction principle’ it is perfectly possible for a Person to also be a TouristAttraction.  Take the human statue street performer who dresses up as the Statue of Liberty and stands in Times Square New York on most days (or at least seems to be there every time I visit).  

Using Schema.org he/she can be described as a Person with name, email, jobTitle (Human Statue of Liberty), etc., and in addition as a TouristAttraction.  Sharing this information in Schema.org on a page describing this performer would possibly increase their discoverability to those looking for tourist things in New York.

Public Access

In addition to the properties and examples for TouristAttraction, the Group also proposed a new publicAccess property.  It was felt that this had wider benefit than just for tourism and hence it was added to Place, and hence inherited by TouristAttraction.  publicAccess is a boolean property enabling you to state if a Place is or is not accessible to the public.  There is no assumed default value for this property if omitted from markup.

 

(Tourist image by Lain
Statue image by Bruce Crummy)

 

 

Comment   or   Contact us

There have been  discussions in Schema.org Github Issues about the way Organizations their offices, branches and other locations can be marked up.  (The trail is here: #1734)  It started with a question about why the property hasMap was available for a LocalBusiness but not for Organization.  The simple answer being that LocalBusiness inherits the hasMap property from Place, one of its super-types, and not from Organiszation, its other super-type.

As was commented in the thread…

there are plenty of organizations/corporations that have a head office which nobody would consider a LocalBusiness yet for which it would be handy to be able to express it can be found on a map.

The following discussion exposed a lack of clarity in the way to structure descriptions of Organizations and their locations, offices, branches , etc.  It is also clear that the current structure when applied correctly can deliver the functionality required.  This is not to recognise that the descriptions of terms and associated examples could not be improved, and may be helped by some tweaking to the current structure — the discussion continues!

To address that lack of clarity I thought it would be useful to share some examples here.

The LocalBusiness simple case
As a combination of Organization and Place, LocalBusiness gives you most anything you would need to describe a local shop, repair garage, café, etc.  Thus:
<script type=“application/ld+json”>
{
  “@context”: “http://schema.org”,
  “@type”: “LocalBusiness”,
“address”: {
    “@type”: “PostalAddress”,
    “addressLocality”: “Mexico Beach”,
    “addressRegion”: “FL”,
    “streetAddress”: “3102 Highway 98”
  },
  “description”: “A superb collection of fine gifts and clothing to accent your stay in Mexico Beach.”,
  “name”: “Beachwalk Beachwear & Giftware”,
  “telephone”: “850-648-4200”
}
</script>
There are plenty of other properties available, so for example if you have link to a map you could add:
“hasMap”: “https://www.google.co.uk/maps/place/Beachwalk/@29.94904,-85.4207543,17z”,

If you had other information about the business such as business numbers or tax identifiers you could add:
“vatID”: “1234567890abc”,

What about a group of LocalBusinesses
Also this is a fairly simple case. By using the parentOrganization & subOrganization properties (inherited from the Organiztion super-type) you can build a hierarchy of relationships as complex as you would ever need:
<script type=“application/ld+json”>
{
  “@context”: “http://schema.org”,
  “@type”: “LocalBusiness”,
  “@id”: “http://localshops.example.com/mainBranch”,
  “name”: “Localshops”,
  “subOrganization”: “http://localshops.example.com/cityBranch”
}
</script>
<script type=“application/ld+json”>
{
  “@context”: “http://schema.org”,
  “@type”: “LocalBusiness”,
  “@id”: “http://localshops.example.com/cityBranch”,
  “name”: “Localshops”,
  “parentOrganization”: “http://localshops.example.com/mainBranch”
}
</script>

Location and POS
There are a couple of properties inherited from Organization (location, hasPOS) which may help you link to things that might not be obvious local businesses — the warehouse location for your group of hardware stores, or the beach kiosk for your café for example.


Banks, Libraries, and Hotels
There are many local examples of larger organizations that appear on our high streets, for some of the more common ones there are ready made subtypes of LocalBusiness – BankOrCreditUnion, Library, Hotel, etc.  If you can’t find a suitable one, default to LocalBusiness.

This possibility sometimes causes some confusion.  For instance Wells Fargo, the global finance company, could no way be considered as a local business, however their branch in your local city can indeed be considered as one.  For example:
{
  “@context”: “http://schema.org”,
  “@type”: “BankOrCreditUnion”,
  “name”: “Wells Fargo – Smalltown Branch”,
  “parentOrganization”: “http://wellsfargo.com”
}

Governments, Global Corporations, Online Groups
Picking up on that confusion about non-LocalBusiness-ness, brings me to the prime use of the Organization type.  That prime use in my experience is to describe the legal entity, corporate organiation, cooperation group, government, international body, etc.  From a dictionary definition: “an organized group of people with a particular purpose, such as a business or government department.”

My approach to this would be to describe the organization first — name, description, email, foundingDate, leicode, logo, taxID, etc.  If it has a registered or main address, use the address property, if it has channels for specific contact methods etc, use contactPoint to describe areaServed, contactType etc.

Some of these organizations operate out of various locations — head offices, regional/country based main/local offices, factories, warehouses, distribution centres, research and development centres, laboratories, etc.  The list is a substantial one.  This is where the location property comes into play.

Each of those locations can then be described using a Place type or one of its subtypes…

<script type=“application/ld+json”>
{
  “@context”: “http://schema.org”,
  “@type”: “Organization”,
  “@id”: “http://global-corp.com”,
  “name”: “Global-Corp Company”,
  “location”:  [
   {
    “@type”: “Place”,
    “name”: “Global-Corp Company HQ”,
    “address”: “Future City Development, Main Street, Anytown, NY”,
    “hasMap”: “https://www.google.co.uk/maps/place/Anytown”
  },
  {
    “@type”: “Place”,
    “name”: “Global-Corp Company Research Laboratory”,
    “address”: “Lab1, Future Park, Anytown, NY”,
    “hasMap”: “https://www.google.co.uk/maps/place/Anytown”
  }
 ]
}
</script>

This is not a perfect solution, it would be nice to have specific subtypes of Place for Office, Factory, etc. or a placeType property on Place.  I am sure there will be continued discussion on this.   In the meantime what is already available goes a long way towards describing what is needed from Global Corporations to Local Cafés and everything in-between.

(Image: Simon)
Comment   or   Contact us

A theme of my last few years has been enabling the increased discoverability of Cultural Heritage resources by making the metadata about them more open and consumable.

Much of this work has been at the libraries end of the sector but I have always have had an eye on the broad Libraries, Archives, and Museums world, not forgetting Galleries of course.

Two years ago at the LODLAM Summit 2015 I ran a session to explore if it would be possible to duplicate in some way the efforts of the Schema Bib Extend W3C Community Group which proposed and introduced an extension and enhancements to the Schema.org vocabulary to improve its capability for describing bibliographic resources, but this time for archives physical, digital and web.

Interest was sufficient for me to setup and chair a new W3C Community Group, Schema Architypes. The main activity of the group has been the creation and discussion around a straw-man proposal for adding new types to the Schema.org vocabulary.

Not least the discussion has been focused on how the concepts from the world of archives (collections, fonds, etc.) can be represented by taking advantage of the many modelling patterns and terms that are already established in that generic vocabulary, and what few things would need to be added to expose archive metadata to aid discovery.

Coming up next week is LODLAM Summit 2017, where I have proposed a session to further the discussion on the proposal.

So why am I now suggesting that there maybe an opportunity for the discovery of archives and their resources?

In web terms for something to be discoverable it, or a description of it, needs to be visible on a web page somewhere. To take advantage of the current structured web data revolution, being driven by the search engines and their knowledge graphs they are building, those pages should contain structured metadata in the form of Schema.org markup.

Through initiatives such as ArchivesSpace and their application, and ArcLight it is clear that many in the world of archives have been focused on web based management, search, and delivery views of archives and the resources and references they hold and describe. As these are maturing it is clear that the need for visibility on the web is starting to be addressed.

So archives are now in a great place to grab the opportunity to take advantage of the benefits of Schema.org to aid discovery of their archives and what they contain. At least with these projects, they have the pages on which to embed that structured web data, once a consensus around the proposals from the Schema Architypes Group has formed.

I call out to those involved with the practical application of systems for the management, searching, and delivery of archives to at least take a look at the work of the group and possibly engage on a practical basis, exploring the potential and challenges for implementing Schema.org.

So if you want to understand more behind this opportunity, and how you might get involved, either join the W3C Group or contact me direct.

 

*Image acknowledgement to The Oberlin College Archives

Comment   or   Contact us

Schema.org squarePart of my efforts working with Google in support of the Schema.org structured web data vocabulary, its extensions, usage and implementation, is to introduce new functionality and facilities on to the Schema.org site.

I have recently concluded a piece of work to improve accessibility to the underlying definition of vocabulary terms in various data formats, which is now available for testing and comment. It is available in a prerelease of the next Schema.org version hosted at webschemas.org – the Schema.org test site.  It is the kind of functionality could benefit from as many eyes as possible to check and comment.

Introduced in the current release (3.1) was the ability to download vocabulary definition files for the core vocabulary and its extensions in triples, quads, json-ld and turtle formats.  Building upon that, the new version, provisionally numbered 3.2, introduces download files in RDF/XML, plus CSV format for vocabulary Types, Properties and Enumeration values.

Book-ttlThere have been requests to make other output formats available for individual terms in addition to the embedded RDFa already available.  In response to these requests, we have introduced per term outputs in csv, triples, json-ld, rdf/xml and turtle format. 

These can be accessed in two ways. Firstly by adding the appropriate suffix (.csv, .nt, .jsonld, .rdf, .ttl) to the term URI:

They are also available using content-negotiation, supplying the appropriate value for Accept in the httpd request header (text/csv, text/plain, application/ld+json, application/rdf+xml, application/x-turtle).

So if you are interested in such things, take a look and let us know what you think about the contents, formats, and accuracy of these outputs.

Comment   or   Contact us

I spend a significant amount of time working with Google folks, especially Dan Brickley, and others on the supporting software, vocabulary contents, and application of Schema.org.  So it is with great pleasure, and a certain amount of relief, I share the announcement of the release of 3.1.

That announcement lists several improvements, enhancements and additions to the vocabulary that appeared in versions 3.0 & 3.1. These include:

  • Health Terms – A significant reorganisation of the extensive collection of medical/health terms, that were introduced back in 2012, into the ‘health-lifesci’ extension, which now contains 99 Types, 179 Properties and 149 Enumeration values.
  • Finance Terms – Following an initiative and work by Financial Industry Business Ontology (FIBO) project (which I have the pleasure to be part of), in support of the W3C Financial Industry Business Ontology Community Group, several terms to improve the capability for describing things such as banks, bank accounts, financial products such as loans, and monetary amounts.
  • Spatial and Temporal and DatasetsCreativeWork now includes spatialCoverage and temporalCoverage which I know my cultural heritage colleagues and clients will find very useful.  Like many enhancements in the Schema.org community, this work came out of a parallel interest, in which  Dataset has received some attention.
  • Hotels and Accommodation – Substantial new vocabulary for describing hotels and accommodation has been added, and documented.
  • Pending Extension – Introduced in version 3.0 a special extension called “pending“, which provides a place for newly proposed schema.org terms to be documented, tested and revised.  The anticipation being that this area will be updated with proposals relatively frequently, in between formal Schema.org releases.
  • How We Work – A HowWeWork document has been added to the site. This comprehensive document details the many aspects of the operation of the community, the site, the vocabulary etc. – a useful way in for casual users through to those who want immerse themselves in the vocabulary its use and development.

For fuller details on what is in 3.1 and other releases, checkout the Releases document.

Hidden Gems

Often working in the depths of the vocabulary, and the site that supports it, I get up close to improvements that on the surface are not obvious which some [of those that immerse themselves] may find interesting that I would like to share:

  • Snappy Performance – The Schema.org site, a Python app hosted on the Google App Engine, is shall we say a very popular site.  Over the last 3-4 releases I have been working on taking full advantage of muti-threaded, multi-instance, memcache, and shared datastore capabilities. Add in page caching imrovements plus an implementation of Etags, and we can see improved site performance which can be best described as snappiness. The only downsides being, to see a new version update you sometimes have to hard reload your browser page, and I have learnt far more about these technologies than I ever thought I would need!
  • Data Downloads – We are often asked for a copy of the latest version of the vocabulary so that people can examine it, develop form it, build tools on it, or whatever takes their fancy.  This has been partially possible in the past, but now we have introduced (on a developers page we hope to expand with other useful stuff in the future – suggestions welcome) a download area for vocabulary definition files.  From here you can download, in your favourite format (Triples, Quads, JSON-LD, Turtle), files containing the core vocabulary, individual extensions, or the whole vocabulary.  (Tip: The page displays the link to the file that will always return the latest version.)
  • Data Model Documentation – Version 3.1 introduced updated contents to the Data Model documentation page, especially in the area of conformance.  I know from working with colleagues and clients, that it is sometimes difficult to get your head around Schema.org’s use of Multi-Typed Entities (MTEs) and the ability to use a Text, or a URL, or Role for any property value.  It is good to now have somewhere to point people when they question such things.
  • Markdown – This is a great addition for those enhancing, developing and proposing updates to the vocabulary.  The rdfs:comment section of term definitions are now passed through a Markdown processor.  This means that any formatting or links to be embedded in term description do not have to be escaped with horrible coding such as &amp; and &gt; etc.  So for example a link can be input as [The Link](http://example.com/mypage) and italic text would be input as *italic*.  The processor also supports WikiLinks style links, which enables the direct linking to a page within the site so [[CreativeWork]] will result in the user being taken directly to the CreativeWork page via a correctly formatted link.   This makes the correct formatting of type descriptions a much nicer experience, as it does my debugging of the definition files. Winking smile

I could go on, but won’t  – If you are new to Schema.org, or very familiar, I suggest you take a look.

Comment   or   Contact us

cogsbrowser Marketing Hype! I hear you thinking – well at least I didn’t use the tired old ‘Next Generation’ label.

Let me explain what is this fundamental component of what I am seeing potentially as a New Web, and what I mean by New Web.

This fundamental component I am talking about you might be surprised to learn is a vocabulary – Schema.org.  But let me first set the context by explaining my thoughts on this New Web.

Having once been considered an expert on Web 2.0 (I hasten to add by others, not myself) I know how dangerous it can be to attach labels to things.  It tends to spawn screen full’s of passionate opinions on the relevance of the name, date of the revolution, and over detailed analysis of isolated parts of what is a general movement.   I know I am on dangerous ground here!

To my mind something is new when it feels different.  The Internet felt different when the Web (aka HTTP + HTML + browsers) arrived.  The Web felt different (Web 2.0?) when it became more immersive (write as well as read) and visually we stopped trying to emulate in a graphical style what we saw on character terminals. Oh, and yes we started to round our corners. 

There have been many times over the last few years when it felt new – when it suddenly arrived in our pockets (the mobile web) – when the inner thoughts, and eating habits, of more friends that you ever remember meeting became of apparent headline importance (the social web) – when [the contents of] the web broke out of the boundaries of the browser and appeared embedded in every app, TV show, and voice activated device.

The feeling different phase I think we are going through at the moment, like previous times, is building on what went before.  It is exemplified by information [data] breaking out of the boundaries of our web sites and appearing where it is useful for the user. 

library_of_congress We are seeing the tip of this iceberg in the search engine Knowledge Panels, answer boxes, and rich snippets,ebay-rich-snippet   The effect of this being that often your potential user can get what they need without having to find and visit your site – answering questions such as what is the customer service phone number for an organisation; is the local branch open at the moment; give me driving directions to it; what is available and on offer.  Increasingly these interactions can occur without the user even being aware they are using the web – “Siri! Where is my nearest library?
A great way to build relationships with your customers. However a new and interesting challenge for those trying to measure the impact of your web site.

So, what is fundamental to this New Web

There are several things – HTTP, the light-weight protocol designed to transfer text, links and latterly data, across an internet previously used to specific protocols for specific purposes – HTML, that open, standard, easily copied light-weight extensible generic format for describing web pages that all browsers can understand – Microdata, RDFa, JSON, JSON-LD – open standards for easily embedding data into HTML – RDF, an open data format for describing things of any sort, in the form of triples, using shared vocabularies.  Building upon those is Schema.org – an open, [de facto] standard, generic vocabulary for describing things in most areas of interest. 

icon-LOVWhy is one vocabulary fundamental when there are so many others to choose from? Check out the 500+ referenced on the  Linked Open Vocabularies (LOV) site.  Schema.org however differs from most of the others in a few key areas:Schema.org square 

  • Size and scope – its current 642 Types and 992 Properties is significantly larger and covers far more domains of interest than most others.  This means that if you are looking to describe a something, you are highly likely to to find enough to at least start.  Despite its size, it is yet far from capable of describing everything on, or off, the planet.
  • Adoption – it is estimated to be in use on over 12 million sites.  A sample of 10 billion pages showed over 30% containing Schema.org markup.  Checkout this article for more detail: Schema.org: Evolution of Structured Data on the Web – Big data makes common schemas even more necessary. By Guha, Brickley and Macbeth.
  • Evolution – it is under continuous evolutionary development and extension, driven and guided by an open community under the wing of the W3C and accessible in a GitHub repository.
  • Flexibility – from the beginning Schema.org was designed to be used in a choice of your favourite serialisation – Microdata, RDFa, JSON-LD, with the flexibility of allowing values to default to text if you have not got a URI available.
  • Consumers – The major search engines Google, Bing, Yahoo!, and Yandex, not only back the open initiative behind Schema.org but actively search out Schema.org markup to add to their Knowledge Graphs when crawling your sites.
  • Guidance – If you search out guidance on supplying structured data to those major search engines, you are soon supplied with recommendations and examples for using Schema.org, such as this from Google.  They even supply testing tools for you to validate your markup.

With this support and adoption, the Schema.org initiative has become self-fulfilling.  If your objective is to share or market structured data about your site, organisation, resources, and or products with the wider world; it would be difficult to come up with a good reason not to use Schema.org.

Is it a fully ontologically correct semantic web vocabulary? Although you can see many semantic web and linked data principles within it, no it is not.  That is not its objective. It is a pragmatic compromise between such things, and the general needs of webmasters with ambitions to have their resources become an authoritative part of the global knowledge graphs, that are emerging as key to the future of the development of search engines and the web they inhabit.

Note that I question if Schema.org is a fundamental component, of what I am feeling is a New Web. It is not the fundamental component, but  one of many that over time will become just the way we do things

Comment   or   Contact us

sq-extending schema In this third part of the series I am going to concentrate less on the science of working with the technology of Schema.org and more on what you might call the art of extension.

It builds on the previous two posts The Bits and Pieces which introduces you to the mechanics of working with the Schema.org repository in GitHub and your own local version; and Working Within the Vocabulary which takes you through the anatomy of the major controlling files for the terms and their examples, that you find in the repository.

Art maybe an over ambitious word for the process that I am going to try and describe. However it is not about rules, required patterns, syntaxes, and file formats – the science; it is about general guidelines, emerging styles & practices, and what feels right.  So art it is.

OK. You have read the previous posts in this series. You have said to yourself I only wish that I could describe [insert you favourite issue here] in Schema.org. You are now inspired to do something about it, or get together with a community of colleagues to address the usefulness of Schema.org for your area of interest.  Then comes the inevitable question…

Where do I focus my efforts – the core vocabulary or a Hosted Extension or an External Extension?

Firstly a bit of background to help answer that question.

The core of the Schema.org vocabulary has evolved since its launch by Google, Bing, and Yahoo! (soon joined by Yandex), in June 2011. By the end of 2015 its term definitions had reached 642 types and 992 properties.  They cover many many sectors commercial, and not, including sport, media, retail, libraries, local businesses, heath, audio, video, TV, movies, reviews, ratings, products, services, offers and actions.  Its generic nature has facilitated is spread of adoption across well over 10 million sites.  For more background I recommend the December 2015 article Schema.org: Evolution of Structured Data on the Web – Big data makes common schemas even more necessary. By Guha, Brickley and Macbeth.

That generic nature however does introduce issues for those in specific sectors wishing to focus in more detail on the entities and relationships specific to their domain whist still being part of, or closely related to, Schema.org.  In the spring of 2015 an Extension Mechanism, consisting of Hosted and External extensions, was introduced to address this.

Reviewed/Hosted Extensions are domain focused extensions hosted on the Schema.org site. They will have been reviewed and discussed by the broad Schema.org community as to style, compatibility with the core vocabulary, and potential adoption.  An extension is allocated its own part of the schema.org namespace – auto.schema.org & bib.schema.org being the first two examples.

External Extensions are created and hosted separate from Schema.org in their own namespace.  Although related to and building upon [extending] the Schema.org vocabulary these extensions are not part of the vocabulary.  I am editor for an early example of such an external extension BiblioGraph.net that predates the launch of the extension mechanism.  gs1logo Much more recently GS1 (The Global Language of Business) have published their External Extension – the GS1 Web Vocabulary at http://gs1.org/voc/.

An example of how gs1.org extends Schema.org can be seen from inspecting the class gs1:WearableProduct which is a subclass of gs1:Product which in turn is defined as an exact match to schema:Product.  Looking at an example property of gs1:Product, gs1:brand we can see that it is defined as a subproperty of schema:brand.  This demonstrates how Schema.org is foundational to GS1.org.

Choosing Where to Extend

This initially depends on what and how much you are wanting to extend.

If all you are thinking of is adding the odd property to an already existent type, or to add another type to the domain and/or range of a property, or improve the description of a type or property; you probably do not need to create an extension.  Raise an issue, and after some thought and discussion, go for it – create the relevant code and associated Pull Request for the Schema.org Gihub repositiory.

The-ThinkerMore substantial extensions require a bit of thought.

When proposing extension to the Schema.org vocabulary the above-described structure provides the extender/developer with three options.  Extend the core; propose a hosted extension; or develop an external extension.  Potentially a proposal could result in a combination of all three.

For example a proposal could be for a new Type (class) to be added to the core, with few or no additional properties other than those inherited from its super type.  In addition more domain focused properties, or subtypes, for that new type could be proposed as part of a hosted extension, and yet more very domain specific ones only being part of an external extension.

Although not an exact science, there are some basic principles behind such choices.  These principles are based upon the broad context and use of Schema.org across the web, the consuming audience for the data that would be marked up; the domain specific knowledge of those that would do the marking up and reviewing the proposal; and the domain specific need for the proposed terms.

Guiding Questions
A decision as to if a proposed term should be in the core, hosted extension or external extension can be aided by the answers to some basic questions:

  • Public or not public? Will the data that would be marked up using the term be normally shared on the web?  Would you expect to find that information on a publicly accessible web page today?If the answer is not public, there is no point in proposing the term for the core or a hosted extension.  It would be defined in an external extension.
  • General or Specific?  Is the level of information to be marked up, or the thing being described, of interest or relevant to non-domain specific consumers?If the answer is general, the term could be a candidate for a core term. For example Train could be considered as a potential new subtype of Vehicle to describe that mode of transport that is relevant for general travel discovery needs.  Whereas SteamTrain and its associated specific properties about driving wheel configuration etc. would be more appropriate to a railway extension.
  • Popularity? How many sites on the web would potentially be expected to make use of these term(s) How many webmasters would find them useful?If the answer is lots, you probably have a candidate for the core. If it is only a few hundred, especially if they would be all in a particular focus of interest, it would be more likely a candidate for a hosted extension. If it is a small number, it might be more appropriate in an external extension.
  • Detailed or Technical? Is the information, or the detailed nature of proposed properties, too technical for general consumption?If yes, the term should be proposed for a hosted or external extension. In the train example above, the fact that a steam train is being referenced could be contained in the text based description property of a Train type. Whereas the type of steam engine configuration could be a defined value for a property in an external extension.

Evolutionary Steps

When defining and then proposing enhancements to the core of Schema, or for hosted extensions, there is a temptation to take an area of concern, analyse it in detail and then produce a fully complete proposal.   Experience has demonstrated that it is beneficial to gain feedback on the use and adoption of terms before building upon them to extend and add more detailed capability.

Based on that experience the way of extending Schema.org should be by steps that build upon each other in stages.  For example introducing a new subtype with few if any new specific properties.  Initial implementers can use textual description properties to qualify its values in this initial form.  In a later releases more specific properties can be proposed, their need being justified by the take-up, visibility, and use of the subtype on sites across the web.

Closing Summary

Several screen-full’s and a few weeks ago, this started out as a simple post in an attempt to cover off some of the questions I am often asked about how Schema.org is structured, and how it can be made more appropriate for this project or that domain.  Hopefully you find the distillation of my experience and my personal approach, across these three resulting posts on Evolving Schema.org in Practice, enlightening and helpful.  Especially if you are considering proposing a change, enhancement or extension to Schema.org.

My association with Schema.org – applying the vocabulary; making personal proposals; chairing W3C Community groups (Schema Bib Extend, Schema Architypes, The Tourism Structured Web Data Community Group); participating in others (Schema Course extension Community Group, Sport Schema Community Group, Financial Industry Business Ontology Community Group, Schema.org Community Group); being editor of the BiblioGraph.net extension vocabulary; working with various organisations such as OCLC, Google’s Schema.org team, and the Financial Industry Business Ontology (FIBO); and preparing & presenting workshops & keynotes at general data and industry specific events –  has taught me that there is much similarity between, on the surface disparate, industries and sectors when it comes to preparing structured data to be broadly shared and understood.

Often that similarity is hidden behind sector specific views, understanding, and issues in dealing with the open wide web of structured data where they are just another interested group, looking to benefit from shared recognition of common schemas by the major search engine organisations.  But that is all part of the joy and challenge I relish when entering a new domain and meeting new interested and motivated people.

Of course enhancing, extending and evolving the Schema.org vocabulary is only one part of the story.  Actually applying it for benefit to aid the discovery of your organisation, your resources and the web sites that reference them is the main goal for most.

I get the feeling that there maybe another blog post series I should be considering!

Comment   or   Contact us