Evolving Schema.org in Practice Pt2: Working Within the Vocabulary

In the previous post in this series, Pt1: The Bits and Pieces, I stepped through the process of obtaining your own fork of the Schema.org GitHub repository; working on it locally; uploading your version for sharing; and proposing those changes to the Schema.org community in the form of a GitHub Pull Request.

Having covered the working environment, in this post I intend to describe some of the important files that make up Schema.org and how you can work with them to create or update examples and term definitions within your local forked version, in preparation for proposing them in a Pull Request.

Update Note:

This post was updated in June 2020 to reflect changes in the processes required to work with the Schema.org sources that have occurred over the preceding months and years.

The File Structure
If you inspect the repository you will see a simple directory structure.  At the top level you will find a few files sporting a .py suffix.  These contain the Python application code that runs the site you see at http://schema.org.  They load the configuration files and build an in-memory version of the vocabulary, which is used to build the html pages containing the definitions of the terms, schema listings, example displays, etc.  They are joined by a file named app.yaml, which contains the configuration used by Google App Engine to run that code.

At this level there are some directories containing supporting files: docs & templates contain static content for some pages; tests & scripts are used in the building and testing of the site; data contains the files that define the vocabulary, its extensions, and the examples used to demonstrate its use.

The Data Files

Important Note: From version 8.0 onwards, Schema.org has moved from using the RDFa format to define its vocabulary structure to using Turtle (ttl).  This post has been updated to reflect that change.

The data directory itself contains various files and directories.  schema.ttl is the most important file; it contains the core definitions for the majority of the vocabulary.  Although, most of the time, you will see schema.ttl as the only file with a .ttl suffix in the data directory, the application will look for and load any .ttl files it finds here.  This is a very useful feature when working on a local version – you can keep your enhancements together in a separate file, only merging them into the main schema.ttl file when you are ready to propose them.
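To make that behaviour concrete, here is a minimal sketch that relies only on what is described above: every file ending in .ttl in the data directory gets picked up. The helper name find_vocabulary_files and the file name my-proposal.ttl are hypothetical, and this is not the application's actual loader.

```python
import tempfile
from pathlib import Path


def find_vocabulary_files(data_dir):
    """Return the .ttl files directly inside data_dir, sorted by name.

    Hypothetical helper: it only illustrates the "load every .ttl file"
    behaviour, not how the Schema.org application really does it.
    """
    return sorted(p for p in Path(data_dir).glob("*.ttl") if p.is_file())


# Demonstrate with a throwaway directory standing in for data/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("schema.ttl", "my-proposal.ttl", "examples.txt"):
        (Path(tmp) / name).write_text("", encoding="utf-8")
    found = [p.name for p in find_vocabulary_files(tmp)]

print(found)
```

The separate my-proposal.ttl file sits alongside schema.ttl and is found by the same pattern, while examples.txt is ignored.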

Also in the data directory you will find an examples.txt file and several others ending with -examples.txt.  These contain the examples used on the term pages; the application loads all of them.

Amongst the directories in data, there are a couple of important ones.  releases contains snapshots of versions of the vocabulary from version 2.0 onwards.  The directory named ext contains the files that define the vocabulary extensions and examples that relate to them.  Currently you will find auto and bib directories within ext, corresponding to the extensions currently supported.  The format within these directories follows the basic pattern of the data directory – one or more .ttl files containing the term definitions and -examples.txt files containing relevant examples.

Getting to grips with the Turtle

Enough preparation, let’s get stuck into some vocabulary!

Take your favourite text/code editing application and open up schema.ttl. You will notice two things – it is large [well over 10,000 lines!], and it is in a structured text format (Turtle). For example, check out the definition for Thing (search for ‘:Thing a rdfs:Class’) or CreativeWork (search for ‘:CreativeWork a rdfs:Class’).

The Anatomy of a Type Definition
Standard Turtle formatting is used to define each term.  A vocabulary Type is defined as an entry that starts with the Type name (preceded by a ‘:’ signifying that it is being defined in the default vocabulary for the file – in this case http://schema.org/) followed by ‘a rdfs:Class ;’.  This line is followed by further lines defining attributes of the Type, each ending with ‘;’, except the final line, which ends with ‘.’.

The Thing Type definition:

:Thing a rdfs:Class ;
   rdfs:label "Thing" ;
   rdfs:comment "The most generic type of item." .
This is the simplest form of Type definition, only including a label definition and a comment.
The CreativeWork Type definition:

:CreativeWork a rdfs:Class ;
   rdfs:label "CreativeWork" ;
   :source <http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_rNews> ;
   rdfs:comment "The most generic kind of creative work, including books, movies, photographs, software programs, etc." ;
   rdfs:subClassOf :Thing .
Note the additional attributes defining that it is a subclass of Thing (rdfs:subClassOf) and crediting rNews as the source of its definition (:source).
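If you find yourself drafting several new Types, the layout above is regular enough to generate. The following is only a sketch: type_definition is a hypothetical helper of mine, not part of the Schema.org tooling, and it covers just the label, comment, and optional subclass attributes.

```python
def type_definition(name, comment, parent=None):
    """Build a Turtle Type definition in the layout shown above.

    Hypothetical helper for drafting; it handles only rdfs:label,
    rdfs:comment and an optional rdfs:subClassOf.
    """
    # Escape backslashes and double quotes so the comment stays a valid string.
    safe_comment = comment.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f":{name} a rdfs:Class ;",
        f'   rdfs:label "{name}" ;',
        f'   rdfs:comment "{safe_comment}"',
    ]
    if parent:
        lines[-1] += " ;"
        lines.append(f"   rdfs:subClassOf :{parent}")
    lines[-1] += " ."  # the final attribute line ends with '.'
    return "\n".join(lines)


thing = type_definition("Thing", "The most generic type of item.")
print(thing)
```

Calling it with a parent, for example type_definition("MyNewType", "A made-up type.", parent="CreativeWork"), adds the rdfs:subClassOf line and moves the closing ‘.’ onto it.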

Defining Properties
The properties that can be used with a Type are defined in a very similar way to the Types themselves.

The name Property definition:

:name a rdf:Property ;
   rdfs:label "name" ;
   :domainIncludes :Thing ;
   :rangeIncludes :Text ;
   rdfs:comment "The name of the item." ;
   rdfs:subPropertyOf rdfs:label ;
   owl:equivalentProperty dc:title .
The first line indicates that this is the definition of a Property (‘a rdf:Property’). As with Types, the lines that follow fill in the details.

Properties have two specific elements to define the domain and range of a property.  If these concepts are new to you, they are basically simple.  The Type(s) defined as being in the domain of a property are those for which the property is a valid attribute.  The Type(s) defined as being in the range of a property are those expected as values for that property.  So inspecting the above name example we can see that name is a valid property of the Thing Type with an expected value type of Text.  Also specific to property definitions is rdfs:subPropertyOf, which defines that one property is a sub-property of another.
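For those who prefer to see the idea in code, here is a small sketch of domain and range using the name, Thing, and CreativeWork definitions shown earlier. The dictionaries and function names are my own illustration, not how the application stores the vocabulary. It also assumes something not spelled out in this post: that a Type inherits the properties of the Types it is a subclass of.

```python
# Tiny hand-written subset of the vocabulary, taken from the definitions above.
SUBCLASS_OF = {"CreativeWork": "Thing"}
PROPERTIES = {
    "name": {"domainIncludes": {"Thing"}, "rangeIncludes": {"Text"}},
}


def ancestors(type_name):
    """Yield type_name followed by each of its supertypes."""
    while type_name is not None:
        yield type_name
        type_name = SUBCLASS_OF.get(type_name)


def in_domain(prop, type_name):
    """True if prop lists type_name, or one of its supertypes, in its domain.

    Assumption: subtypes inherit the properties of their supertypes.
    """
    domain = PROPERTIES[prop]["domainIncludes"]
    return any(t in domain for t in ancestors(type_name))


def expected_value_types(prop):
    """Return the Types listed in the range of prop, sorted by name."""
    return sorted(PROPERTIES[prop]["rangeIncludes"])


print(in_domain("name", "CreativeWork"))
print(expected_value_types("name"))
```

Because domainIncludes and rangeIncludes hold sets, a property can list several Types in either, which is the flexibility discussed next.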

Those used to defining other RDF vocabularies may question the use of  http://schema.org/domainIncludes and http://schema.org/rangeIncludes to define these relationships. This is a pragmatic approach to producing a flexible data model for the web.  For a more in-depth explanation I refer you to the Schema.org Data Model documentation.

Not an exhaustive tutorial in editing the defining Turtle, but hopefully enough to get you going!

Making Examples

One of the most powerful features of the Schema.org documentation is the Examples section on most of the term pages.  These provide mark up examples for most of the terms in the vocabulary, that can be used and built upon by those adding Schema.org data to their web pages.  These examples represent how the html of a page or page section may be marked up.  To set context, the examples are provided in several serialisations – basic html, html plus Microdata, html plus RDFa, and JSON-LD.  As the objective is to aid the understanding of how Schema.org may be used, it is usual to provide simple basic html formatting in the examples.

Examples in File
As described earlier, the source for examples is held in files with a -examples.txt suffix, stored in the data directory or in individual extension directories.

One or more examples per file are defined in a very simplistic format.

An example begins in the file with a line that starts with TYPES:, such as this:

TYPES: #eg2  Place,LocalBusiness, address, streetAddress

This example has a unique identifier prefixed with a # character; there should be only one of these per example.  These identifiers are intended for future feedback mechanisms and as such are not particularly controlled.  I recommend you create your own when creating your examples.  Next comes a comma separated list of term names.  Adding a term to this list will result in the example appearing on the page for that term.  This is true for both Types and Properties.
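As an illustration of how that line breaks down, here is a sketch that separates the identifier from the term names. parse_types_line is a hypothetical helper written for this post, not the parser the application uses, and treating a second identifier as an error is my own choice.

```python
def parse_types_line(line):
    """Split a 'TYPES:' line into (identifier, [term names]).

    Hypothetical helper: the identifier is the token starting with '#',
    everything else is treated as a term name.
    """
    if not line.startswith("TYPES:"):
        raise ValueError("not a TYPES: line")
    body = line[len("TYPES:"):]
    identifier = None
    terms = []
    # Commas and whitespace are both treated as separators.
    for token in body.replace(",", " ").split():
        if token.startswith("#"):
            if identifier is not None:
                raise ValueError("more than one identifier on the line")
            identifier = token
        else:
            terms.append(token)
    return identifier, terms


identifier, terms = parse_types_line(
    "TYPES: #eg2  Place,LocalBusiness, address, streetAddress"
)
print(identifier, terms)
```

Each name in terms is a page on which the example would appear.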

Next come four sections, each preceded by a line containing a single label, in the following order: PRE-MARKUP:, MICRODATA:, RDFA:, JSON:.  Each section ends when the next label line, or the end of the file, is reached.  The contents of each section of the example are then inserted into the appropriate tabbed area on the term page.  The process that does this is not a sophisticated one; there is no error or syntax checking involved – if you want to insert the text of the Gettysburg Address as your RDFa example, it will let you do it.
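The format is simple enough that splitting a file into its examples and sections takes only a few lines. The sketch below is my own illustration of the layout just described, not the application's code; the sample text, including the #my-eg-1 identifier and the "Example Shop" content, is invented. Like the real process, it does no syntax checking of the section contents.

```python
SECTION_LABELS = ("PRE-MARKUP:", "MICRODATA:", "RDFA:", "JSON:")


def parse_examples(text):
    """Split the text of an examples file into one dict per example."""
    examples = []
    current = None  # label of the section currently being collected
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("TYPES:"):
            # A TYPES: line starts a new example.
            examples.append({"TYPES:": stripped[len("TYPES:"):].strip()})
            current = None
        elif stripped in SECTION_LABELS and examples:
            current = stripped
            examples[-1][current] = []
        elif current is not None:
            examples[-1][current].append(line)
    # Join the collected lines of each section into a single string.
    for example in examples:
        for label in SECTION_LABELS:
            if label in example:
                example[label] = "\n".join(example[label]).strip()
    return examples


# Invented sample content, for illustration only.
sample = """TYPES: #my-eg-1 LocalBusiness, name

PRE-MARKUP:
<h1>Example Shop</h1>

MICRODATA:
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <h1 itemprop="name">Example Shop</h1>
</div>

RDFA:
<div vocab="http://schema.org/" typeof="LocalBusiness">
  <h1 property="name">Example Shop</h1>
</div>

JSON:
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "LocalBusiness", "name": "Example Shop"}
</script>
"""

examples = parse_examples(sample)
print(examples[0]["TYPES:"])
print(sorted(k for k in examples[0] if k != "TYPES:"))
```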

I am not going to provide tutorials for html, Microdata, RDFa, or JSON-LD here; there are a few of those about.  I will however recommend a tool I use to convert between these formats when creating examples.  RDF Translator is a simple online tool that will validate and translate between RDFa, Microdata, RDF/XML, N3, N-Triples, and JSON-LD.  A suggestion, to make your examples as informative as possible – when converting between formats, especially when converting to JSON-LD, most conversion tools reorder the statements. It is worth investing some time in ensuring that the mark up order in your example is consistent for all serialisations.
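If you would rather not reorder converted JSON-LD by hand, a few lines of Python can do it. This is only a sketch under my own assumptions: reorder_keys is a hypothetical helper, the property values are invented, and it reorders the top level of an object only. It relies on Python dictionaries and json.dumps preserving insertion order.

```python
import json


def reorder_keys(node, order):
    """Return a copy of node with JSON-LD keywords first, then keys in the
    given order, then anything else in its existing order (top level only)."""
    keywords = [k for k in node if k.startswith("@")]
    listed = [k for k in order if k in node]
    rest = [k for k in node if k not in keywords and k not in listed]
    return {k: node[k] for k in keywords + listed + rest}


# Output as a conversion tool might return it (invented values).
converted = {
    "telephone": "555-0100",
    "name": "Example Shop",
    "@type": "LocalBusiness",
    "@context": "http://schema.org",
}

# The order in which the properties appear in the html mark up.
markup_order = ["@context", "@type", "name", "telephone"]

tidy = reorder_keys(converted, markup_order)
print(json.dumps(tidy, indent=2))
```

Nested objects, such as an address, would need the same treatment applied to each level.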

Hopefully this post will clear away some of the mystery of how Schema.org is structured and managed. If you have proposals in mind to enhance and extend the vocabulary or examples, have a go: see if they make sense in a version on your own system, then suggest them to the community on GitHub.

In my next post I will look more at extensions, Hosted and External, and how you work with those, including some hints on choosing where to propose changes – in the core vocabulary, in a hosted or an external extension.
