Evolving Schema.org in Practice Pt2: Working Within the Vocabulary

Thing_rdfa In the previous post in this series Pt1: The Bits and Pieces I stepped through the process of obtaining your own fork of the Schema.org GitHub repository; working on it locally; uploading your version for sharing; and proposing those changes to the Schema.org community in the form of a GitHub Pull Request.

Having covered the working environment; in this post I now intend to describe some of the important files that make up Schema.org and how you can work with them to create or update, examples and term definitions within your local forked version in preparation for proposing them in a Pull Request.

files The File Structure
If you inspect the repository you will see a simple directory structure.  At the top level you will find a few files sporting a .py suffix.  These contain the python application code to run the site you see at http://schema.org.  They load the configuration files, build an in-memory version of the vocabulary that are used to build the html pages containing the definitions of the terms, schema listings, examples displays, etc.  They are joined by a file named app.yaml, which contains the configuration used by the Google App Engine to run that code.

At this level there are some directories containing supporting files: docs & templates contain static content for some pages; tests & scripts are used in the building and testing of the site; data contains the files that define the vocabulary, its extensions, and the examples used to demonstrate its use.

The Data Files
The data directory itself contains various files and directories.  schema.rdfa is the most important file, it contains the core definitions for the majority of the vocabulary.  Although, most of the time, you will see schema.rdfa as the only file with a .rdfa suffix in the data directory, the application will look for and load any .rdfa files it finds here.  This is a very useful feature when working on a local version – you can keep your enhancements together only merging them into the main schema.rdfa file when ready to propose them.

Also in the data directory you will find an examples.txt file and several others ending with –examples.txt.  These contain the examples used on the term pages, the application loads all of them.

Amongst the directories in data, there are a couple of important ones.  releases contains snapshots of versions of the vocabulary from version 2.0 onwards.  datafilesThe directory named ext contains the files that define the vocabulary extensions and examples that relate to them.  Currently you will find auto and bib directories within ext, corresponding to the extensions currently supported.  The format within these directories follows the basic pattern of the data directory – one or more .rdfa files containing the term definitions and –examples.txt files containing relevant examples.

Getting to grips with the RDFa

Enough preparation let’s get stuck into some vocabulary!

Take your favourite text/code editing application and open up schema.rdfa. You will notice two things – it is large [well over 12,500 lines!], and it is in the format of a html file.  Schema_html This second attribute makes it easy for non-technical viewing – you can open it with a browser.

Once you get past a bit of CSS formatting information and a brief introduction text, you arrive [about 35 lines down] at the first couple of definitions – for Thing and CreativeWork.

The Anatomy of a Type Definition
Standard RDFa (RDF in attributes) html formatting is used to define each term.  A vocabulary Type is defined as a RDFa marked up <div> element with its attributes contained in marked up <span> elements.

The Thing Type definition:

The attributes of the <div> element indicate that this is the definition of a Type (typeof=”rdfs:Class”) and its canonical identifier (resource=”http://schema.org/Thing”).  The <span> elements filling in the details – it has a label (rdfs:label) of ‘Thing‘ and a descriptive comment (rdfs:comment) of ‘The most generic type of item‘.  There is one formatting addition to the <span> containing the label.  The class=”h” is there to make the labels stand out when viewing in a browser – it has no direct relevance to the structure of the vocabulary.

The CreativeWork Type definition

Inspecting the CreativeWork definition reveals a few other attributes of a Type defined in <span> elements.  The rdfs:subClassOf property, with the associated href on the <a> element, indicates that http://schema.org/CreativeWork is a sub-type of http://schema.org/Thing.

Acknowledge Finally there is the dc:source property and its associated href value.  This has no structural impact on the vocabulary, its purpose is to acknowledge and reference the source of the inspiration for the term.  It is this reference that results in the display of a paragraph under the Acknowledgements section of a term page.

Defining Properties
The properties that can be used with a Type are defined in a very similar way to the Types themselves.

The name Property definition:

The attributes of the <div> element indicate that this is the definition of a Property (typeof=”rdf:Property”) and its canonical identifier (resource=”http://schema.org/name”).  As with Types the <span> elements fill in the details.

name Properties have two specific <span> elements to define the domain and range of a property.  If these concepts are new to you, the concepts are basically simple.  The Type(s) defined as being in the domain of a property are are those for which the property is a valid attribute.  The Type(s) defined as being in the range of a property, are those that expected values for that property.  So inspecting the above name example we can see that name is a valid property of the Thing Type with an expected value type of Text.  Also specific to property definitions is rdfs:subPropertyOf which defies that one property is a sub-property another.  For html/RDFa format reasons this is defined using a link entity thus: <link property=”rdfs:subPropertyOf” href=”http://schema.org/workFeatured” />.

Those used to defining other RDF vocabularies may question the use of  http://schema.org/domainIncludes and http://schema.org/rangeIncludes to define these relationships. This is a pragmatic approach to producing a flexible data model for the web.  For a more in-depth explanation I refer you to the Schema.org Data Model documentation.

Not an exhaustive tutorial in editing the defining RDFa but hopefully enough to get you going!

 

Making Examplesexamples

One of the most powerful features of the Schema.org documentation is the Examples section on most of the term pages.  These provide mark up examples for most of the terms in the vocabulary, that can be used and built upon by those adding Schema.org data to their web pages.  These examples represent how the html of a page or page section may be marked up.  To set context, the examples are provided in several serialisations – basic html, html plus Microdata, html plus RDFa, and JSON-LD.  As the objective is to aid the understanding of how Schema.org may be used, it is usual to provide simple basic html formatting in the examples.

Examples in File
As described earlier, the source for examples are held in files with a –examples.txt suffix, stored in the data directory or in individual extension directories.

One or more examples per file are defined in a very simplistic format.

An example begins in the file with a line that starts with TYPES:, such as this:

This example has a unique identifier prefixed with a # character, there should be only one of these per example.  These identifiers are intended for future feedback mechanisms and as such are not particularly controlled.  I recommend you crate your own when creating your examples.  Next comes a comma separated list of term names.  Adding a term to this list will result in the example appearing on the page for that term.  This is true for both Types and Properties.

Next comes four sections each preceded by a line containing a single label in the following order: PRE-MARKUP:, MICRODATA:, RDFA:, JSON:.  Each section ends when the next label line, or the end of the file is reached.  The contents of each section of the example is then inserted into the appropriate tabbed area on the term page.  The process that does this is not a sophisticated one, there are no error or syntax checking involved – if you want to insert the text of the Gettysburg Address as your RDFa example, it will let you do it.

I am not going to provide tutorials for html, Microdata, RDFa, or JSON-LD here there are a few of those about.  I will however recommend a tool I use to convert between these formats when creating examples.  RDF Translatorrdftranslator is a simple online tool that will validate and translate between RDFa, Microdata, RDF/XML, N3, N-Triples, and JSON-LD.  A suggestion, to make your examples as informative possible – when converting between formats, especially when converting to JSON-LD, most conversion tools reorder he statements. It is worth investing some time in ensuring that the mark up order in your example is consistent for all serialisations.

Hopefully this post will clear away some of mystery of how Schema.org is structured and managed. If you have proposals in mind to enhance and extend the vocabulary or examples, have a go, see if thy make sense in a version on your own system, suggest them to the community on Github.

In my next post I will look more at extensions, Hosted and External, and how you work with those, including some hints on choosing where to propose changes – in the core vocabulary, in a hosted or an external extension.

Comment below or Contact us