By Richard Wallis on June 3, 2013
- 3 Comments
, Linked Data
Tagged: Linked Data
I am pleased to share with you a small but significant step on the Linked Data journey for WorldCat and the exposure of data from OCLC.
Content-negotiation has been implemented for the publication of Linked Data for WorldCat resources.
For those immersed in the publication and consumption of Linked Data, there is little more to say. However I suspect there are a significant number of folks reading this who are wondering what the heck I am going on about. It is a little bit techie but I will try to keep it as simple as possible.
Back last year, a linked data representation of each (of the 290+ million) WorldCat resources was embedded in it’s web page on the WorldCat site. For full details check out that announcement but in summary:
- All resource pages include Linked Data
- Human visible under a Linked Data tab at the bottom of the page
- Embedded as RDFa within the page html
- Described using the Schema.org vocabulary
- Released under an ODC-BY open data license
That is all still valid – so what’s new from now?
That same data is now available in several machine readable RDF serialisations. RDF is RDF, but dependant on your use it is easier to consume as RDFa, or XML, or JSON, or Turtle, or as triples.
In many Linked Data presentations, including some of mine, you will hear the line “As I clicked on the link a web browser we are seeing a html representation. However if I was a machine I would be getting XML or another format back.” This is the mechanism in the http protocol that makes that happen.
Let me take you through some simple steps to make this visible for those that are interested.
Starting with a resource in WorldCat: http://www.worldcat.org/oclc/41266045. Clicking that link will take you to the page for Harry Potter and the prisoner of Azkaban. As we did not indicate otherwise, the content-negotiation defaulted to returning the html web page.
To specify that we want RDF/XML we would specify http://www.worldcat.org/oclc/41266045.rdf (dependant on your browser this may not display anything, but allow you to download the result to view in your favourite editor)
For JSON specify http://www.worldcat.org/oclc/41266045.jsonld
For turtle specify http://www.worldcat.org/oclc/41266045.ttl
For triples specify http://www.worldcat.org/oclc/41266045.nt
This allows you to manually specify the serialisation format you require. You can also do it from within a program by specifying, to the http protocol, the format that you would accept from accessing the URI. This means that you do not have to write code to add the relevant suffix to each URI that you access. You can replicate the effect by using curl, a command line http client tool:
curl -L -H “Accept: application/rdf+xml” http://www.worldcat.org/oclc/41266045
curl -L -H “Accept: application/ld+json” http://www.worldcat.org/oclc/41266045
curl -L -H “Accept: text/turtle” http://www.worldcat.org/oclc/41266045
curl -L -H “Accept: text/plain” http://www.worldcat.org/oclc/41266045
So, how can I use it? However you like.
If you embed links to WorldCat resources in your linked data, the standard tools used to navigate around your data should now be able to automatically follow those links into and around WorldCat data. If you have the URI for a WorldCat resource, which you can create by prefixing an oclc number with ‘http://www.worldcat.org/oclc/’, you can use it in a program, browser plug-in, smartphone/facebook app to pull data back, in a format that you prefer, to work with or display.
Go have a play, I would love to hear how people use this.