
Typical! Since joining OCLC as Technology Evangelist, I have been preparing myself to be one of the first to blog about the release of linked data describing the hundreds of millions of bibliographic items in WorldCat.org. So where am I when the press release hits the net? 35,000 feet above the North Atlantic heading for LAX, that’s where – life just isn’t fair.
By the time I am checked in to my Anaheim hotel, ready for the ALA Conference, this will be old news. Nevertheless it is significant news, significant in many ways.
OCLC have been at the leading edge of publishing bibliographic resources as linked data for several years. At dewey.info they have been publishing the top levels of the Dewey classifications as linked data since 2009. As announced yesterday, this has now been increased to encompass 32,000 terms, such as this one for the transits of Venus. Also around for a few years is VIAF (the Virtual International Authority File) where you will find URIs published for authors, such as this well-known chap. These two were more recently joined by FAST (Faceted Application of Subject Terminology), providing usefully applicable identifiers for Library of Congress Subject Headings and combinations thereof.
Despite this leading position in the sphere of linked bibliographic data, OCLC has attracted some criticism over the years for not biting the bullet and applying it to all the records in WorldCat.org as well. As today’s announcement now demonstrates, they have taken their linked data enthusiasm to the heart of their rich, publicly available, bibliographic resources – publishing linked data descriptions for the hundreds of millions of items in WorldCat.
Let me dissect the announcement a bit….
First significant bit of news – WorldCat.org is now publishing linked data for hundreds of millions of bibliographic items – that’s a heck of a lot of linked data by anyone’s measure. It is by far the largest linked bibliographic resource on the web. It is also linked data describing things that, for decades, librarians in tens of thousands of libraries all over the globe have been carefully cataloguing so that the rest of us can find out about them. These are just the sort of authoritative resources that will help stitch the emerging web of data together.
Second significant bit of news – the core vocabulary used to describe these bibliographic assets comes from schema.org. Schema.org is the initiative backed by Google, Yahoo!, Microsoft, and Yandex, to provide a generic high-level vocabulary/ontology to help mark up structured data in web pages so that those organisations can recognise the things being described and improve the services they can offer around them. A couple of examples being Rich Snippet results and inclusion in the Google Knowledge Graph.
As I reported a couple of weeks back, from the Semantic Tech & Business Conference, some 7-10% of indexed web pages already contain schema.org markup, as microdata or RDFa. It may at first seem odd for a library organisation to use a generic web vocabulary to mark up its data – but just think who the consumers of this data are, and what vocabularies they are most likely to recognise. Just for starters, embedding schema.org data in WorldCat.org pages immediately makes them understandable by the search engines, vastly increasing the findability of these items.
Third significant bit of news – the linked data is published both in human readable form and in machine readable RDFa on the standard WorldCat.org detail pages. You don’t need to go to a special version or interface to get at it; it is part of the normal interface. As you can see, from the screenshot of a WorldCat.org item above, there is now a Linked Data section near the bottom of the page. Click and open up that section to see the linked data in human readable form.
You will see the structured data that the search engines and other systems will get from parsing the RDFa encoded data, within the html that creates the page in your browser. Not very pretty to human eyes I know, but just the kind of structured data that systems love.
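To make that a little more concrete, here is a minimal sketch of how a program might pull property/value pairs out of RDFa-style markup using only Python's standard library. The HTML fragment is a made-up illustration, not the actual WorldCat.org markup, and `PropertyCollector` and `extract_properties` are hypothetical names. It only handles the simplest case, where a `property` attribute takes its value from a `content` attribute or from the element's text; a real consumer would use a full RDFa parser.

```python
from html.parser import HTMLParser

# Hypothetical fragment in the spirit of an RDFa-annotated detail page.
# The property names are schema.org terms; the values are invented.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">An Example Title</span>
  <span property="author">A. N. Author</span>
  <meta property="datePublished" content="1999">
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects (property, value) pairs from `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value is carried in the attribute, e.g. on a <meta> element.
            self.pairs.append((prop, attrs["content"]))
        else:
            # Value is the text that follows inside the element.
            self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.pairs.append((self._pending, text))
            self._pending = None


def extract_properties(html):
    """Return the (property, value) pairs found in an HTML string."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_HTML):
        print(f"{prop}: {value}")
```

The point is simply that the same page a person reads can be read by a program, without a separate interface.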
Fourth significant bit of news – OCLC are proposing to cooperate with the library and wider web communities to extend Schema.org, making it even more capable for describing library resources. With the help of the W3C, Schema.org is working with several industry sectors to extend the vocabulary to be more capable in their domains – news and e-commerce being a couple of already accepted examples. OCLC is playing its part in doing this for the library sector.
Take a closer look at the markup on WorldCat.org and you will see attributes from a library vocabulary. Attributes such as library:holdingsCount and library:oclcnum. This library vocabulary is OCLC’s conversation starter with which we want to kick off discussions with interested parties, from the library and other sectors, about proposing a basic extension to schema.org for library data. What better way of testing out such a vocabulary than to mark up several million records with it, publish them and see what the world makes of them?
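Prefixed names like library:holdingsCount are shorthand: the prefix stands for a namespace URI and the part after the colon is the term within it. A rough sketch of that expansion follows. The `http://schema.org/` namespace is real; the URI bound to `library` here is a placeholder I have assumed purely for illustration, and `expand_curie` is a hypothetical helper, not part of any OCLC tooling.

```python
# Prefix-to-namespace bindings. The schema.org namespace is real;
# the `library` binding is a placeholder assumption for illustration.
PREFIXES = {
    "schema": "http://schema.org/",
    "library": "http://example.org/library/",  # assumed, not the real URI
}


def expand_curie(name, prefixes=PREFIXES, default="http://schema.org/"):
    """Expand `prefix:term` to a full URI; unprefixed terms use `default`.

    This sketch does not handle names that are already full URIs.
    """
    prefix, sep, term = name.partition(":")
    if not sep:
        # No colon: treat the name as a term in the default vocabulary.
        return default + name
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + term


if __name__ == "__main__":
    for name in ("name", "library:holdingsCount", "library:oclcnum"):
        print(name, "->", expand_curie(name))
```

Keeping the library terms under their own prefix means they can sit alongside the schema.org terms in the same markup while the extension is still under discussion.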
Fifth significant bit of news – the WorldCat.org linked data is published under an Open Data Commons (ODC-BY) license, so it will be openly usable by many for many purposes.
Sixth significant bit of news – This release is an experimental release. This is the start, not the end, of a process. We know we have not got this right yet. There are more steps to take around how we publish this data in ways in addition to RDFa markup embedded in page HTML – not everyone can, or will want to, parse pages to get the data. There are obvious areas for discussion around the use of schema.org and the proposed library extension to it. There are areas for discussion about the application of the ODC-BY license and the attribution requirements it asks for. Over the coming months OCLC wants to constructively engage with all who are interested in this process. It is only with the help of the library and wider web communities that we can get it right. In that way we can ensure that WorldCat linked data can be beneficial for the OCLC membership, libraries in general, and a great resource on the emerging web of data.
For more information about this release, check out the background to linked data at OCLC, join the conversation on the OCLC Developer Network, or email data@oclc.org.
As you can probably tell I am fairly excited about this announcement. This, and future stuff like it, are behind some of my reasons for joining OCLC. I can’t wait to see how this evolves and develops over the coming months. I am also looking forward to engaging in the discussions it triggers.
[...] OCLC WorldCat Linked Data Release – Significant In Many Ways [...]
Fascinating stuff. I’m keynoting an OCLC tomorrow on platforms and this is nothing short of amazing. To me, the key part is this:
“This release is an experimental release. This is the start, not the end, of a process. We know we have not got this right yet. There are more steps to take around how we publish this data in ways in addition to RDFa markup embedded in page html – not everyone can, or will want to, parse pages to get the data”
Yes! It’s not about getting it right the first time. Platforms are a mind-set as much as anything.
[...] Linked Data – why it’s important Excellent rundown on why the WorldCat support for Linked Data is a big [...]
[...] pages now include schema.org markup. Richard Wallis has provided further insight into this news in a new article: “OCLC have been at the leading edge of publishing bibliographic resources as linked data for [...]
[...] Wallis, Technology Evangelist for OCLC, blogs on the importance of this development for WorldCat “OCLC WorldCat Linked Data Release – Significant In Many Ways”. Includes a link to an example of a linked data record in [...]
A really impressive implementation of RDFa and Schema.org. Great to see linked data being published with a non-library audience in mind.
I wonder how this involvement will dovetail with the LOC MARC translation work?
[...] 2012-6-25] Data Liberate Blog: OCLC WorldCat Linked Data Release – Significant In Many Ways / By Richard Wallis on June 21, 2012 Richard [...]
[...] past week. My colleagues, Roy Tennant and Richard Wallis, both have good blog posts (Roy’s) (Richard’s) explaining the what and the why of making WorldCat data available in a linked data format. The [...]
Richard, this is very impressive. However, are the terms of use for this data as indicated at the bottom of the page: basically non-commercial, no robots, no keeping saved copies, no redistribution, no using the data to improve my home library catalogue, etc.?
Hi William,
No – as I say in the post “the WorldCat.org linked data is published under an Open Data Commons (ODC-BY) license, so it will be openly usable by many for many purposes.”
If you follow the ‘More info about Linked Data’ link in the Linked Data section of a WorldCat detail page, you will find more information there.
~Richard.
Richard,
That is indeed wonderful news! I just skimmed the post and went straight to the item page for the Harry Potter book. The license could perhaps be made more obvious there. I didn’t think to click on “More info about LD” because I guessed that would just explain to me what LD was – the only thing that I easily found was the standard links at the bottom of the page to the generic draconian terms and conditions.
Cheers,
-w
William,
In a complete product you would be correct. As we say, these are the first steps in an experiment that will evolve over time. Placement and embedding of licensing information is something that will evolve too.
~Richard.
[...] Data, schema.org Tagged: Libraries, Licensing, Linked Data, OCLC, Open Data, schema.org, WorldCat You may remember my frustration a couple of months ago, at being in the air when OCLC announced the addition of Schema.org [...]
[...] and unpacked in this article — OCLC WorldCat Linked Data Release – Significant In Many Ways, Richard Wallis, Data Liberate (Jun 21) This entry was posted in Scholarly by Gwen. Bookmark [...]
[...] web. This is what OCLC did in the initial exercise to publish the 270+ million resources in WorldCat.org as Linked Data. At the same time, I believe that summer 2012 was a bit of a watershed for Linked Data in the [...]