OCLC Preview 194 Million Open Bibliographic Work Descriptions

I have just been sharing a platform, at the OCLC EMEA Regional Council Meeting in Cape Town, South Africa, with my colleague Ted Fons. It was a great setting for a great couple of days of the OCLC EMEA membership and others sharing thoughts, practices, collaborative ideas and innovations.

Ted and I presented our continuing insight into The Power of Shared Data, and the evolving data strategy for the bibliographic data behind WorldCat. If you want to see a previous view of these themes you can check out some recordings we made late last year on YouTube, from Ted – The Power of Shared Data – and me – What the Web Wants.

Today, demonstrating ongoing progress towards implementing the strategy, I had the pleasure of previewing two significant upcoming announcements on the WorldCat data front:

  1. The release of 194 Million Open Linked Data Bibliographic Work descriptions
  2. The WorldCat Linked Data Explorer interface

WorldCat Works

A Work is a high-level description of a resource, containing information such as author, name, descriptions, subjects etc., common to all editions of the work. The description format is based upon some of the properties defined by the CreativeWork type from the Schema.org vocabulary. In the case of a WorldCat Work description, it also contains [Linked Data] links to the individual, OCLC-numbered editions already shared in WorldCat. Let’s take a look at one – try this: http://worldcat.org/entity/work/id/12477503

You will see, displayed in the new WorldCat Linked Data Explorer, an HTML view of the data describing ‘Zen and the art of motorcycle maintenance’. Click on the ‘Open All’ button to view everything. Anyone used to viewing bibliographic data will see that this is a very different view of things. It is mostly URIs, the only visible strings being the name or description elements. This is not designed as an end-user interface; it is designed as a data exploration tool. This is highlighted by the links at the top to alternative RDF serialisations of the data: Turtle, N-Triples, JSON-LD, RDF/XML.
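For those who would rather ask for one of those serialisations from a script than click a link, here is a minimal Python sketch. The media types are the standard registered ones for each format. Whether the WorldCat service honours an HTTP Accept header in this way is an assumption on my part, as are the names RDF_MEDIA_TYPES and build_rdf_request. The sketch only builds the request and does not send it.

```python
from urllib.request import Request

# Standard media types for the four serialisations named above.
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "n-triples": "application/n-triples",
    "json-ld": "application/ld+json",
    "rdf/xml": "application/rdf+xml",
}


def build_rdf_request(work_uri, serialisation):
    """Build (but do not send) a request asking for one RDF serialisation.

    Assumption: the server supports content negotiation via the Accept
    header. The post itself only mentions links on the HTML page.
    """
    media_type = RDF_MEDIA_TYPES[serialisation.lower()]
    return Request(work_uri, headers={"Accept": media_type})


request = build_rdf_request(
    "http://worldcat.org/entity/work/id/12477503", "Turtle"
)
print(request.full_url, request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would fetch the data, if the assumption about content negotiation holds.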

The vocabulary used to describe the data is based upon Schema.org, and enhancements to it recommended and proposed by the Schema Bib Extend W3C Community Group, which I have the pleasure to chair.

‘Why is this a preview?’ and ‘Can I usefully use the data now?’ are a couple of obvious questions for you to ask at this time.

This is the first production release of WorldCat infrastructure delivering linked data. It is the first step in what will be an evolutionary, and revolutionary, journey to provide interconnected linked data views of the rich entities (works, people, organisations, concepts, places, events) captured in the vast shared collection of bibliographic records that makes up WorldCat. Mining those 311+ million records is not a simple task, even just to identify works. It takes time, and a significant amount of [Big Data] computing resources. One of the key steps in this process is to identify where connections exist between works and authoritative data hubs, such as VIAF, FAST, LCSH, etc. In this preview release, some of those connections are not yet in place.

What you see in their place at the moment is a link to what can be described as a local authority. These are identified by what the data geeks call a hash-URI. For example, http://experiment.worldcat.org/entity/work/data/12477503#Person/pirsig_robert is such an identifier, constructed from the work URI and the person name. Over the next few weeks, where the information is available, you can expect to see this link replaced by a connection to VIAF, such as this: http://viaf.org/viaf/78757182.
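To make the shape of such a hash-URI concrete, here is a small Python sketch that pulls one apart using the standard library. It assumes the part after the ‘#’ always follows the type/name pattern seen in the single example above, which may not hold for every local authority. The function name split_hash_uri is mine, for illustration only.

```python
from urllib.parse import urldefrag


def split_hash_uri(uri):
    """Split a hash-URI into (document URI, entity type, entity name).

    Assumption: the fragment looks like 'Person/pirsig_robert', as in the
    one example given in this post.
    """
    document, fragment = urldefrag(uri)
    if not fragment:
        raise ValueError("not a hash-URI: " + uri)
    # Everything before the first '/' is treated as the entity type.
    entity_type, _, name = fragment.partition("/")
    return document, entity_type, name


print(split_hash_uri(
    "http://experiment.worldcat.org/entity/work/data/12477503#Person/pirsig_robert"
))
```

A URI with no fragment, such as the VIAF one, raises a ValueError here, which is one simple way to tell the two kinds of link apart.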

So, can I use the data? – Yes, the data is live, and most importantly the work URIs are persistent. It is also available under an open data license (ODC-BY).

How do I get a work id for my resources? – Today, there is one way. If you use the OCLC xISBN or xOCLCNum web services, you will find a work id as part of the data returned (e.g. owi=”owi12477503”). By stripping off the ‘owi’ you can easily create the relevant work URI: http://worldcat.org/entity/work/id/12477503
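That stripping step is easy to script. The following is a minimal Python sketch, assuming you have already pulled the owi value out of the web service response (the call to the service itself is not shown). The function name work_uri_from_owi is my own invention.

```python
WORK_URI_BASE = "http://worldcat.org/entity/work/id/"


def work_uri_from_owi(owi):
    """Turn an 'owi'-prefixed work id into a WorldCat Work URI."""
    value = owi.strip()
    if not value.startswith("owi"):
        raise ValueError("expected an id starting with 'owi': " + owi)
    number = value[len("owi"):]
    # What remains after the prefix should be the numeric work id.
    if not (number.isascii() and number.isdigit()):
        raise ValueError("expected digits after 'owi': " + owi)
    return WORK_URI_BASE + number


print(work_uri_from_owi("owi12477503"))
# prints http://worldcat.org/entity/work/id/12477503
```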

Within a few weeks, once the next update to the WorldCat linked data has been processed, you will find that links to works are embedded in the already published linked data. For example, you will find the following in the data for OCLC number 53474380:

schema:exampleOfWork http://worldcat.org/entity/work/id/12477503
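Written out in full, that statement is a single triple. The Python sketch below formats it as one N-Triples line. The schema: prefix expands to http://schema.org/, which is standard. The URI used for the OCLC-numbered manifestation is an assumption on my part, since this post only gives the number. The function name is hypothetical.

```python
# Full form of schema:exampleOfWork.
SCHEMA_EXAMPLE_OF_WORK = "http://schema.org/exampleOfWork"


def example_of_work_triple(manifestation_uri, work_uri):
    """Format one N-Triples line linking a manifestation to its work."""
    return "<{0}> <{1}> <{2}> .".format(
        manifestation_uri, SCHEMA_EXAMPLE_OF_WORK, work_uri
    )


# Assumed URI pattern for OCLC number 53474380 (not stated in this post).
manifestation = "http://www.worldcat.org/oclc/53474380"
work = "http://worldcat.org/entity/work/id/12477503"
print(example_of_work_triple(manifestation, work))
```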

What is next on the agenda? As described, within a few weeks, we expect to enhance the linking within the descriptions and provide links from the OCLC-numbered manifestations. From then on, both WorldCat and others will start to use WorldCat Work URIs, and their descriptions, as a core, stable foundation on which to build out a web of relationships between entities in the library domain. It is that web of data that will stimulate the sharing of data and innovation in the design of applications and interfaces consuming the data over the coming months and years.

As I said on the program today, we are looking for feedback on these releases.

We as a community are embarking on a new journey with shared, linked data at its heart. Its success will depend on how that data is exposed and used, and on its intrinsic quality. Experience shows that a new view of data often exposes previously unseen issues, and it is just that sort of feedback we are looking for. So any feedback on any aspect of this will be more than welcome.

I am excitedly looking forward to being able to comment further as this journey progresses.

Update:  I have posted answers to some interesting questions raised by this release.
