Create Data Not Records

One phrase in particular leapt out at me when reading Karen Coyle’s Bibliographic Framework: RDF and Linked Data post a few days ago.

My message here is that we need to be creating data, not records, and that we need to create the data first, then build records with it for those applications where records are needed.

Karen was looking at the move of bibliographic metadata into Linked Data, and by definition RDF, from the “I’ve got all these records that I think I want to preserve” end of the telescope. She points out that there are many translation efforts focused upon representing library record formats such as ISBD, FRBR, RDA, MODS and MARC in RDF, none of which has been endorsed by a standards body, and all of which are mutually incompatible information silos. With an enlightening explanation of how there are more ways to describe the Place of Publication in library land than you can shake a stick at, she identifies that these are conditioned by context: a context that travels with the translations from a library record format into an RDF representation thereof.

This is the antithesis of the linked data concept, where data sets from diverse sources share metadata elements. It is this re-use of elements that creates the “link” in linked data. To achieve this, metadata elements need to be unconstrained by a particular context.

It is interesting to note that this is the first use of the words Linked Data in the post, and that it comes a significant way into her text. I believe this is symptomatic of the audience Karen is trying to enlighten and convince. For decades they have been focused on tightly controlled record formats and rules. So much so that the business model for a new format appears to be based upon marketing the documentation required to understand and apply it. It is hardly surprising, therefore, that they hook on to RDF as a ‘standard’ and then try to represent and enforce their context upon it. Yes, they end up with valid RDF (that is not difficult, as you can represent almost anything in RDF), but is it useful Linked Data? More often than not, no.

I find it helps to visualise how information should be captured as Linked Data by approaching it from the point of view of a non-domain expert. With that hat on, the place of publication is easy, e.g. London = the place where the publisher that published the item was located. A library domain expert would explain how under certain circumstances, such as a printing error or a special context, it is not that simple. The non-domain expert would also assert in my example that the London in the place of publication is the same London as the subject of the book, the same London referenced in the text, the same London that the author resided in, the same London in the title, the same London in which you can find a library to loan a copy from, and the same London that has http://dbpedia.org/resource/London as its DBpedia URI. I leave you to imagine the response from your favourite librarian…

So how do we square this circle? By, as Karen says, creating data, not records. We can create some data describing London as a place, with its geographic location and the strings of characters in different languages that are accepted as its name. We could then create some more about a book, and even more about a publication event (a date, a link to a publisher, and a link to the place we just described). If later on we, or someone else, want to point to [link] our London piece of data as describing the place the book is about, that would be equally valid, and not detrimental to the relationships we had created between it and any other data entities we described earlier.
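To make that concrete, here is a minimal sketch in plain Python that treats each piece of data as a (subject, predicate, object) statement, which is the shape RDF uses. Only the DBpedia URI for London comes from this post. Every name beginning with ex: is a hypothetical identifier invented for illustration, the coordinates are approximate, and a real project would use an RDF library and an agreed vocabulary rather than a set of tuples.

```python
# Each statement is a (subject, predicate, object) triple.
# Only the London URI is real; all "ex:" names are hypothetical placeholders.
LONDON = "http://dbpedia.org/resource/London"

triples = {
    # Data about the place, created once and independent of any record.
    (LONDON, "ex:name", '"London"@en'),
    (LONDON, "ex:name", '"Londres"@fr'),
    (LONDON, "ex:latitude", "51.5074"),   # approximate
    (LONDON, "ex:longitude", "-0.1278"),  # approximate
    # Data about a book and its publication event.
    ("ex:book1", "ex:title", '"A Hypothetical Book"'),
    ("ex:book1", "ex:publicationEvent", "ex:event1"),
    ("ex:event1", "ex:date", "2011"),
    ("ex:event1", "ex:publisher", "ex:publisher1"),
    ("ex:event1", "ex:place", LONDON),
}

# Later on, we or someone else add a statement that points at the same
# London entity in a different role. Nothing already stated has to change.
triples.add(("ex:book1", "ex:subject", LONDON))


def statements_about(target, data):
    """Return the sorted (subject, predicate) pairs whose object is target."""
    return sorted((s, p) for s, p, o in data if o == target)


links_to_london = statements_about(LONDON, triples)
for subject, predicate in links_to_london:
    print(subject, predicate)
```

The point of the sketch is that the place of publication and the subject of the book both resolve to one London entity, rather than to two context-bound strings locked inside a record.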

Although we cannot un-invent them, I see these translations to RDF as unhelpful. What is needed are entity extraction and linking processes to help create the entity descriptions we need from the information captured in the individual record silos that we have built up over the decades.
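As a rough illustration of what such a linking step might look like, the sketch below matches place-of-publication strings from a handful of made-up records against a small lookup table of known places. The records, the lookup table, the ex:placeOfPublication predicate and the helper names are all assumptions for this example. Real entity linking has to cope with ambiguity (which London?), missing data and scale, so treat this as a toy and not a method.

```python
# Hypothetical place strings as they might appear in legacy records.
records = [
    {"id": "rec1", "place": "London :"},
    {"id": "rec2", "place": "[London]"},
    {"id": "rec3", "place": "Londres"},
    {"id": "rec4", "place": "Oxford"},
]

# Hypothetical lookup from normalised names to entity URIs.
# Only the London URI appears in this post.
KNOWN_PLACES = {
    "london": "http://dbpedia.org/resource/London",
    "londres": "http://dbpedia.org/resource/London",
}


def normalise(text):
    """Strip surrounding cataloguing punctuation and lower-case the string."""
    return text.strip(" :;,.[]").lower()


def link_places(records, known):
    """Split records into linked statements and ids that found no match."""
    linked, unmatched = [], []
    for record in records:
        uri = known.get(normalise(record["place"]))
        if uri is None:
            unmatched.append(record["id"])
        else:
            linked.append((record["id"], "ex:placeOfPublication", uri))
    return linked, unmatched


linked, unmatched = link_places(records, KNOWN_PLACES)
print(linked)
print(unmatched)
```

Three differently punctuated or translated strings end up pointing at one entity, and the record that cannot be matched is set aside for review instead of being guessed at.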

In her comments, Karen points to the data model (PDF) used by the British Library when publishing the British National Bibliography as Linked Data. That project, with which I was associated, took the approach I describe and is well worth a look. When you do look, remember that it is a first pass at modelling their data, which will be built upon later by the BL and others. If you do not see your favourite relationships, FRBR or otherwise, represented, why not comment on how they could be added to the model alongside the relationships already represented? That is one of the major benefits of using RDF and Linked Data: you can mix different ways of expressing relationships in a single model.
