Library Metadata Evolution: The Final Mile?

I am honoured to have been invited to speak at the CILIP Conference 2019 in Manchester, UK, on July 3rd. My session, in the RDA, ISNIs and Linked Data track, shares the title of this post: Library Metadata Evolution: The Final Mile?

Why this title, and why the question mark?

I have been rattling around the library systems industry/sector for nigh on thirty years, with metadata always somewhere near the core of my thoughts. From my early days as a programmer wrestling with the new-to-me ‘so-called’ standard MARC; through reacting to the challenges of the burgeoning web; the elephantine gestation of RDA; several experiments with linked library data; and the false starts of BIBFRAME, library metadata has consistently failed to scale to the real world beyond the library walls and firewall.

When Schema.org arrived on the scene I thought we might have reached the point where library metadata could finally blossom, adding value outside of library systems to help library-curated resources become first-class citizens, and hence results, in the global web we all inhabit. But as yet it has not happened.

Why not yet is something that has been concerning me of late. The answer, I believe, is that we do not yet have a simple, mostly automatic process to take data from MARC, process it to identify entities (Work, Instance, Person, Organisation, etc.) and deliver it as Linked Data (BIBFRAME), supplemented with the open vocabulary used on the web (Schema.org).

Many of the elements of this process are, or nearly are, in place. The Library of Congress site provides conversion scripts from MARCXML to BIBFRAME, and the Bibframe2Schema.org group are starting to develop conversion processes to take that BIBFRAME output and add Schema.org.
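To give a feel for what "adding Schema.org" to BIBFRAME output could mean, here is a minimal sketch in Python. It is not the Bibframe2Schema.org group's actual mapping: the `BF_TO_SCHEMA` table, the `add_schema_types` function, the JSON-LD-style dictionaries and the `ex:` identifier are all my own assumptions for illustration, and treating every Instance as a `schema:Book` is a deliberate oversimplification.

```python
# Hypothetical, heavily simplified type mapping; a real mapping would be
# far richer and would cover properties as well as types.
BF_TO_SCHEMA = {
    "bf:Work": "schema:CreativeWork",
    "bf:Instance": "schema:Book",  # assumption: not every Instance is a book
    "bf:Person": "schema:Person",
    "bf:Organization": "schema:Organization",
}


def add_schema_types(entity):
    """Return a copy of the entity with Schema.org types added alongside
    its existing BIBFRAME types. Nothing is removed."""
    types = list(entity.get("@type", []))
    for bf_type in list(types):
        mapped = BF_TO_SCHEMA.get(bf_type)
        if mapped and mapped not in types:
            types.append(mapped)
    return {**entity, "@type": types}


# A made-up Work description, as it might come out of a conversion script.
work = {"@id": "ex:work1", "@type": ["bf:Work"], "bf:title": "Emma"}
enriched = add_schema_types(work)
```

The point of the sketch is that the BIBFRAME description is supplemented rather than replaced: the same entity ends up carrying both vocabularies.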

So what’s missing? A key missing piece is entity reconciliation.

The BIBFRAME conversion scripts identify the Work for the record instance being processed. However, they do not recognise Works they have previously processed. What is needed is a repeatable, reliable, mostly automatic process for reconciling the many duplicate Work descriptions that the current processes unnecessarily create. Reconciliation is also important for other entity types, but with the aid of VIAF, ISNI, LC’s name and other identifier authorities, most of the groundwork for those is well established.
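As a rough illustration of the idea, and nothing more, the sketch below clusters Work descriptions that share a normalised title and creator. The record shape (`id`, `title`, `creator`), the function names and the sample data are all hypothetical, and exact matching on a normalised key is about the crudest approach possible; a real process would need to cope with translations, variant titles, uniform titles and much else.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(work):
    """Build the matching key for a Work (assumed: title plus creator)."""
    return (normalise(work["title"]), normalise(work["creator"]))


def reconcile(works):
    """Group Work ids by key; the first id seen stands for the cluster."""
    clusters = defaultdict(list)
    for work in works:
        clusters[work_key(work)].append(work["id"])
    return {ids[0]: ids for ids in clusters.values()}


# Made-up Work descriptions, as if generated from three separate records.
works = [
    {"id": "w1", "title": "Pride and Prejudice", "creator": "Austen, Jane"},
    {"id": "w2", "title": "Pride and prejudice.", "creator": "Austen, Jane"},
    {"id": "w3", "title": "Emma", "creator": "Austen, Jane"},
]
merged = reconcile(works)
```

Here `w1` and `w2` end up in the same cluster while `w3` stands alone, which is the behaviour the conversion scripts currently lack: recognising a Work they have already seen.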

There are a few other steps required to achieve the goal of delivering search-engine-targeted structured library metadata to the web at scale, such as crowbarring the output into our ageing user interfaces, but none, I believe, is more important than giving them data about reconciled entities.

Although my conclusions here may seem a little pessimistic, I am convinced we are on that last mile, which will take us to a place and time where library-curated metadata is a first-class component of search engine knowledge graphs and is as likely to lead a user to a library-enabled fulfilment as to a commercial one.
