Library of Congress To Boldly Voyage To Linked Data Worlds

“The Library of Congress made an announcement earlier this week that has left some usually vocal library pundits speechless.” – Roy Tennant (rtennant) on Twitter

MARC is Dead! RDA made irrelevant! – cries that can be heard rattling around the bibliographic blogo-twittersphere.  My opinion is that this is an inevitable move based upon serious consideration, one that builds on several initiatives that have been brewing for many months.

Bold though – very bold.  I am sure that there are many in the library community, who have invested much of their careers in MARC and its slightly more hip cousin RDA, who are now suffering from vertigo as they feel the floor being pulled from beneath their feet.

The Working Group of the Future of Bibliographic Control, as it examined technology for the future, wrote that the Library community’s data carrier, MARC, is “based on forty-year-old techniques for data management and is out of step with programming styles of today.”

Many of the libraries taking part in the test [of RDA] indicated that they had little confidence RDA changes would yield significant benefits…

 

And on a more positive note:

The Library of Congress (LC) and its MARC partners are interested in a deliberate change that allows the community to move into the future with a more robust, open, and extensible carrier for our rich bibliographic data….
….The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model.

There is still a bit of confusion there between a data carrier and a framework for describing resources.  Linked Data is about linking descriptions of things, not necessarily transporting silos of data from place to place.  But maybe I quibble a little too much at this early stage.

So now what:

The Library of Congress will be developing a grant application over the next few months to support this initiative.  The two-year grant will provide funding for the Library of Congress to organize consultative groups (national and international) and to support development and prototyping activities.  Some of the supported activities will be those described above:  developing models and scenarios for interaction within the information community, assembling and reviewing ontologies currently used or under development, developing domain ontologies for the description of resources and related data in scope, organizing prototypes and reference implementations.

I know that this is the way that LoC and the library community do things, but I do hope that this doesn’t mean that they will disappear into an insular huddle for a couple of years, to re-emerge with something that is almost right yet missing some of the evolution that will have gone on around them over that period.

As with other recent announcements – the vote to openly share European libraries’ data, the report from the W3C’s Library Linked Data Incubator Group, and now the report from the Stanford Linked Data Workshop – I welcome these developments. However I warn those involved that these are great opportunities [to enable the valuable resources catalogued and curated by libraries over decades to become foundational assets of the future web] that can be easily squandered by not applying the open thinking that characterises successes in the web of data.

One very relevant example of the success of applying open thinking and approach to the bibliographic world using Linked Data is the open publishing of the British National Bibliography (BnB).  Readers of this blog will know that we at Talis have worked closely with the team at the BL in their ground-breaking work.  The data model they produced is an example of one of those things that may induce that feeling of vertigo that I mentioned.  It doesn’t look much like a MARC record!  I can assure the sceptical that although it may be very different to what you are used to, it is easy to get your head around.  (Drop us a line if you want some guidance.)

As Talis host the BnB Linked Data for the BL, I can testify to the success of this work.  Only launched in mid-July, its use is growing rapidly, receiving just short of 2 million hits in the last month alone.

With the British Library, along with the national libraries of Canada and Germany, quoted as partners with the LoC in this initiative, plus their work being referenced as an exemplar in the other reports I mention, I hold out great hope that things are headed in the right direction.

As comments on some of my previous posts attest, there is concern from some in the community of domain experts that this RDF stuff is too simple and lightweight, and will not enable them to capture the rich detail that they need.  They are missing a couple of points.  Firstly, it is this simplicity that will help non-domain experts to understand, reference and link to their rich resources.  Secondly, RDF is more than capable of describing the rich detail that they require – using several emerging ontologies including the RDA ontology, FRBR, etc.  Finally and most importantly, it is not a binary choice between widely comprehended simplicity and domain-specific detailed description.  The RDF for a resource can, and probably should, contain both – as in the sketch below.
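To make that concrete, here is a minimal Turtle sketch of a single resource description carrying both levels of detail at once.  All the URIs, and the rda: namespace standing in for a domain ontology, are invented for illustration; only the Dublin Core terms are real vocabulary.

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix rda:     <http://example.org/rda-style-terms/> .   # stand-in for a domain ontology

    <http://example.org/id/book/123>
        # widely comprehended simplicity – easy for non-specialists to link to:
        dcterms:title   "A Tale of Two Cities" ;
        dcterms:creator <http://example.org/id/person/charles-dickens> ;
        # domain-specific richness, sitting in the very same description:
        rda:statementOfResponsibility "by Charles Dickens" ;
        rda:extent                    "304 pages" .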

So Library of Congress, I welcome your announcement and offer a friendly reminder that you not only need to draw expertise from the forward thinking library community, but also from the wider Linked Data world.  I am sure your partners from the British Library will reinforce this message.

This post was also published on the Talis Consulting Blog

W3C Library Linked Data Final Report Published

The W3C Library Linked Data Incubator Group has published its Final Report after a year of deliberation.

The mission of the Library Linked Data Incubator Group was to help increase the global interoperability of library data on the Web by focusing on the potential role of Linked Data technologies.

This report contains several messages that are not just interesting and relevant for the Linked Data enthusiast in the library community. It contains some home truths for those in libraries who think that a slight tweak to the status quo, such as adopting RDA, will be sufficient to keep libraries [data] relevant in the rapidly evolving world of the web.

On the NGC4LIB mailing list, Eric Lease Morgan picked out some useful quotes from the report:

  • Linked Data is not about creating a different Web, but rather about enhancing the Web through the addition of structured data.
  • By promoting a bottom-up approach to publishing data, Linked Data creates an opportunity for libraries to improve the value proposition of describing their assets.
  • Linked Data may be a first step toward a “cloud-based” approach to managing cultural information, which could be more cost-effective than stand-alone systems in institutions.
  • With Linked Open Data, libraries can increase their presence on the Web, where most information seekers can be found.
  • The use of the Web and Web-based identifiers will make up-to-date resource descriptions directly citable by catalogers.
  • History shows that all technologies are transitory, and the history of information technology suggests that specific data formats are especially short-lived.
  • Library developers and vendors will directly benefit from not being tied to library-specific data formats.
  • Most information in library data is encoded as display-oriented, natural-language text.
  • Work on library Linked Data can be hampered by the disparity in concepts and terminology between libraries and the Semantic Web community.
  • Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data.
  • A major advantage of Linked Data technology is realized with the establishment of connections between and across datasets.
  • Libraries should embrace the web of information, both by making their data available for use as Linked Data and by using the web of data in library services. Ideally, library data should integrate fully with other resources on the Web, creating greater visibility for libraries and bringing library services to information seekers.

Also, from the report:

Relatively few bibliographic datasets have been made available as Linked Data, and even less metadata has been produced for journal articles, citations, or circulation data — information which could be put to effective use in environments where data is integrated seamlessly across contexts. Pioneering initiatives such as the release of the British National Bibliography reveal the effort required to address challenges such as licensing, data modeling, the handling of legacy data, and collaboration with multiple user communities. However, these also demonstrate the considerable benefits of releasing bibliographic databases as Linked Data. As the community’s experience increases, the number of datasets released as Linked Data is growing rapidly.

Talis Consulting has been closely and actively involved in the modelling, data transformation, publishing, and hosting of the British National Bibliography (BnB) as Linked Data.  A great overview of the approach taken to modelling bibliographic data, in a way that makes it easily compatible with the wider Web of Data, is provided by Tim Hodson in his post – British Library Data Model: Overview.  As can be seen from their work, the modelling used for the BnB differs from the approach taken by many attempting to publish bibliographic data as Linked Data – it describes the resources themselves (the books, authors, publishers, etc.) as people, places, events, and things, as against attempting to represent the records that libraries keep about their stock of resources.
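The difference between the two approaches is easier to see in data than in prose.  Below is a hedged Turtle sketch of the two mindsets; all the URIs and the ex: record-field properties are invented for illustration, with Dublin Core the only real vocabulary.

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix ex:      <http://example.org/record-terms/> .   # invented record-field properties

    # Record-centric: the node is really a catalogue record, full of display strings.
    <http://example.org/records/000123>
        ex:titleStatement     "Wuthering Heights / Emily Bronte" ;
        ex:publicationDetails "London : Penguin, 2003" .

    # Thing-centric (the BnB approach in spirit): the book, its creator and its
    # publisher are each things with their own URIs that anyone can link to.
    <http://example.org/id/book/wuthering-heights>
        dcterms:title     "Wuthering Heights" ;
        dcterms:creator   <http://example.org/id/person/emily-bronte> ;
        dcterms:publisher <http://example.org/id/organisation/penguin> .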

With intentions to release open library data specifically mentioning Linked Data, the sentiments from this report are already influencing the wider forward-thinking library community.  I will leave the last word to the report’s final paragraph, which some in the traditional record-based cataloguing community may have difficulty in getting their head around.  I encourage them to look at libraries from the point of view of the wider [non-library] web consumers, and read it again.

One final caveat: data consumers should bear in mind that, in contrast to traditional, closed IT systems, Linked Data follows an open-world assumption: the assumption that data cannot generally be assumed to be complete and that, in principle, more data may become available for any given entity. We hope that more “data linking” will happen in the library domain in line with the projects mentioned here.

This post was also published on the Talis Consulting Blog

Schema.org Déjà vu

The Web has been around for getting on for a couple of decades now, and massive industries have grown up around the magic of making it work for you and your organisation.  Some of it, it has to be said, can be considered snake-oil.  Much of it is the output of some of the best brains on the planet.  Where, on the hit parade of technological revolutions to influence mankind, the Web is placed is oft disputed, but it is definitely up there with fire, steam, electricity, computing, and of course the wheel.  Similar debates virtually rage, and will continue to rage, around the hit parade of web features that will in retrospect have been most influential – pick your favourites: http, XML, REST, Flash, RSS, SVG, the URL, the href, CSS, RDF – the list is a long one.

I have observed a pattern as each of the successful new enhancements to the web has been introduced, and then generally adopted.  Firstly there is a disconnect between the proponents of the new approach/technology/feature and the rest of us.  The former split their passions between focusing on the detailed application, rules, and syntax of its use, and broadcasting its worth to the world, not quite understanding why the web masses do not ‘get it’ and adopt it immediately.  This phase is then followed by one of post-hype disillusionment among the creators, especially when others start suggesting simplifications to their baby.  Also at this time, back-room adoption starts to occur among those who find it interesting but are not evangelistic about it.  The real kick for the web comes from those back-room folks who just use this next thing to deliver stuff and solve problems in a better way.  It is the results of their work that the wider world starts to emulate, so that they can keep up with the pack and remain competitive.  Soon the new feature is adopted by the majority, because all the big boys are using it, and it becomes just part of the tool kit.

A great example of this was RSS.  Not a technological leap but a pragmatic mix of current techniques and technologies, mixed in with some lateral thinking and a group of people agreeing to do it in ‘this way’ then sharing it with the world.  As you will see from the Wikipedia page on RSS, the syntax wars raged in the early days – I remember it well: 0.9, 0.91, 1.0, 1.1, 2.0, 2.01, etc.  I also remember trying, not always with success, to convince people around me to use it, because it was so simple.  Looking back it is difficult to say exactly when it became mainstream, but this line from Wikipedia gives me a clue: “In December 2005, the Microsoft Internet Explorer team and Microsoft Outlook team announced on their blogs that they were adopting the feed icon first used in the Mozilla Firefox browser. In February 2006, Opera Software followed suit.”  From then on, the majority of consumers of RSS were not aware of what they were using, and it became just one of the web technologies you use to get stuff done.

I am now seeing the pattern starting to repeat itself again, with structured and linked data.  Many, including me, have been evangelising the benefits of web-friendly, structured, linked data for some time now – preaching to a crowd that has been slow in growing, but growing it is.   Serious benefit is now being gained by organisations adopting these techniques and technologies, as our selection of case studies demonstrates.  They are getting on with it, often with our help, using it to deliver stuff.  We haven’t hit the mainstream yet.  For instance, the SEO folks still need to get their head around the difference between content and data.

Something is stirring around the edge of the Semantic Web/Linked Data community that has the potential to give structured, web-enabled data the kick towards the mainstream that RSS got when Microsoft adopted the RSS logo and all that came with it.   That something is schema.org, an initiative backed by the heavyweights of the search engine world: Google, Yahoo, and Bing.  For the SEO and web developer folks, schema.org offers a simple, attractive proposition – embed some structured data in your html and, via things like Google’s Rich Snippets, we will give you a value-added display in our search results.  Result: happy web developers with their sites getting improved listing displays.  Result: lots of structured data starting to be published by people that you would have had an impossible task in convincing that it would be a good idea to publish structured data on the web.

I was at Semtech in San Francisco in June, just after schema.org was launched and caused a bit of a stir.  “They’ve over-simplified the standards that we have been working on for years”, “dumbing down RDF”, “diluting the capability with too small a set of attributes”, etc., etc.  When you get under the skin of schema.org, you see that with their support for RDFa, in the shape of RDFa 1.1 Lite, they are not that far from the RDF/Linked Data community.

Schema.org should be welcomed as an enabler for getting loads more structured and linked data on the web.  Is their approach perfect? No.  Will it influence the development of Linked Data? Yes.  Will the introduction be messy? Yes.  Is it about more than just rich snippets?  Oh yes.  Do the webmasters care at the moment? No.

If you want a friendly insight into what schema.org is about, I suggest a listen to this month’s Semantic Link podcast, with their guest from Google/schema.org, Ramanathan V. Guha.

Now where have I seen that name before? – Oh yes, back on the Wikipedia RSS page: “The basic idea of restructuring information about websites goes back to as early as 1995, when Ramanathan V. Guha and others in Apple Computer’s Advanced Technology Group developed the Meta Content Framework.”  So it probably isn’t just me who is getting a feeling of Déjà vu.

This post was also published on the Talis Consulting Blog

Will Europe’s National Libraries Open Data In An Open Way?

A significant step towards open bibliographic data was made in Copenhagen this week at the 25th anniversary meeting of the Conference of European National Librarians (CENL) hosted by the Royal Library of Denmark. From the CENL announcement:

…the Conference of European National Librarians (CENL), has voted overwhelmingly to support the open licensing of their data. What does that mean in practice? It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exists in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want. The first outcome of the open licence agreement is that the metadata provided by national libraries to Europeana.eu, Europe’s digital library, museum and archive, via the CENL service The European Library, will have a Creative Commons Universal Public Domain Dedication, or CC0 licence. This metadata relates to millions of digitised texts and images coming into Europeana from initiatives that include Google’s mass digitisations of books in the national libraries of the Netherlands and Austria. ….it will mean that vast quantities of trustworthy data are available for Linked Open Data developments

There is much to be welcomed here. Firstly, that the vote was overwhelming.   Secondly, that the open licence chosen to release this data under is Creative Commons CC0, thus enabling reuse for any purpose. You cannot expect such a vote to cover all the detail, but the phrase ‘trustworthy data are available for Linked Open Data developments’ does give rise to some possible concerns for me.   My concern is not from the point of view that this implies that the data will need to be published as Linked Data – this also should be welcomed. My concern comes from some of the library-focused Linked Data conversations, presentations and initiatives I have experienced over the last few months and years. Many in the library community who have worked with Linked Data lean towards the approach of using Linked Data techniques to reproduce the very fine detailed structure and terminology of their bibliographic records as a representation of those records in RDF (the Linked Data data format).  Two examples of this come to mind:

  1. The recent release of an RDF representation of the MARC21 elements and vocabularies by MMA – possibly of internal use only to someone transforming a library’s MARC record collection to identify concepts and entities to then describe as Linked Data.  Mind-numbingly impenetrable for anyone who is not a librarian looking for useful data.
  2. The Europeana Data Model (EDM).  An impressive and elegant Linked Data RDF representation of the internal record structure and process concerns of Europeana.  However, it is again not modelled in a way that makes it easy for those outside the [Europeana] library community to engage with, understand and extract meaning from.

The fundamental issue I have with the first of these, and other examples, is that their authors have approached this from the direction of wishing to encode their vast collections of bibliographic records as Linked Data.  They would have ended up with a more open [to the wider world] result if they had used the contents of their records as a rich resource from which to build descriptions of the resources they hold.  In that way you end up with descriptions of things (books, authors, places, publishers, events, etc.) as against descriptions of records created by libraries.

Fortunately there is an excellent example of a national library publishing Linked Data which describes the things they hold.   The British Library have published descriptions of 2.6 million items they hold in the form of the British National Bibliography. I urge those within Europeana and the European national libraries community, who will be involved in this opening-up initiative, to take a close look at the evolving data model that the BL have shared, to kick-start the conversation on the most appropriate [Linked Data] techniques to apply to bibliographic data.  For more detail see this Overview of the British Library Data Model.

This opening up of data is a great opportunity for trusted librarian-curated data to become a core part of the growing web of data – one that should not be missed.  We must be aware of previous missed opportunities, such as the way MARCXML just slavishly recreated an old structure in a new format.   Otherwise we could end up with what could be characterised, in web integration terms, as a significant open data white elephant. Nevertheless I am optimistic: with examples such as the British Library BnB backing up this enthusiastic move to open up a vast collection of metadata in a useful way that will stimulate Linked Data development, I have some confidence in a good outcome.

Disclosure: Bibliographic domain experts from the British Library worked with Linked Data experts from the Talis team in the evolution of the BnB data model – something that could be extended and/or repeated with other national and international library organisations.

This post was also published on the Talis Consulting Blog

The Power of the Link at Semantic Tech & Business, London, 2011

This post was initially just going to be about the presentation The Simple Power of the Link that I gave in the opening session of The Semantic Tech & Business Conference in London earlier this week.  However I realise now that its title, chosen to draw attention to the core utility and power of the basic links in Linked Data, has resonance and relevance for the conference as a whole.

This was the first conference in the long-running Semtech series to venture into Europe, as well as the first to include the word business in its name.  This obviously recognises the move, from San Francisco-based geekdom to global pragmatic usefulness, that the Semantic Web in general and Linked Data in particular is in the process of undertaking.  A maturing of the market that we in Talis Consulting can attest to, having assisted many organisations with their understanding and adoption of Linked Data.   In addition to those attendees that I would characterise as the usual suspects, the de-geeking of the topic attracted many who would not previously have visited such a conference.   An unscientific show of hands, prior to the session in which I presented, indicated that about half of the audience were new to this Semantic stuff.

Traffic on our stand in the exhibition area also supported this view, with most discussions being about how the techniques and technologies could be applied, and how Talis could help, as against esoteric details of the technologies themselves.  Linking people with real-world issues and opportunities to the people who have the experience to help them was an obvious benefit of the event, in addition to some great keynotes, as Rob described.

So, back to my initial purpose.

The presentation “The Simple Power of the Link” was an attempt to simplify the core principles of Linked Data so as to highlight their implicit benefits.  The aforementioned geekdom that has surrounded the Semantic Web has, unfairly in my mind, gained the topic a reputation for being complex and difficult, and hence not really applicable in the mainstream.  A short Google on the topic will rapidly turn up a set of esoteric research papers and discussions around things such as inferencing, content negotiation and SPARQL – a great way to put off those looking for an easy way in.

There is a great similarity to the way something like vehicle engineering is promoted, with references to turbos, self-levelling suspensions, flappy-paddle gearshifts, iPod docks and the like – missing the point, for those [just landed on the planet] new to the topic, that the major utility of any vehicle is that it goes, stops, steers, and gets you from A to B.

The core utility of Linked Data is The Link.  A simple way to indicate the relationship between one thing and another thing.

As things in Linked Data are represented by http URIs, which when looked up should return you some data containing links to other things, an implicit web of relationships emerges that you can follow to obtain more related information. This basic, and powerful, utility could be simply realised with data encoded as triples in RDF, served from a simple file structure by a web server.
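As a concrete illustration, here are two tiny Turtle documents, each of which could be a flat file served by any web server; every URI is invented for the example.  Looking up the first thing hands you a link to the second, and so on – that is the whole trick.

    # --- served at http://example.org/id/book/b1 ---
    @prefix dcterms: <http://purl.org/dc/terms/> .

    <http://example.org/id/book/b1>
        dcterms:title   "An Example Title" ;
        dcterms:creator <http://example.org/id/person/p1> .   # the link to follow

    # --- served at http://example.org/id/person/p1 ---
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <http://example.org/id/person/p1>
        foaf:name       "An Example Author" ;
        foaf:based_near <http://example.org/id/place/pl1> .   # and on to the next thing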

So, although things like triple stores, OWL, relational-to-RDF mapping tools, named graphs, SPARQL 1.1, and [the dreaded] httpRange-14 are important issues for those embedded in Linked Data, the overwhelming benefits that accrue from applying Linked Data come from those basic triples – the links.  As a community I believe that we can be rightly accused of not making that clear enough.  Something that my colleagues in Talis Consulting and I attempt to address whenever possible.  Especially at our open Linked Data events.

This post was also published on the Talis Consulting Blog

Web, Semantic Web, SEO, SERP and Linked Data

Like many of my posts, this one comes from the threads of several disparate conversations coming together in my mind, in an almost astrological conjunction.

One thread stems from my recent Should SEO Focus in on Linked Data? post, in which I was concluding that the group, loosely described as the SEO community, could usefully focus in on the benefits of Linked Data in their quest to improve the business of the sites and organisations they support. Following the post I received an email looking for clarification of something I said.

I am interested in understanding better the allusion you make in this paragraph:

One of the major benefits of using RDFa is that it can encode the links to other sources that are at the heart of Linked Data principles, and thus describe the relationships between things. It is early days with these technologies & initiatives. The search engine providers are still exploring the best way to exploit structured information embedded in and/or linked to from a page. The question is: do you just take RDFa as a new way of embedding information into a page for the search engines to pick up, or do you delve further into the technology and see it as public visibility of an even more beneficial infrastructure for your data?

If the immediate use-case for RDFa (microdata, etc.) is search engine optimization, what is the “even more beneficial infrastructure”? If the holy grail is search engine visibility, rank, relevance and rich-results, what is the “even more”?

In reply I offered:

What I was trying to imply is that if you build your web presence on top of a Linked Data described dataset / way of thinking / platform, you get several potential benefits:

  • Follow-your-nose navigation
  • Flexible easier to maintain page structure
  • Value added data from external sources….
  • … therefore improved [user] value with less onerous cataloguing processes
  • Agile/flexible systems – easy to add/mix in new data
  • Lower cost of enhancement (eg. BBC added dinosaurs to the established Wildlife Finder with minimal effort)
  • In-built APIs [with very little extra effort] to allow others to access / build apps upon / use your data in innovative ways
  • As per the BBC a certain level of default SEO goodness
  • Easy to map, and therefore link, your categorisations to ones the engines do/may use (eg. Google are using MusicBrainz to help folks navigate around – if, say as the BBC do, you link your music categories to those of MusicBrainz, you can share in that effect.)

So what I am saying is that you can ‘just’ take RDFa as a dialect to send your stuff to the Google (in which case microdata/microformats could be equally as good), but then you will miss out on the potential benefits I describe.

From my point of view there are two holy grails (if that isn’t breaking the analogy 😉):

  1. Get visibility and hence folks to hit your online resources.
  2. Provide the best experience/usefulness/value to them when they do.

Linked Data techniques and technologies have great value for the data owners in the second of those, with the almost spin-off benefit of helping you with the first one.

The next thread was not a particular item but a general vibe, from several bits and pieces I read, that RDFa was confusing and difficult. This theme, I detect, was coming from those only looking at it from a ‘how do I encode my metadata for Google to grab for its snippets’ point of view (and there is nothing wrong in that), or those trying to justify a ‘schema.org is the only show in town’ position. Coming at it from the first of those two points of view, I have some sympathy – those new to RDFa must feel like I do (with my basic understanding of html) when I peruse the contents of many a css file looking for clues as to the designer’s intention.

However I would make two comments. Firstly, a site surfacing lots of data and hence wanting to encode RDFa amongst the human-readable stuff, will almost certainly be using tools to format the data as it is extracted from an underlying data source – it is those tools that should be evolved to produce the RDFa as a by-product. Secondly, it is the wider benefits of Linked Data, which I’m trying to promote in my posts, that justify people investing in time to focus on it. The fact that you may use RDFa to surface that data embedded in html, so that search engines can pick it up, is implementation detail – important detail, but missing the point if that is all you focus upon.

Thread number three is the overhype of the Semantic Web. Someone who I won’t name, but I’m sure won’t mind me quoting, suggested the following as the introduction to a bit of marketing: “The Semantic Web is here and creating new opportunities to revamp and build your business.”

The Semantic Web is not here yet, and won’t be for some while. However what is here, and is creating opportunities, is Linked Data and the pragmatic application of techniques, technologies and standards that are enabling the evolution towards an eventual Semantic Web.

This hyped approach is a consequence of the stance of some in the Semantic Web community who have, with fervour, been promoting its coming, in its AI entirety, for several years, and fail to understand why all of us [enthusiasts, researchers, governments, commerce and industry] are not implementing all of its facets now. If you have the inclination, you can see some of the arguments playing out now in this thread on a SemWeb email list, where Juan Sequeda asks for support for his SXSW panel topic suggestion.

A simple request, which I support, but the thread it created shows that ‘eating the whole elephant’ of the Semantic Web will be too much of an introduction for the broad Web, SEO and SERP community, and that the ‘one mouthful at a time’ approach may have a better chance of success. Also, any talk of a ‘killer app’ is futile – we are talking about infrastructure here. What is the killer app feature of the Web? You could say linked, globally distributed, consistently accessed documents; an infrastructure that facilitated the development of several killer businesses and business models. We will see the same when we look back on a web enriched by linked, globally distributed, consistently accessed data.

So what is my astrological conjunction telling me? There is definitely fertile ground to be explored between the Semantic Web and the Web in the area of the pragmatic application of Linked Data techniques and technologies. People in both camps need to open their minds to the motivations and vision of the other. There is potential to be realised, but we are definitely not in silver bullet territory.

As I said in my previous post, I would love to explore this further with folks from the world of SEO & SERP. If you want to talk through what I have described, I encourage you to drop me an email or comment on this post.

This post was also published on the Talis Consulting Blog

Will Government Open Licence Extensions be a haven for the timid?

The National Archives announced today that UK government licensing policy has been extended to make more public sector information available:

Building on the success of the Open Government Licence, The National Archives has extended the scope of its licensing policy, encouraging and enabling even easier re-use of a wider range of public sector information.

The UK Government Licensing Framework (UKGLF), the policy and legal framework for the re-use of public sector information, now offers a growing portfolio of licences and guidance to meet the diverse needs and requirements of both public sector information providers and re-user communities.

On the surface this move is to be welcomed.  It provides, amongst other things, licensing choices and guidance for re-using information free of charge for non-commercial purposes – the Non-Commercial Government Licence – as well as guidance on licensing where charges apply, and on the licensing of software and source code.

All this is available from the UK Government Licensing Framework area of the National Archives site, along with FAQs and other useful supporting information, including machine readable licenses.

As the press release says, the extensions build on the success of the Open Government Licence (OGL) and are designed to cover what the OGL cannot.

So the [data publishers] thought process should be to try to publish under the OGL and then, only if ownership/licensing/cost of production provide an overwhelming case to be more restrictive, utilise these extensions and/or guidance.

My concern, having listened to many questions at conferences from what I would characterise as government conservative traditionalists, is that many will start at the charge-for/non-commercial use end of this licensing spectrum because of the fear/danger of opening up data too openly.  I do hope my concerns are unfounded and that the use of these extensions will be the exception, with the OGL being the de facto licence of choice for all public sector data.

This post was also published on the Talis Consulting Blog

Should SEO Focus in on Linked Data?

It is well known that the business of SEO is all about influencing SERPs – or is it?  Let me open up those acronyms:

Those engaged in the business of Search Engine Optimisation (SEO) focus much of their efforts on influencing Search Engine Result Pages (SERP), or more specifically the relevance and representation of their targeted items upon those pages.  As many a guide to SEO will tell you, some of this is simple – understanding the basics of how search engines operate, or even just purchasing the right advertising links on the SERP.  Quite simple in objective, but in reality an art form that attracts high rewards for those that are successful at it.

So if you want to promote links on search engine pages to your products, why would you be interested in Linked Data?  Well there are a couple of impacts that Linked Data, and RDF its data format, can have that are well worth looking into.

Delivering the Links – the BBC Wildlife Finder site is an excellent example of the delivering the links effect.

The BBC started with the data describing their video and audio clips and relating them to the animals they portray.  What was innovative in their approach was that they then linked to other information resources on the web, as against creating a catalogue of all that information in a database of their own.  This they encoded using Linked Data techniques, using RDF and a basic Wildlife Ontology that Talis consultants helped them develop and publish.   The stunningly visual website was then built on top of that RDF data, providing an intuitive navigational experience for users, delivering the follow-your-nose capability [that characterises Linked Data backed websites] to naturally move your focus between animals, species, habitats, behaviours and the animals that relate to them.  Each of these pages has its own permanent web address (URI).  In a second innovative step they provided links to those external resources (eg. Wikipedia – via dbpedia, Animal Diversity Web, ARKive) on their pages to enable you to explore further.  In yet another innovation, they make that RDF data openly and easily available for each of the main pages.  (Check out the source of the page you get when you add .rdf to the end of the URL for an animal page – not pretty, but machines love it.)
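To give a flavour of what sits behind such a page, here is a hedged Turtle sketch of the sort of triples involved.  The dbpedia URI follows the real pattern; everything else, including the wo: property name, is an invented stand-in rather than a quote from the actual Wildlife Ontology.

    @prefix wo:  <http://example.org/wildlife-terms/> .   # stand-in, not the real ontology namespace
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    <http://example.org/nature/species/mallard>
        # internal follow-your-nose link to another thing on the same site:
        wo:livesIn <http://example.org/nature/habitats/wetland> ;
        # and the innovative step – a link out to an external description of the same thing:
        owl:sameAs <http://dbpedia.org/resource/Mallard> .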

So a stunning Linked Data backed site, with intuitive follow-your-nose internal navigation and links to external sites – but how is this good for SEO?  Because it behaves like a good website should.  The logical internal interlinks between pages, with a good URI structure not hidden in the depths of an obscure hierarchy, coupled with links out to relevant, well-respected [in SEO terms] pages, are just what search engines look for.  The results are self-evident – search for Lions, Badgers, Mallard Duck and many other animals on your favourite search engine and you will find BBC Nature appearing high in the results set.

Featured Entries – Getting your entry on the first SERP a user sees is obviously the prime objective of SEO; however, making it stand out from the other entries on that page is an obvious secondary one.  The fact that ebay charges more for listing enhancements indicates there is value in listing promotion.

RDF, in the form of RDFa, and Linked Data become important in the field of Search Engine Results Promotion (another use of SERP) courtesy of something called Rich Snippets supported by Google, Microsoft, and Yahoo.  From Google:

Google tries to present users with the most useful and informative search results. The more information a search result snippet can provide, the easier it is for users to decide whether that page is relevant to their search. With rich snippets, webmasters with sites containing structured content—such as review sites or business listings—can label their content to make it clear that each labeled piece of text represents a certain type of data: for example, a restaurant name, an address, or a rating.

Encoding structured information about your product, review or business in [the html-embeddable version of RDF] RDFa gives the search engine more information to display than it otherwise would be able to reliably infer by analysing the text on the page.   Take a look at these results for an item of furniture – see how the result with the reviews, from sears.com, stands out:

[Screenshot: search results for an item of furniture, with the sears.com entry enhanced by review stars standing out from the rest.]

Elements such as pricing and availability are also presented if you encode them into your page.  I would be leading you astray if I gave you the impression that RDFa was the only way of encoding such information within your html.  Microformats, and Microdata – now being boosted by the schema.org initiative – are other ways of encoding structured information on your pages that the engines will recognise.

One of the major benefits of using RDFa is that it can encode the links to other sources that are at the heart of Linked Data principles, and thus describe the relationships between things.  It is early days with these technologies & initiatives.  The search engine providers are still exploring the best way to exploit structured information embedded in and/or linked to from a page.   The question is: do you just take RDFa as a new way of embedding information into a page for the search engines to pick up, or do you delve further into the technology and see it as public visibility of an even more beneficial infrastructure for your data?
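By way of illustration, here are the kinds of triples that RDFa markup on a product page boils down to once extracted, written as Turtle.  The schema.org terms are used loosely as a familiar example vocabulary, and all the URIs are invented.

    @prefix schema: <http://schema.org/> .

    <http://example.org/products/sofa-123>
        schema:name            "Three-seat sofa" ;
        # structured review data – the raw material for a rich snippet:
        schema:aggregateRating [ schema:ratingValue "4.2" ;
                                 schema:reviewCount "87" ] ;
        # and the Linked Data part: a link out to another described thing
        schema:brand           <http://example.org/id/brands/acme> .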

At Talis we know the power of Linked Data and its ability to both liberate and draw in value to your data.  We have experience with it [in SEO terms] delivering the links, and have an understanding of its potential for link featuring.

I would love to explore this further with folks from the world of SEO & SERP.  I also work alongside a team eager to investigate the possibilities with innovative organisations wanting to learn from the experience of the BBC, Best Buy, Sears and other first movers, and take things further.  If you fit either of those profiles, or just want to talk through what I have described, I encourage you to drop me an email or comment on this post.  There is much more to this than is currently being exploited and to answer the question in the title of this post – yes, those interested in SEO should be focusing in on Linked Data.

This post was also published on the Talis Consulting Blog

UK Government Commits to More Open Data

A couple of weeks back UK Prime Minister David Cameron announced the broadening of the publicly available government data with the publishing of key data on the National Health Service, schools, criminal courts and transport.

The background to the announcement was a celebration of the preceding year of activity in the areas of transparency and open data, with many core government data sets being published. Too many to list here, but the 7,200+ datasets listed on data.gov.uk give you an insight.  The political angle to this is undeniable, as Mr Cameron makes clear in his YouTube speech for the announcement: “Information is power because it allows people to hold the powerful to account.”

His “I believe it will also drive economic growth as companies can use this new data to build web sites or apps that allow people to access this information in creative ways” statement also gives an indication of the drivers for the way forward.

To be successful in either of these ambitions, the people and the companies have to have access to information in an easy and reliable way that gives them confidence to build their opinions and their business models upon it.  What do we measure that ease and reliability against – is it against the world of audited business practice, where the legal eagles and armchair auditors strive towards perfection, or is it the web world, in which a lack of perfection is accepted and good enough is the norm?  I believe that with government data on the web we should still accept that it will not be perfect, but the good-enough hurdle should be set higher than we would expect from the likes of Wikipedia and some other oft-used data sources.

There are two mentions, in the words that accompany the announcement, that appear to recognise this. Firstly, in the announcement itself on the Number 10 website, this: “All of the new datasets will be published in an open standardised format so they can be freely re-used under the Open Government Licence by third parties.” What ‘open standardised format’ actually means is something we need to delve into, but previous data.gov.uk work towards Linked Data and shared reliable identifiers for things [such as postcodes, schools, and stations] bodes well.   Secondly, in Mr Cameron’s letter to his Cabinet, we get a section on improving data quality, including things like plain-English descriptions of scope and purpose, introducing unique identifiers to help the tracking of interactions with companies, and an action plan for improving the quality and comparability of data.

So where are we now?  Some of the new data is not perfect, as this thread on the UK Government Data Developers Google Group shows.  William Waites identifies that the [government] reporting of transactions with the Open Knowledge Foundation does not match the transactions in the OKF’s own books, therefore calling into question how reliable those [government] figures are.  In my opinion, this is an example where we should applaud the release of such new data but, with conversations such as the one William started, help those who are publishing the data to improve the quality, reliability and comparability of their output.  Of course, by definition, this means that the publishers must be prepared and ready to listen – and are listening.

What we shouldn’t do is throw our hands in the air in despair because the first publishing of data by some departments is not up to what we would expect, or decry the move towards shared [URI based] identifiers because they look confusing in a csv file.  Data publishers will get better at it with helpful criticism.  I am also convinced that sharing well-known reliable identifiers for things across disparate, government and non-government, data will in the medium term have a far greater benefit than most [including enthusiasts for Linked Data like me] can currently envisage.

This post was also published on the Talis Consulting Blog

Significant Bibliographic Linked Data Release From The British Library

Today the British Library announced a significant contribution to the development, application, and sharing of bibliographic data using Linked Data techniques and technologies, with a preview of a new approach to publishing the British National Bibliography.

Chief Executive Dame Lynne Brindley announced the initiative in her Keynote at Linked Data and Libraries 2011, hosted by Talis at the British Library (BL) in London.

The Metadata Services Team at BL have been working with Tim Hodson, and other Talis consultants, over the last few months to apply Linked Data modelling practices to bibliographic resources. The British National Bibliography (BNB) of 2.8 million titles was used as the basis for this work. This differs from their first bibliographic publishing initiative using RDF in that they did not set out solely to encode their collection of MARC records into RDF/XML, as they and several other libraries have done previously.

Instead, building on training and workshops with the Talis team, they set out to model ‘things of interest’, such as people, places and events, related to a book you might hold in your hand. Although they were constantly aware of the value of the BNB data, they did not want to be constrained by the format and practices that went into its creation over many years.

Following Linked Data practices adopted across the web by governments, business and academia, they modelled these ‘interesting things’, reusing as many existing descriptive schema as possible. They were keen to reuse the likes of foaf, dc-terms and skos. However there were some areas for which there was no suitable existing property with the right meaning.  To fill this gap, a set of British Library Terms (BLT) was developed and will be published alongside the data.

As you can see from the British Library Data Model (click for a larger view), also published today, they have gone a long way down that road.  As all involved agree, this is the beginning of a journey and there is much more to do.

To support the model, the BL has released a subset of data (books published or distributed in the UK since 2005) published under an open CC0 license, hosted in the Talis Platform, giving access to a SPARQL Service, a Search Service, and a Describe Service. Through Linked Data Views, data for resources are available in both html and several rdf formats (rdf/xml, turtle, json) for programmatic access. The intention is to build to a full representation of the BNB, including serial publications, multi-part works and future publications, as data becomes available.

The follow-your-nose linking effect associated with Linked Data is very apparent when clicking through from example resources (eg. http://bnb.data.bl.uk/id/resource/006893251).  For instance, note the path you follow when clicking on the publication event, bringing together time, place and agent as individual resources. Try following the links through to the creator, the BL resource being identified as the same as the viaf identity 37001878.
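Roughly, the shape of the data behind that example looks like the following Turtle sketch.  The two URIs quoted in this post – the BnB resource and the viaf identity – are real; every other URI and all the property names are invented stand-ins, not the BL’s actual terms.

    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix ex:  <http://example.org/bl-style-terms/> .   # stand-in for the British Library Terms

    <http://bnb.data.bl.uk/id/resource/006893251>
        ex:creator     <http://example.org/id/people/the-author> ;    # invented URI
        # the publication event, bringing together time, place and agent as things:
        ex:publication [ ex:time  <http://example.org/id/years/2005> ;
                         ex:place <http://example.org/id/places/london> ;
                         ex:agent <http://example.org/id/agents/a-publisher> ] .

    # following the creator link leads out to a shared identity on the wider web:
    <http://example.org/id/people/the-author>
        owl:sameAs <http://viaf.org/viaf/37001878> .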

To paraphrase Neil Wilson, who leads the BL Metadata Services Team, this preview represents a work in progress that is hopefully sufficiently developed to be of interest across the wider web of data for audiences wanting to interact with bibliographic data. It also represents a significant contribution to a conversation underway in the library world seeking to publish our valuable resources in a useful and easily consumed way.  We hope others will join the conversation.

Neil, and Talis’ Rob Styles, are also presenting at Linked Data and Libraries, explaining the journey of the project and the philosophy and approach behind building the model.  Check out the event site, where their presentations will be made available.

Picture of British Library entrance by Xavier de Jauréguiberry on Flickr

This post was also published on the Talis Consulting Blog