Robust Links - Link Decoration

Last updated: April 24, 2015


Authored by:
   Herbert Van de Sompel - Los Alamos National Laboratory
   Harihar Shankar - Los Alamos National Laboratory
   Richard Wincewicz - University of Edinburgh
   Michael L. Nelson - Old Dominion University

Abstract

This document describes approaches to decorate links in HTML pages as a means to make them more robust over time. The approach assumes that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. When linking, the URI of the resource, the URI of the snapshot, and the datetime of linking are conveyed. Doing so increases the possibilities for revisiting the originally linked content a long time after the link was put in place.


1. Why decorate links?

Link decoration is intended to make links more robust over time. The rationale for link decoration is described in a separate document.

2. The link decoration approach

The approach assumes that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. Several web archives provide services that allow taking such snapshots, and versioning systems take them automatically. With a snapshot taken, a link can be made more robust by including:
  • The URI of the original resource for which the snapshot was taken;
  • The URI of the snapshot;
  • The datetime of linking, which is also the datetime of taking the snapshot.
This information, when provided in a machine-actionable manner, allows:
  • Visiting the snapshot;
  • Revisiting the original resource some time after linking;
  • Finding snapshots that are temporally close to the one taken, in case the snapshot itself becomes temporarily or permanently inaccessible.
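Finding a temporally close snapshot means picking the snapshot whose datetime is nearest to the datetime of linking, from the snapshots an archive holds for the original resource. The Python function below is a minimal sketch of that selection step only. It assumes the list of available snapshots (URI and date pairs) has already been obtained from an archive, which is archive-specific and not shown. The function name and the example.org URIs are hypothetical.

```python
from datetime import date


def closest_snapshot(snapshots, target):
    """Return the (uri, date) pair whose date is nearest to `target`.

    `snapshots` is a list of (uri, datetime.date) pairs; how it is obtained
    from an archive is out of scope here (hypothetical input).
    Ties are broken in favour of the earlier snapshot.
    Returns None when no snapshots are available.
    """
    if not snapshots:
        return None
    # Sort key: distance in days first, then the date itself (earlier wins ties).
    return min(snapshots, key=lambda s: (abs((s[1] - target).days), s[1]))


# Usage with made-up snapshot URIs around a link created on 2015-01-21.
snaps = [
    ("https://example.org/snap/a", date(2015, 1, 10)),
    ("https://example.org/snap/b", date(2015, 1, 25)),
    ("https://example.org/snap/c", date(2015, 3, 1)),
]
print(closest_snapshot(snaps, date(2015, 1, 21)))
```

In practice, the datetime of linking serves as `target`. Any snapshot that is inaccessible would simply be left out of `snapshots` before calling the function.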
The approach proposed here is to convey this information on a link by leveraging HTML5's attribute extensibility mechanism. It introduces the following data- attributes for the anchor (<a>) element:
  • data-originalurl for the URI of the original resource;
  • data-versionurl for the URI of the snapshot if one was taken;
  • data-versiondate for the datetime of linking (and of taking the snapshot, if one was taken).
The remainder of this document details how to use these attributes for various cases.
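The information is carried in ordinary attributes, so any HTML parser can recover it. The following code is a minimal sketch that assumes Python's standard-library html.parser is adequate for the pages at hand. It collects the href and any decoration attributes of each <a> element. The class and function names are illustrative, not part of this specification.

```python
from html.parser import HTMLParser

# The three link decoration attributes introduced above.
DECORATION_ATTRS = ("data-originalurl", "data-versionurl", "data-versiondate")


class RobustLinkParser(HTMLParser):
    """Collect the href and any link decoration attributes of each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag != "a":
            return
        attrs = dict(attrs)
        link = {"href": attrs.get("href")}
        for name in DECORATION_ATTRS:
            if name in attrs:
                link[name] = attrs[name]
        self.links.append(link)


def extract_links(html):
    """Return one dict per <a> element found in `html`."""
    parser = RobustLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = ('<p><a href="http://www.w3.org/" '
          'data-versionurl="https://archive.today/r7cov" '
          'data-versiondate="2015-01-21">W3C</a></p>')
print(extract_links(sample))
```

Undecorated links are still reported, with only an href. A consumer can therefore tell decorated links from plain ones.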

3. Decorating a link when linking to the original resource

If the main intent is to link to an original resource, while also allowing future users of that link to see the state of the original resource around the time the link was put in place, then the information is conveyed as follows:
  • href for the URI of the original resource for which the snapshot was taken;
  • data-versionurl for the URI of the snapshot;
  • data-versiondate for the datetime of linking (and of taking the snapshot, if one was taken).
Assume that, on January 21, 2015, I created a robust link to http://www.w3.org/.
  • If I created a snapshot of the resource, my robust link to the W3C home page would look as follows:
    <a href="http://www.w3.org/"
       data-versionurl="https://archive.today/r7cov"
       data-versiondate="2015-01-21">my robust link to the W3C home page</a>
    
  • If I did not create a snapshot of the resource, my robust link to the W3C home page would look as follows:
    <a href="http://www.w3.org/"
       data-versiondate="2015-01-21">my robust link to the W3C home page</a>
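Suppose the original resource in href has gone or changed. A client can then use data-versiondate to ask for an archived state from around the linking date. One way to do this is the Memento protocol (RFC 7089), in which a client sends the date in an Accept-Datetime request header formatted as an HTTP date. The Python sketch below converts a YYYY-MM-DD value into that format. As an alternative, it builds an Internet Archive Wayback Machine URL of the form https://web.archive.org/web/<timestamp>/<URI>, which redirects to a nearby capture. This sketch makes three assumptions: it uses the Wayback Machine, it treats the date as midnight UTC, and it accepts only the date-only form used in the examples above. The function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(versiondate):
    """Turn a data-versiondate value such as '2015-01-21' into an HTTP date
    for a Memento Accept-Datetime header. Only the YYYY-MM-DD form is handled,
    and midnight UTC is assumed because no time of day is given."""
    day = datetime.strptime(versiondate, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return format_datetime(day, usegmt=True)


def wayback_lookup_url(original, versiondate):
    """Build a Wayback Machine URL that should redirect to the capture of
    `original` closest to `versiondate` (assumes that archive holds one)."""
    timestamp = versiondate.replace("-", "")  # '2015-01-21' -> '20150121'
    return f"https://web.archive.org/web/{timestamp}/{original}"


# The W3C link from the examples above.
print(accept_datetime("2015-01-21"))  # Wed, 21 Jan 2015 00:00:00 GMT
print(wayback_lookup_url("http://www.w3.org/", "2015-01-21"))
```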
    

4. Decorating a link when linking to a specific version

If the main intent is to link to a specific state of an original resource, for example, a snapshot of the original resource in a web archive or one of its versions in a version control system, then the information is conveyed as follows:
  • href for the URI that provides the specific state, i.e. the snapshot or resource version;
  • data-originalurl for the URI of the original resource;
  • data-versiondate for the datetime of the snapshot or resource version.
For example,
  • Assume that, on January 21, 2015, I created a robust link that was primarily intended to convey the state of http://www.w3.org/ on that day. To do so, I created the snapshot https://archive.today/r7cov. In this case, my robust link to this specific version of the W3C home page would look as follows:
    <a href="https://archive.today/r7cov"
       data-originalurl="http://www.w3.org/"
       data-versiondate="2015-01-21">my robust link to this specific version of the W3C home page</a>
    
  • Assume that, on January 21, 2015, I created a robust link that was primarily intended to point to the version of http://en.wikipedia.org/wiki/Web_archiving that was current on that day, which is http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880. In this case, my robust link to this specific version of the Wikipedia page would look as follows:
    <a href="http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880"
       data-originalurl="http://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2015-01-21">my robust link to this specific version of the Wikipedia page</a>
       
  • Assume that, on January 21, 2015, I created a robust link that was primarily intended to point to the version of http://en.wikipedia.org/wiki/Web_archiving that was current on March 20, 2012, which is http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845. In this case, my robust link to this specific version of the Wikipedia page would look as follows:
    <a href="http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845"
       data-originalurl="http://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2012-03-20">my robust link to this specific version of the Wikipedia page</a>
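Sections 3 and 4 carry the same three pieces of information; they differ only in which URI sits in href. A consumer can therefore normalize either form. The presence of data-originalurl tells them apart, because only the specific-version form uses it. The Python function below is a minimal sketch of that rule. It takes the attributes of one <a> element as a dict, for example as collected by any HTML parser. Its name and return shape are illustrative assumptions.

```python
def interpret(link):
    """Return (original_uri, version_uri, versiondate) for one decorated link.

    `link` maps attribute names to values for a single <a> element.
    If data-originalurl is present, href points at a specific version
    (Section 4); otherwise href points at the original resource (Section 3).
    Missing values are returned as None.
    """
    href = link.get("href")
    versiondate = link.get("data-versiondate")
    if "data-originalurl" in link:
        return link["data-originalurl"], href, versiondate
    return href, link.get("data-versionurl"), versiondate


# The two W3C examples describe the same original, snapshot, and date.
to_original = {"href": "http://www.w3.org/",
               "data-versionurl": "https://archive.today/r7cov",
               "data-versiondate": "2015-01-21"}
to_version = {"href": "https://archive.today/r7cov",
              "data-originalurl": "http://www.w3.org/",
              "data-versiondate": "2015-01-21"}
print(interpret(to_original))
print(interpret(to_version))
```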

5. Don't want to decorate links? Provide page creation and modification dates

So, you're not sure about this link decoration thing. And maybe you don't even want to take snapshots of pages. But you would still like links to be more robust. You can make a difference by providing the creation and modification dates of your page in a machine-actionable manner. This makes it possible to interpret every link in the page in light of those dates, if so desired. In particular, it makes it possible to automatically find snapshots of linked resources from around the time of page creation or modification in web archives or versioning systems.

The proposed approach is based on the schema.org properties for page creation and modification:
  • datePublished for the date of first publication of the page;
  • dateModified for the date of most recent modification of the page.
For example, this very page, http://robustlinks.mementoweb.org/spec/, was first published on January 23, 2015. This is conveyed at the top of the page's HTML as follows:
<!DOCTYPE html>  
<html lang="en" itemscope itemtype="http://schema.org/WebPage" itemid="http://robustlinks.mementoweb.org/spec/">
<head>
  <meta charset="utf-8" />
  <meta itemprop="datePublished" content="2015-01-23">
  <title>Robust Links - Link Decoration</title>
To additionally convey that the page was most recently updated on February 2, 2015, the following is placed at the top of the HTML:
<!DOCTYPE html>  
<html lang="en" itemscope itemtype="http://schema.org/WebPage" itemid="http://robustlinks.mementoweb.org/spec/">
<head>
  <meta charset="utf-8" />
  <meta itemprop="dateModified" content="2015-02-02">
  <meta itemprop="datePublished" content="2015-01-23">
  <title>Robust Links - Link Decoration</title>
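These dates can be read back with a few lines of code. The following Python code is a minimal sketch that uses the standard-library html.parser to collect them. It assumes the dates appear as <meta itemprop="..." content="..."> elements, exactly as in the example above; other microdata or RDFa forms are not handled. Choosing which date to use when looking up snapshots of a linked resource is a judgment call, because the page dates do not say when an individual link was added. This sketch assumes that dateModified is preferred, with datePublished as the fallback. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PageDateParser(HTMLParser):
    """Collect schema.org datePublished/dateModified values given as
    <meta itemprop="..." content="..."> elements."""

    WANTED = ("datePublished", "dateModified")

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        # Attribute names are lower-cased by html.parser; values are not.
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop in self.WANTED and "content" in attrs:
            self.dates[prop] = attrs["content"]


def page_dates(html):
    """Return a dict with whichever of the two dates the page declares."""
    parser = PageDateParser()
    parser.feed(html)
    parser.close()
    return parser.dates


def reference_date(dates):
    # Assumption: prefer the most recent modification, else first publication.
    return dates.get("dateModified") or dates.get("datePublished")


head = """<!DOCTYPE html>
<html lang="en" itemscope itemtype="http://schema.org/WebPage" itemid="http://robustlinks.mementoweb.org/spec/">
<head>
  <meta charset="utf-8" />
  <meta itemprop="dateModified" content="2015-02-02">
  <meta itemprop="datePublished" content="2015-01-23">
  <title>Robust Links - Link Decoration</title>"""
print(page_dates(head), reference_date(page_dates(head)))
```

The resulting date could then be used in place of a missing data-versiondate when looking up snapshots of the page's undecorated links.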