Robustifying Links

Last updated: June 29, 2020


Authored by:
   Martin Klein - Los Alamos National Laboratory
   Shawn M. Jones - Los Alamos National Laboratory
   Harihar Shankar - Los Alamos National Laboratory
   Richard Wincewicz - University of Edinburgh
   Michael L. Nelson - Old Dominion University
   Herbert Van de Sompel - Data Archiving and Networking Services (DANS)

Abstract

Robust Links provide multiple pathways to revisit a link's original content, even a long time after the link was put in place. This document describes approaches to robustify links in HTML pages. All approaches assume that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. When linking, the URI of the resource, the URI of the snapshot, and the datetime of linking are conveyed.

1. Why robustify links?

Robust Links are intended to provide multiple pathways to revisit a link's original content over time. A separate document describes the motivation and rationale for robustifying HTML links.

2. The Robust Links approach

The approach assumes that, when linking to a web resource, a snapshot of the state of that resource is created, for example, in a web archive or a versioning system. Several web archives provide services that allow taking such snapshots, and versioning systems take them automatically. With a snapshot taken, a link can be robustified by including:
  • The URI of the original resource for which the snapshot was taken;
  • The URI of the snapshot;
  • The datetime of linking, that is, of taking the snapshot.
This information, when provided in a machine-actionable manner, allows:
  • Visiting the snapshot;
  • Revisiting the original resource some time after linking;
  • Finding snapshots that are temporally close to the one taken, in case the snapshot itself becomes temporarily or permanently inaccessible.
The approach proposed here is to convey this information on a link by leveraging HTML5's attribute extensibility mechanism. It introduces the following data- attributes for the anchor (<a>) element:
  • data-originalurl for the URI of the original resource;
  • data-versionurl for the URI of the snapshot;
  • data-versiondate for the datetime of linking, that is, of taking the snapshot.
The remainder of this document details how to use these attributes for various cases.
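
As a minimal sketch of how these attributes might be generated programmatically, the following Python function assembles an anchor element from the pieces of information listed above. The function name robust_anchor and its parameters are hypothetical and not part of the Robust Links approach; only the attribute names come from this document.

```python
from html import escape


def robust_anchor(href, text, *, original_url=None, version_url=None, version_date=None):
    """Return an <a> element string carrying Robust Link data- attributes.

    Hypothetical helper: only the attribute names are taken from the document.
    """
    attrs = [("href", href)]
    if original_url is not None:
        attrs.append(("data-originalurl", original_url))
    if version_url is not None:
        attrs.append(("data-versionurl", version_url))
    if version_date is not None:
        attrs.append(("data-versiondate", version_date))
    # Escape attribute values so characters such as & and " stay valid in HTML.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs
    )
    return f"<a {rendered}>{escape(text)}</a>"
```

Attributes that are not supplied are simply omitted, so the same helper can serve a link that carries a snapshot URI and one that carries only a datetime.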

3. Robustifying a link when linking to the original resource

If the main intent is to link to an original resource but also allow future users of that link to see the state of the original resource around the time the link was put in place, then Robust Link information is conveyed as follows:
  • href for the URI of the original resource for which the snapshot was taken;
  • data-versionurl for the URI of the snapshot;
  • data-versiondate for the datetime of linking, that is, of taking the snapshot.
For example, assume that we created a Robust Link to http://www.w3.org/ on January 21, 2015.
  • If we created a snapshot of the resource, as recommended, our Robust Link to the W3C home page looks like this:
    <a href="http://www.w3.org/"
       data-versionurl="https://archive.today/r7cov"
       data-versiondate="2015-01-21">Robust Link to the W3C home page</a>
    
  • If we did not create a snapshot of the resource, our Robust Link to the W3C home page would look like this:
    <a href="http://www.w3.org/"
       data-versiondate="2015-01-21">Robust Link to the W3C home page</a>
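
A consumer of such links needs to read the attributes back out of the page. The following is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical. It assumes the pattern of this section, where href holds the URI of the original resource, and therefore skips anchors that carry data-originalurl.

```python
from html.parser import HTMLParser


class RobustLinkCollector(HTMLParser):
    """Collect anchors that link to an original resource and carry a version date."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Assumption: anchors with data-originalurl follow a different pattern
        # (href is the snapshot), so they are not handled here.
        if "data-versiondate" not in attributes or "data-originalurl" in attributes:
            return
        self.links.append({
            "original": attributes.get("href"),
            # None when no snapshot was created at linking time.
            "snapshot": attributes.get("data-versionurl"),
            "versiondate": attributes["data-versiondate"],
        })


def collect_robust_links(html_text):
    """Return one dict per Robust Link found in html_text."""
    collector = RobustLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

For the second example above, which has no snapshot, the resulting entry has "snapshot" set to None while the original URI and the datetime are still available.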
    

4. Robustifying a link when linking to a specific version

If the main intent is to link to a specific state of an original resource, for example a snapshot of the original resource in a web archive or one of its versions in a version control system, then Robust Link information is conveyed as follows:
  • href for the URI that provides the specific state, i.e., the snapshot or resource version;
  • data-originalurl for the URI of the original resource;
  • data-versiondate for the datetime of the snapshot or resource version.
For example,
  • Assume that we created a Robust Link on January 21, 2015 that was primarily intended to convey the state of http://www.w3.org/ on that day. In order to do so, we created the snapshot https://archive.today/r7cov. In this case our Robust Link to this specific version of the W3C home page looks like this:
    <a href="https://archive.today/r7cov"
       data-originalurl="http://www.w3.org/"
       data-versiondate="2015-01-21">Robust Link to this specific version of the W3C home page</a>
    
  • Assume that we created a Robust Link on January 21, 2015 that was primarily intended to point to the version of http://en.wikipedia.org/wiki/Web_archiving that was operational on that day, which is http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880. In this case our Robust Link to this specific version of the Wikipedia page looks like this:
    <a href="http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=637465880"
       data-originalurl="http://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2015-01-21">Robust Link to this specific version of the Wikipedia page</a>
       
  • Assume that we created a Robust Link on January 21, 2015 that was primarily intended to point to the version of http://en.wikipedia.org/wiki/Web_archiving that was operational on March 20, 2012, which is http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845. In this case our Robust Link to this specific version of the Wikipedia page looks like this:
    <a href="http://en.wikipedia.org/w/index.php?title=Web_archiving&oldid=485347845"
       data-originalurl="http://en.wikipedia.org/wiki/Web_archiving"
       data-versiondate="2012-03-20">Robust Link to this specific version of the Wikipedia page</a>
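
Because href plays a different role depending on whether the link targets the original resource or a specific version, a consumer may want to map both patterns onto the same triple. The following Python function is one possible sketch; its name and the tuple layout are hypothetical, and it assumes that data-versiondate uses the YYYY-MM-DD form shown in the examples of this document.

```python
from datetime import date


def normalize_robust_link(attributes):
    """Map an anchor's attributes to (original URI, version URI, version date).

    `attributes` is a dict of attribute names to values for one <a> element.
    """
    href = attributes["href"]
    if "data-originalurl" in attributes:
        # Link to a specific version: href is the snapshot or resource version.
        original, version = attributes["data-originalurl"], href
    else:
        # Link to the original resource: the snapshot URI may be absent.
        original, version = href, attributes.get("data-versionurl")
    # Assumption: the value is a plain ISO 8601 calendar date (YYYY-MM-DD).
    version_date = date.fromisoformat(attributes["data-versiondate"])
    return original, version, version_date
```

Applied to the last example above, the function yields the Wikipedia article URI as the original, the oldid URI as the version, and March 20, 2012 as the date, regardless of the day on which the link itself was created.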

5. Acknowledgements

We are grateful to a number of unnamed colleagues for their feedback on this document.