Robust Links - Motivation

Last updated: June 29, 2020

Links on the web break all the time. Robustify them!

There are really two problems:
  • Link rot: Following a link yields a "404 Not Found" error message.
  • Content drift: The content at the end of the link changes over time, possibly to the point where it loses all similarity with the originally linked content.
In order to address these problems, a common approach is to:
  • Create a snapshot of the page in a publicly accessible web archive. Several web archives provide on-demand snapshot functionality, including the Internet Archive, archive.today, and perma.cc.
  • Link to the snapshot rather than to the original web page.
For example, we wrote the first version of this web page on January 21, 2015 and wanted to link to http://www.w3.org/. This page changes rather frequently, so in order for future readers to see the same W3C content that we saw when linking, we created a snapshot in the archive.today web archive. That snapshot is at https://archive.today/r7cov and, rather than linking to http://www.w3.org/, we link to https://archive.today/r7cov.

While the creation of the snapshot is definitely an essential step in the right direction, there are problems with this approach:
  • Linking to the snapshot https://archive.today/r7cov assumes that the archive.today web archive will exist forever. Unfortunately, not even web archives are free from technical, longevity, and sustainability issues, so we cannot assume they will live forever. If the web archive in which we created the snapshot suffers a temporary glitch, moves its content to another web location, or ceases to exist, visiting the snapshot becomes impossible.
  • When linking to the snapshot https://archive.today/r7cov, the URI of the W3C's page (http://www.w3.org/) is lost. As a result, future readers of this page cannot visit the W3C page to see its evolved state.
Robustifying your links addresses these problems. It increases the chances that links will lead to meaningful content, even long after they were put in place. The following three pieces of information robustify a link in a machine-actionable manner:
  • The URI of the original resource, in our example http://www.w3.org/;
  • The URI of the snapshot, in our example https://archive.today/r7cov;
  • The date of linking, in our example January 21, 2015.
With these three pieces of information, we can:
  • access the live web version of the page, if available, or
  • visit the archived snapshot, if the live version is not available or its content has drifted, or
  • use the date of linking and the URI of the original resource to automatically find another archived snapshot in another web archive, if archive.today's service is interrupted or the snapshot becomes inaccessible.
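
The fallback order described above can be sketched in a few lines of Python. This is only an illustration, not the Robust Links implementation: the names RobustLink and fallback_candidates are hypothetical, the two availability flags are assumed to be determined elsewhere, and the Internet Archive's Wayback Machine URL pattern stands in for "another web archive" purely as an example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RobustLink:
    """The three pieces of information that robustify a link (hypothetical type)."""
    original_uri: str   # URI of the original resource
    snapshot_uri: str   # URI of the snapshot
    link_date: date     # date of linking


def fallback_candidates(link, live_ok, snapshot_ok):
    """Return the URIs to try, in order of preference.

    live_ok:     the live page is reachable and has not drifted (assumed input)
    snapshot_ok: the recorded snapshot is reachable (assumed input)
    """
    candidates = []
    if live_ok:
        candidates.append(link.original_uri)
    if snapshot_ok:
        candidates.append(link.snapshot_uri)
    # Last resort: look for a capture near the linking date in another archive.
    # The Wayback Machine pattern below is an illustrative choice only.
    stamp = link.link_date.strftime("%Y%m%d")
    candidates.append(f"https://web.archive.org/web/{stamp}/{link.original_uri}")
    return candidates


link = RobustLink("http://www.w3.org/", "https://archive.today/r7cov", date(2015, 1, 21))
for uri in fallback_candidates(link, live_ok=False, snapshot_ok=True):
    print(uri)
```

The point of the sketch is that the last candidate can only be constructed because both the original URI and the date of linking were kept alongside the snapshot URI.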

To create a Robust Link, we make use of HTML5's attribute extensibility mechanism and add two new attributes, data-versionurl and data-versiondate, to existing links. The HTML for our W3C home page Robust Link is shown below:
<a href="http://www.w3.org/"
   data-versionurl="https://archive.today/r7cov"
   data-versiondate="2015-01-21">our W3C home page Robust Link</a>
The arrow next to the link provides a menu to view the original resource, its snapshot, or snapshots from other web archives. This menu showcases one way in which Robust Links can be made actionable. It becomes available after injecting the Robust Links JavaScript (DOI) and CSS (DOI) into a web page.
More details on how to create Robust Links are available in the Robustifying Links document.
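
As a rough illustration of how the markup shown earlier could be produced and read back programmatically, the Python sketch below builds an anchor with the data-versionurl and data-versiondate attributes and then extracts the three pieces of information again. The helper names robust_anchor and RobustLinkFinder are hypothetical and not part of any Robust Links tooling; only the attribute names come from the example above.

```python
from html import escape
from html.parser import HTMLParser


def robust_anchor(original_uri, snapshot_uri, link_date, text):
    """Build an <a> element carrying the two Robust Links data attributes."""
    return (
        f'<a href="{escape(original_uri, quote=True)}" '
        f'data-versionurl="{escape(snapshot_uri, quote=True)}" '
        f'data-versiondate="{escape(link_date, quote=True)}">'
        f'{escape(text)}</a>'
    )


class RobustLinkFinder(HTMLParser):
    """Collect (href, versionurl, versiondate) tuples from annotated anchors."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Only anchors that carry both data attributes count as Robust Links here.
        if "data-versionurl" in attr and "data-versiondate" in attr:
            self.links.append(
                (attr.get("href"), attr["data-versionurl"], attr["data-versiondate"])
            )


markup = robust_anchor(
    "http://www.w3.org/",
    "https://archive.today/r7cov",
    "2015-01-21",
    "our W3C home page Robust Link",
)
finder = RobustLinkFinder()
finder.feed(markup)
print(markup)
print(finder.links)
```

Because the extra information lives in data-* attributes, the link still behaves as an ordinary link for any reader or tool that ignores them.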

The slide deck below provides insight into the bigger picture that motivates Robust Links. It addresses the creation of pockets of persistence on the web.