Robust Links - Motivation

Last updated: July 10, 2015

Links on the web are brittle. A way to make them more robust over time is to decorate them.

There are really two problems:
  • Link rot: Following a link yields a "404 Not Found" error message.
  • Content drift: The content at the end of the link changes over time, possibly to the point where it loses all similarity with the originally linked content.
To address these problems, those who care about link robustness resort to the following strategy:
  • When linking to a web page, a snapshot of the state of the page at linking time is created in a web archive. Several web archives provide on-demand snapshot functionality, including the Internet Archive, archive.today, and perma.cc.
  • With the snapshot created, rather than linking to the original web page, a link to the snapshot is put in place.
For example, I am writing this web page on January 21 2015, and I want to link to http://www.w3.org/. That's the W3C's home page, and it changes rather frequently. In order for future readers of my page to see the same W3C content that I saw when linking, I create a snapshot, say in the archive.today web archive. That snapshot is at https://archive.today/r7cov and, rather than linking to http://www.w3.org/, I link to https://archive.today/r7cov.

While the creation of the snapshot is definitely an essential step in the right direction, there are problems with the linking approach:
  • Linking to the snapshot https://archive.today/r7cov assumes that the archive.today web archive will exist forever. Unfortunately, there are already plenty of indications that web archives do not have eternal life either. If the web archive in which I created the snapshot suffers a temporary glitch, moves its content to another web location, or ceases to exist, visiting the snapshot becomes impossible.
  • When linking to the snapshot https://archive.today/r7cov, the URI of the W3C's page, http://www.w3.org/, is lost. As a result, future readers of my page cannot visit the W3C page to see its evolved state.
Link decoration is a way to address these problems and to increase the chances that links will lead to meaningful content, even a long time after they were put in place. In order to maximize link robustness, the following information should be available, in a machine-actionable manner, for a link:
  • The URI of the snapshot, in our example https://archive.today/r7cov;
  • The URI of the original resource, in our example http://www.w3.org/;
  • The datetime of linking, in our example January 21 2015.
The latter two information elements can be used to automatically find snapshots in other web archives in case archive.today's service is interrupted, and the snapshot https://archive.today/r7cov becomes inaccessible as a result.
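
As a rough illustration of how the original URI and the linking datetime could drive such a lookup, the sketch below prepares a datetime-negotiation request in the style of the Memento protocol (RFC 7089), where a client sends an Accept-Datetime header to a TimeGate for the original resource. This page does not say which lookup mechanism is used, so the choice of Memento, the TimeGate base URL, and the function name build_timegate_request are all assumptions made for the example. No network request is made; the code only constructs the URL and header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: the base URL of a Memento aggregator TimeGate. Any TimeGate
# that follows RFC 7089 could be substituted here.
DEFAULT_TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def build_timegate_request(original_uri, version_date, timegate=DEFAULT_TIMEGATE):
    """Return (url, headers) for a datetime-negotiation request.

    original_uri -- the URI of the original resource (the link's href)
    version_date -- the datetime of linking, given as YYYY-MM-DD
    """
    # Treat the linking date as midnight UTC on that day.
    linked_on = datetime.strptime(version_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    headers = {"Accept-Datetime": format_datetime(linked_on, usegmt=True)}
    return timegate + original_uri, headers


url, headers = build_timegate_request("http://www.w3.org/", "2015-01-21")
print(url)
print(headers["Accept-Datetime"])
```

A client would send a GET or HEAD request to the resulting URL with that header and follow the redirect to whichever snapshot lies closest to the linking date, regardless of which archive holds it.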

Link decorations are conveyed in a manner that leverages HTML5's attribute extensibility mechanism. Using that approach, this decorated link to the W3C home page is expressed as follows:
<a href="http://www.w3.org/"
   data-versionurl="https://archive.today/r7cov"
   data-versiondate="2015-01-21">this decorated link to the W3C home page</a>
As can be seen by clicking the arrow that appears next to the above link, link decorations can be made actionable, for example, by injecting the robustlinks JavaScript and CSS code in an HTML page. The result is Robust Links that can be followed if the original link no longer works or does not yield the expected information. Note that Robust Links can be generated not only from link decorations but also from the page's creation and/or modification dates. Details are in the Link Decoration document.
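
To show that the decorations are machine-actionable, here is a minimal sketch that reads them back out of an HTML fragment using Python's standard html.parser module. It is not the robustlinks JavaScript; the class name DecoratedLinkParser, the function extract_decorated_links, and the dictionary keys are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class DecoratedLinkParser(HTMLParser):
    """Collect <a> elements that carry link decoration attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Keep only links that have at least one decoration attribute.
        if "data-versionurl" in attributes or "data-versiondate" in attributes:
            self.links.append({
                "original": attributes.get("href"),
                "snapshot": attributes.get("data-versionurl"),
                "linked_on": attributes.get("data-versiondate"),
            })


def extract_decorated_links(html):
    """Return one dictionary per decorated link found in the HTML text."""
    parser = DecoratedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


EXAMPLE = '''<a href="http://www.w3.org/"
   data-versionurl="https://archive.today/r7cov"
   data-versiondate="2015-01-21">this decorated link to the W3C home page</a>'''

links = extract_decorated_links(EXAMPLE)
print(links)
```

With the three values in hand, a tool can offer the reader the original page, the linked snapshot, or a search for other snapshots made around the linking date.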

The slide deck below provides insight into the bigger picture that motivates the Robust Links work. It addresses the creation of pockets of persistence on the web.