Links on the web break all the time. Robustify them!
There are really two problems:
- Link rot: Following a link yields a "404 Not Found" error message.
- Content drift: The content at the end of the link changes over time,
possibly to the point where it loses all similarity with the originally
linked content.
In order to address these problems, a common approach is to:
- Create a snapshot of the page in a publicly accessible web archive.
Several web archives provide on-demand snapshot functionality,
including the Internet Archive,
- Link to the snapshot rather than to the original web page.
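As a rough sketch of the first step, the snippet below builds the request URL for an on-demand capture at the Internet Archive. The `https://web.archive.org/save/` prefix is our assumption about that archive's public interface and is not stated on this page. The helper names are hypothetical, and other archives expose different interfaces.

```python
from urllib.request import Request, urlopen

# Assumption: the Internet Archive accepts on-demand capture requests
# at this prefix, followed by the URL to be captured.
SAVE_PREFIX = "https://web.archive.org/save/"


def snapshot_request_url(original_url: str) -> str:
    """Return the URL that asks the archive to capture original_url."""
    return SAVE_PREFIX + original_url


def request_snapshot(original_url: str, timeout: float = 60.0) -> str:
    """Ask the archive for a capture and return the URL we ended up at.

    This performs a network request. Whether the final URL is the new
    snapshot depends on the archive's behavior, so treat it as a hint.
    """
    request = Request(
        snapshot_request_url(original_url),
        headers={"User-Agent": "robust-links-sketch"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

Calling `snapshot_request_url("http://www.w3.org/")` only builds the request URL. `request_snapshot` is the part that contacts the archive, so it should be used sparingly.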
For example, we wrote the first version of this web page on January 21, 2015, and wanted to link to
the W3C's page, http://www.w3.org/. This page changes rather frequently, and so that future readers
see the same W3C content that we saw when linking, we created a snapshot in the archive.today web archive.
That snapshot has its own URI in the archive
and, rather than linking to
http://www.w3.org/, we link to the snapshot.
While the creation of the snapshot is definitely an essential step in the right direction,
there are problems with this approach:
- Linking to the snapshot
assumes that the archive.today web archive will exist forever.
Unfortunately, not even web archives are free from technical, longevity, and sustainability issues, so we cannot
assume they will live forever. If the web archive in which we created the snapshot suffers a temporary glitch, moves
its content to another web location, or ceases to exist, visiting the snapshot becomes impossible.
- When linking to the snapshot,
the URI of the W3C's page (http://www.w3.org/) is lost.
As a result, future readers of this page cannot visit the W3C page
to see its evolved state.
Robustifying your links addresses these problems. It increases the chances
that links will lead to meaningful content, even long after they were put in place.
The following three pieces of information robustify a link in a machine-actionable manner:
- The URI of the original resource, in our example
http://www.w3.org/
- The URI of the snapshot, in our example
- The date of linking, in our example
January 21, 2015.
With these three pieces of information, we can:
- access the live web version of the page, if available, or
- visit the archived snapshot, if the live version is not available or its content has drifted, or
- use the date of linking and the URI of the original resource to automatically find another archived snapshot
in another web archive, if archive.today's service is interrupted or the snapshot is no longer available.
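The three options above can be sketched as an ordered list of candidate URIs. This is a minimal illustration under stated assumptions: the function names are hypothetical, the snapshot URI in the usage note is a placeholder, and the third candidate assumes that the Internet Archive's Wayback Machine resolves a `/web/<YYYYMMDD>/<URI>` address to the capture nearest that date. It stands in for "another web archive" and is not part of the Robust Links approach itself.

```python
from datetime import date
from typing import Callable, List, Optional, Tuple


def wayback_url(original_url: str, linked_on: date) -> str:
    """Build a date-based lookup URL in a second archive (assumed pattern)."""
    return f"https://web.archive.org/web/{linked_on:%Y%m%d}/{original_url}"


def resolution_order(
    original_url: str, snapshot_url: str, linked_on: date
) -> List[Tuple[str, str]]:
    """Return (label, URI) pairs in the order a reader would try them."""
    return [
        ("live", original_url),
        ("snapshot", snapshot_url),
        ("other-archive", wayback_url(original_url, linked_on)),
    ]


def first_reachable(
    options: List[Tuple[str, str]], is_reachable: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Return the first option accepted by is_reachable, or None.

    is_reachable is supplied by the caller (for example an HTTP check),
    which keeps this sketch free of network access.
    """
    for label, url in options:
        if is_reachable(url):
            return label, url
    return None
```

For instance, `resolution_order("http://www.w3.org/", "https://archive.example/abc123", date(2015, 1, 21))` yields the live page first, then the placeholder snapshot, then a date-based lookup in the second archive. A reachability check cannot detect content drift, so a reader who wants the page as it was on the date of linking would skip straight to the snapshot.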
To create a Robust Link, we make use of
HTML5's attribute extensibility mechanism
and add two new attributes to existing links. The HTML for
our W3C home page Robust Link
is shown below:
data-versiondate="2015-01-21">our W3C home page Robust Link</a>
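Markup of this kind could be generated from the three pieces of information with a small helper. The sketch below makes several assumptions: only `data-versiondate` appears in the excerpt above, so the `data-versionurl` attribute name, the choice to keep the original URI in `href`, and the function name are ours. The snapshot URI in the usage note is a placeholder, not the real one.

```python
from datetime import date
from html import escape


def robust_link(original_url: str, snapshot_url: str, linked_on: date, text: str) -> str:
    """Render an anchor carrying the original URI, snapshot URI, and link date.

    Assumption: the original URI stays in href and the snapshot URI goes
    into a data-versionurl attribute; the date uses the ISO 8601 form.
    """
    return (
        f'<a href="{escape(original_url, quote=True)}" '
        f'data-versionurl="{escape(snapshot_url, quote=True)}" '
        f'data-versiondate="{linked_on.isoformat()}">'
        f"{escape(text)}</a>"
    )
```

Calling `robust_link("http://www.w3.org/", "https://archive.example/abc123", date(2015, 1, 21), "our W3C home page Robust Link")` produces an anchor whose closing portion matches the fragment shown above.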
The arrow next to the link provides a menu to view the original resource, its snapshot, or snapshots from other web archives.
This menu showcases one way in which Robust Links can be made actionable. It becomes available after injecting the
into a web page.
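To give a feel for what a tool needs to do before it can offer such a menu, the sketch below scans a page for links that carry a `data-versiondate` attribute and collects the three pieces of information for each. It is an illustration in Python, not the script used on this page. The `data-versionurl` attribute name and the class and function names are assumptions on our part.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class RobustLinkFinder(HTMLParser):
    """Collect the robustness information from anchors in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attributes = dict(attrs)
        if "data-versiondate" in attributes:
            self.links.append(
                {
                    "original": attributes.get("href"),
                    "snapshot": attributes.get("data-versionurl"),
                    "date": attributes["data-versiondate"],
                }
            )


def find_robust_links(page_html: str) -> List[Dict[str, Optional[str]]]:
    """Return one dictionary per link that carries a data-versiondate."""
    finder = RobustLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.links
```

Plain links without the extra attributes are ignored, so a page can mix ordinary and robust links freely.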
More details on how to create Robust Links are available in the Robustifying Links
The slide deck below provides insight into the bigger picture that motivates Robust Links.
It addresses the creation of pockets of persistence on the web.