Links on the web break all the time. Robustify them!
There are really two problems:
- Link rot: Following a link yields a "404 Not Found" error message.
- Content drift: The content at the end of the link changes over time,
possibly to the point where it loses all similarity with the originally
linked content.
In order to address these problems, a common approach is to:
- Create a snapshot of the page in a publicly accessible web archive.
Several web archives provide on-demand snapshot functionality,
including the Internet Archive, archive.today, and perma.cc.
- Link to the snapshot rather than to the original web page.
For example, we wrote the first version of this web page on January 21, 2015 and wanted to link to
http://www.w3.org/. That page changes rather frequently, so to let future readers
see the same W3C content that we saw when linking, we created a snapshot in the
archive.today web archive.
That snapshot is at https://archive.today/r7cov and, rather than linking to
http://www.w3.org/, we link to https://archive.today/r7cov.
While the creation of the snapshot is definitely an essential step in the right direction,
there are problems with this approach:
- Linking to the snapshot https://archive.today/r7cov
assumes that the archive.today web archive will exist forever.
Unfortunately, not even web archives are free from technical, longevity, and sustainability issues, so we cannot
assume they will live forever. If the web archive in which we created the snapshot suffers a temporary glitch, moves
its content to another web location, or ceases to exist, visiting the snapshot becomes impossible.
- When linking to the snapshot https://archive.today/r7cov,
the URI of the W3C's page (http://www.w3.org/) is lost.
As a result, future readers of this page cannot visit the W3C page
to see its evolved state.
Robustifying your links addresses these problems. It increases the chances
that links will lead to meaningful content, even long after they were put in place.
The following three pieces of information robustify a link in a machine-actionable manner:
- The URI of the original resource, in our example http://www.w3.org/;
- The URI of the snapshot, in our example https://archive.today/r7cov;
- The date of linking, in our example January 21, 2015.
With these three pieces of information, we can:
- access the live web version of the page, if available, or
- visit the archived snapshot, if the live version is not available or its content has drifted, or
- use the date of linking and the URI of the original resource to automatically find another archived snapshot
in another web archive, if archive.today's service is interrupted or the snapshot
becomes inaccessible.
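The fallback order in the list above can be sketched in Python as follows. This is a minimal illustration, not the Robust Links implementation: the names RobustLink, choose_target, and accept_datetime are hypothetical, and the two availability flags are assumed to come from checks made elsewhere. The Accept-Datetime value follows the Memento protocol (RFC 7089), which is one way the date of linking could be passed to a service that searches other web archives.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from email.utils import format_datetime
from typing import Tuple


@dataclass(frozen=True)
class RobustLink:
    """The three pieces of information behind a robust link (hypothetical type)."""
    original_uri: str
    snapshot_uri: str
    link_date: date


def choose_target(link: RobustLink, live_ok: bool, snapshot_ok: bool) -> Tuple[str, str]:
    """Pick where to send the reader, in the order described in the text.

    live_ok:     the original page is reachable and has not drifted (assumed input)
    snapshot_ok: the recorded snapshot is reachable (assumed input)
    """
    if live_ok:
        return ("live", link.original_uri)
    if snapshot_ok:
        return ("snapshot", link.snapshot_uri)
    # Neither works: look up the original URI in another archive,
    # asking for a snapshot close to the date of linking.
    return ("other-archive", link.original_uri)


def accept_datetime(link: RobustLink) -> str:
    """Format the date of linking as an RFC 7089 Accept-Datetime header value."""
    midnight_utc = datetime(
        link.link_date.year, link.link_date.month, link.link_date.day,
        tzinfo=timezone.utc,
    )
    return format_datetime(midnight_utc, usegmt=True)


link = RobustLink("http://www.w3.org/", "https://archive.today/r7cov", date(2015, 1, 21))
print(choose_target(link, live_ok=False, snapshot_ok=True))
print(accept_datetime(link))
```

Keeping the decision separate from the availability checks means the same three values can drive a menu, an automatic redirect, or a batch link checker.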
To create a Robust Link, we make use of
HTML5's attribute extensibility mechanism and add two new attributes, data-versionurl and data-versiondate, to existing links. The HTML for
our W3C home page Robust Link is shown below:
<a href="http://www.w3.org/"
data-versionurl="https://archive.today/r7cov"
data-versiondate="2015-01-21">our W3C home page Robust Link</a>
The arrow next to the link provides a menu to view the original resource, its snapshot, or snapshots from other web archives.
This menu showcases one way in which Robust Links can be made actionable. It becomes available after injecting the
Robust Links JavaScript (DOI) and CSS (DOI) into a web page.
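As a rough sketch of how the markup shown earlier could be produced and read back by a program, the Python below renders an anchor with the data-versionurl and data-versiondate attributes and then extracts them again using only the standard library. The function and class names are hypothetical and are not part of the Robust Links JavaScript; the date is assumed to be given as a YYYY-MM-DD string, as in the example.

```python
from html import escape
from html.parser import HTMLParser


def render_robust_link(original_uri, snapshot_uri, link_date, text):
    """Build an <a> element carrying the three pieces of information."""
    return (
        f'<a href="{escape(original_uri, quote=True)}"\n'
        f'   data-versionurl="{escape(snapshot_uri, quote=True)}"\n'
        f'   data-versiondate="{escape(link_date, quote=True)}">'
        f'{escape(text)}</a>'
    )


class RobustLinkParser(HTMLParser):
    """Collect anchors that carry both Robust Link attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        found = dict(attrs)
        # Plain links without the two data attributes are skipped.
        if "data-versionurl" in found and "data-versiondate" in found:
            self.links.append({
                "href": found.get("href"),
                "versionurl": found["data-versionurl"],
                "versiondate": found["data-versiondate"],
            })


def extract_robust_links(markup):
    """Return one dict per Robust Link found in the given HTML string."""
    parser = RobustLinkParser()
    parser.feed(markup)
    parser.close()
    return parser.links


markup = render_robust_link(
    "http://www.w3.org/",
    "https://archive.today/r7cov",
    "2015-01-21",
    "our W3C home page Robust Link",
)
print(markup)
print(extract_robust_links(markup))
```

Because the extra information lives in data attributes, a browser without the script still treats the element as an ordinary link to the original resource.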
More details on how to create Robust Links are available in the
Robustifying Links document.
The slide deck below provides insight into the bigger picture that motivates Robust Links.
It addresses the creation of pockets of persistence on the web.