Robust Links - API Documentation

Last updated: May 20, 2020

The Robust Links API allows machine clients to create their own Robust Links. For example, a web site administrator can write a script that uses the API to update existing web pages to support Robust Links. Alternatively, a Content Management System (CMS) might call the API when rendering a published page.

The API accepts a URL as input and returns the robustified link for that URL. If the URL of a live web resource (an original resource, URI-R) is submitted, the Robust Links API automatically directs a web archive to capture that resource and create an archival copy, a Memento. The resulting URL of the Memento (URI-M) is received by the API and included in the robustified link.

The API generates two Robust Links. One contains the original resource URL (URI-R) as the default link and the other contains the Memento URL (URI-M) as the default link. The client can acquire the HTML of the desired Robust Link by choosing the appropriate JSON key.


1. Quick Start With cURL

Generating Robust Links for a URL

The Robust Links API works via the HTTP GET method. Arguments are supplied via the query string. The first supported argument is url, which specifies the URL for which to create the Robust Link. We can ask the API to create a Robust Link for https://abcnews.go.com by issuing the curl command below. The resulting JSON contains our Robust Links in the keys memento_url_as_href and original_url_as_href.

curl "https://robustlinks.mementoweb.org/api/?url=https%3A%2F%2Fabcnews.go.com"
{
    "anchor_text": null,
    "api_version": "0.8",
    "data-originalurl": "https://abcnews.go.com",
    "data-versiondate": "2020-05-20",
    "data-versionurl": "https://web.archive.org/web/20200520195926/https://abcnews.go.com/",
    "memento-datetime": "Wed, 20 May 2020 19:59:26 GMT",
    "request_url": "https://abcnews.go.com",
    "request_url_resource_type": "original-resource",
    "robust_links_html": {
        "memento_url_as_href": "<a href=\"https://web.archive.org/web/20200520195926/https://abcnews.go.com/\"\ndata-originalurl=\"https://abcnews.go.com\"\ndata-versiondate=\"2020-05-20\">Robust Link for https://web.archive.org/web/20200520195926/https://abcnews.go.com/</a>",
        "original_url_as_href": "<a href=\"https://abcnews.go.com\"\ndata-versionurl=\"https://web.archive.org/web/20200520195926/https://abcnews.go.com/\"\ndata-versiondate=\"2020-05-20\">Robust Link for https://abcnews.go.com</a>"
    }
}

The API user must decide which default link target to share with their readers. If the user wants their readers to click the anchor text and reach the original resource by default, then they choose original_url_as_href. Likewise, if the API user wants their readers to click the anchor text and reach the Memento by default, then they choose memento_url_as_href.
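As a minimal sketch of this choice, the helper below picks one of the two HTML strings from an already-parsed response. The function name select_robust_link and its link_to_memento flag are our own illustrative names, not part of the API, and the sample dictionary is a reduced stand-in for the response shown above.

```python
def select_robust_link(response: dict, link_to_memento: bool = False) -> str:
    """Return the Robust Link HTML for the chosen default link target."""
    key = "memento_url_as_href" if link_to_memento else "original_url_as_href"
    return response["robust_links_html"][key]


# Stand-in for a parsed API response, reduced to the keys used here.
MEMENTO = "https://web.archive.org/web/20200520195926/https://abcnews.go.com/"
ORIGINAL = "https://abcnews.go.com"
sample_response = {
    "robust_links_html": {
        "memento_url_as_href": (
            f'<a href="{MEMENTO}"\ndata-originalurl="{ORIGINAL}"\n'
            f'data-versiondate="2020-05-20">Robust Link for {MEMENTO}</a>'
        ),
        "original_url_as_href": (
            f'<a href="{ORIGINAL}"\ndata-versionurl="{MEMENTO}"\n'
            f'data-versiondate="2020-05-20">Robust Link for {ORIGINAL}</a>'
        ),
    }
}

# The href of the first link is the original URL; the second is the Memento URL.
print(select_robust_link(sample_response))
print(select_robust_link(sample_response, link_to_memento=True))
```

Keeping the choice behind a single flag makes it easy to switch a whole site from one default link target to the other.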

Specifying the anchor text

By default, the API returns the anchor text Robust Link for [URL], where [URL] is the URL submitted by the client. To save API developers the effort of parsing and altering the Robust Link's HTML, we provide the ability to supply the desired anchor text via the anchor_text parameter in the query string. The example below uses cURL's -G argument to specify the GET method and the --data-urlencode argument to specify and encode the query string parameters. This combination of arguments converts all query string parameters to their URL-encoded formats and appends the query string to the URL.

curl -G --data-urlencode "url=https://abcnews.go.com" --data-urlencode "anchor_text=ABC News for May 20, 2020" https://robustlinks.mementoweb.org/api/
{
  "anchor_text": "ABC News for May 20, 2020",
  "api_version": "0.8",
  "data-originalurl": "https://abcnews.go.com",
  "data-versiondate": "2020-05-20",
  "data-versionurl": "https://web.archive.org/web/20200520201103/https://abcnews.go.com/",
  "memento-datetime": "Wed, 20 May 2020 20:11:03 GMT",
  "request_url": "https://abcnews.go.com",
  "request_url_resource_type": "original-resource",
  "robust_links_html": {
      "memento_url_as_href": "<a href=\"https://web.archive.org/web/20200520201103/https://abcnews.go.com/\"\ndata-originalurl=\"https://abcnews.go.com\"\ndata-versiondate=\"2020-05-20\">ABC News for May 20, 2020</a>",
      "original_url_as_href": "<a href=\"https://abcnews.go.com\"\ndata-versionurl=\"https://web.archive.org/web/20200520201103/https://abcnews.go.com/\"\ndata-versiondate=\"2020-05-20\">ABC News for May 20, 2020</a>"
  }
}

On line 1, the user encodes the URL https://abcnews.go.com and the anchor text ABC News for May 20, 2020 with the --data-urlencode argument. The URL encoding is necessary to ensure the clean transmission of this input.

On lines 12 and 13, the user can extract the relevant Robust Link from the output. It is different from the previous example; it now contains the requested anchor text.


2. Recipes for Developers


The Robust Links API provides the information necessary for machine clients to construct their own Robust Links. Clients can construct their own Robust Links by following this procedure:
  1. URL-encode the submitted URL and append it to the url query string variable (e.g., https://abcnews.go.com becomes url=https%3A%2F%2Fabcnews.go.com)
  2. URL-encode the anchor text and append it to the anchor_text query string variable (e.g., ABC News for May 20, 2020 becomes anchor_text=ABC%20News%20for%20May%2020%2C%202020)
  3. Construct the full URL for the request to the API by concatenating these parameters, separated by an &, to https://robustlinks.mementoweb.org/api/? (e.g., https://robustlinks.mementoweb.org/api/?url=https%3A%2F%2Fabcnews.go.com&anchor_text=ABC%20News%20for%20May%2020%2C%202020)
  4. Issue an HTTP GET to this URL
  5. Process the returned JSON

Below, we provide simplified examples in different programming languages for completing this process. The examples assume that the user wants to create a Robust Link from https://abcnews.go.com with the anchor text ABC News for May 20, 2020 and use the original URL (URI-R) of https://abcnews.go.com as the default link target. Where possible, these examples require no external libraries.

import json
import urllib.parse
import urllib.request
url = "https://abcnews.go.com/"

anchor_text = "ABC News for May 20, 2020"

query_string = urllib.parse.urlencode({ 'anchor_text': anchor_text, 'url': url })

api_url = "https://robustlinks.mementoweb.org/api/?" + query_string

response = urllib.request.urlopen(url=api_url)

json_data = json.loads(response.read())

print(json_data['robust_links_html']['original_url_as_href'])

This example displays how one would complete this process with Python.

On line 8, we encode the anchor text and URL as a query string.

Line 12 demonstrates how we issue the HTTP GET request with this data.

Lines 14 - 16 extract the Robust Link HTML from the JSON response and print the Robust Link as a string.

This prints HTML output where the original resource URL is the value assigned to the href attribute:

<a href="https://abcnews.go.com/"
  data-versionurl="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
  data-versiondate="2020-05-20">ABC News for May 20, 2020</a>

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the original resource as it currently exists.

ABC News for May 20, 2020

If the user wishes to use the Memento URL as the link target, then they can replace line 16 with the following:

print(json_data['robust_links_html']['memento_url_as_href'])

This prints HTML output where the Memento URL is the value assigned to the href attribute:

<a href="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
  data-originalurl="https://abcnews.go.com/"
  data-versiondate="2020-05-20">ABC News for May 20, 2020</a>

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the Memento for this resource captured by archive.org at 2020-05-20T20:38:14.

ABC News for May 20, 2020

Though the anchor text is the same, and the two links look the same to the reader, they deliver the reader to different destinations. This gives the page author control over which resource the reader reaches by default. In addition, the Robust Links menu to the right gives readers additional options if they wish to visit a version other than the default.

require 'net/http'
require 'json'
require 'uri'

url = "https://abcnews.go.com/"

anchor_text = "ABC News for May 20, 2020"

api_url = URI('https://robustlinks.mementoweb.org/api/?')

api_url.query = URI.encode_www_form( { :url => url, :anchor_text => anchor_text } )

res = Net::HTTP.get_response(api_url)

json_data = JSON.parse(res.body)

puts json_data['robust_links_html']['original_url_as_href']

This example displays how one would complete this process with Ruby.

On line 11, we encode the URL and anchor text as the query string for an HTTP GET.

On line 13, we issue an HTTP request with the full URL.

Lines 15 - 17 extract the Robust Link HTML from the JSON response and print the Robust Link as a string.

This prints HTML output where the original resource URL is the value assigned to the href attribute:

<a href="https://abcnews.go.com/"
    data-versionurl="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
    data-versiondate="2020-05-20">ABC News for May 20, 2020</a>
  

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the original resource as it currently exists.

ABC News for May 20, 2020

If the developer wishes to use the Memento URL as the link target, then they can replace line 17 with the following:

puts json_data['robust_links_html']['memento_url_as_href']

This prints HTML output where the Memento URL is the value assigned to the href attribute:

<a href="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
    data-originalurl="https://abcnews.go.com/"
    data-versiondate="2020-05-20">ABC News for May 20, 2020</a>
  

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the Memento for this resource captured by archive.org at 2020-05-20T20:38:14.

ABC News for May 20, 2020

Though the anchor text is the same, and the two links look the same to the reader, they deliver the reader to different destinations. This gives the page author control over which resource the reader reaches by default. In addition, the Robust Links menu to the right gives readers additional options if they wish to visit a version other than the default.

var url = "https://abcnews.go.com";

var anchor_text = "ABC News for May 20, 2020";

var api_url = "https://robustlinks.mementoweb.org/api/?" + "anchor_text=" + encodeURIComponent(anchor_text) + "&url=" + encodeURIComponent(url);

var client = new XMLHttpRequest();

client.open("GET", api_url, true);

client.onreadystatechange = function() {

  if (this.readyState === XMLHttpRequest.DONE && this.status === 200) {
    var obj = JSON.parse( client.responseText );

    console.log( obj["robust_links_html"]["original_url_as_href"] );
  }
};

client.send();

This example displays how one would complete this process with JavaScript.

Because JavaScript is event driven and this request is asynchronous, lines 11 - 18 contain the callback function that will extract and print the Robust Link once we have a response. Lines 14 - 16 from this event handler extract the Robust Link HTML from the JSON and print out the response as a string.

On line 20, we issue the HTTP GET request with this data. The event handler will execute once the response is received.

This prints HTML output where the original resource URL is the value assigned to the href attribute:

<a href="https://abcnews.go.com/"
    data-versionurl="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
    data-versiondate="2020-05-20">ABC News for May 20, 2020</a>
  

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the original resource as it currently exists.

ABC News for May 20, 2020

If the developer wishes to use the Memento URL as the link target, then they can replace line 16 with the following:

console.log( obj["robust_links_html"]["memento_url_as_href"] );

This prints HTML output where the Memento URL is the value assigned to the href attribute:

<a href="https://web.archive.org/web/20200520203814/https://abcnews.go.com/"
    data-originalurl="https://abcnews.go.com/"
    data-versiondate="2020-05-20">ABC News for May 20, 2020</a>
  

The browser renders this HTML as shown below. If a reader clicks on the Robust Links menu to the right, they can choose to visit the live web resource, the Memento, or other Mementos for this resource. If the reader clicks on the anchor text ABC News for May 20, 2020, the browser delivers them to the Memento for this resource captured by archive.org at 2020-05-20T20:38:14.

ABC News for May 20, 2020

Though the anchor text is the same, and the two links look the same to the reader, they deliver the reader to different destinations. This gives the page author control over which resource the reader reaches by default. In addition, the Robust Links menu to the right gives readers additional options if they wish to visit a version other than the default.


3. Complete API Documentation

API Inputs

The Robust Links API accepts two inputs.

  1. The URL of a resource is required. The API issues an HTTP GET on the URL to acquire the resource and determines if it is a Memento (archived web page) or an original resource (not a capture, a live web resource). The API makes this determination based on the presence of a Memento-Datetime header in the HTTP response. If the URL submitted belongs to an archive that does not support the Memento Protocol, then the API cannot make this determination and treats it as an original resource. A machine client submits the URL as a value for the url query string parameter.
  2. Optionally, a client can customize the Robust Link with the desired anchor text. By default, links contain the anchor text of Robust Link for [URL] where [URL] is the URL of the original resource or memento, depending on the default link target. A machine client can submit this text as a value for the anchor_text query string parameter.
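The two inputs can be combined into a request URL as in the sketch below. The helper name build_api_url and the API_ENDPOINT constant are our own illustrative names; only the endpoint address and the url and anchor_text parameters come from this documentation. The sketch only builds the URL and does not contact the API.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://robustlinks.mementoweb.org/api/"


def build_api_url(url: str, anchor_text: Optional[str] = None) -> str:
    """Build the GET request URL; anchor_text is omitted when not supplied."""
    params = {"url": url}
    if anchor_text is not None:
        params["anchor_text"] = anchor_text
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+").
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


print(build_api_url("https://abcnews.go.com"))
print(build_api_url("https://abcnews.go.com", "ABC News for May 20, 2020"))
```

When anchor_text is left out, the request carries only the required url parameter and the API falls back to its default anchor text.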

API Outputs

Successful Generation of Robust Links

If the generation of a Robust Link via HTTP GET is successful, then the API responds with an HTTP status code value of 200 and a JSON data structure, as shown in the cURL examples above. In the examples above, we focused primarily on the memento_url_as_href and original_url_as_href keys, but machine clients can also acquire other information relevant to creating robust links for this resource. Here we show the example again and detail the meaning of each field.

{
  "anchor_text": "ABC News for May 11, 2020",
  "api_version": "0.8",
  "data-originalurl": "https://abcnews.go.com",
  "data-versiondate": "2020-05-11",
  "data-versionurl": "https://web.archive.org/web/20200511183111/https://abcnews.go.com/",
  "memento-datetime": "Mon, 11 May 2020 18:31:11 GMT",
  "request_url": "https://abcnews.go.com",
  "request_url_resource_type": "original-resource",
  "robust_links_html": {
    "memento_url_as_href": "<a href=\"https://web.archive.org/web/20200511183111/https://abcnews.go.com/\" data-originalurl=\"https://abcnews.go.com/\" data-versiondate=\"2020-05-11\">ABC News for May 11, 2020</a>",
    "original_url_as_href": "<a href=\"https://abcnews.go.com/\" data-versionurl=\"https://web.archive.org/web/20200511183111/https://abcnews.go.com/\" data-versiondate=\"2020-05-11\">ABC News for May 11, 2020</a>"
  }
}

The response provides the following JSON keys:

  • anchor_text - The anchor text submitted to the service, or null if none submitted; this key allows the client to verify that the anchor text was correctly interpreted
  • api_version - The version of the API, not the software running it
  • data-originalurl - The original resource URL. This does not necessarily match the submitted URL. If a Memento URL is submitted, then this contains the URL of the original resource that was captured, as identified by the Memento Protocol. This is the value used for the attribute of the same name in a Robust Link.
  • data-versiondate - The date of the Memento's capture in YYYY-mm-dd format as used by Robust Links. This is the value used for the attribute of the same name in a Robust Link.
  • data-versionurl - The Memento URL. This does not necessarily match the submitted URL. If an original resource URL is submitted, then this contains the URL of the Memento that the Robust Links service created. This is the value used for the attribute of the same name in a Robust Link.
  • memento-datetime - The Memento-Datetime of the Memento — when it was captured by the web archive, in the format used by HTTP, specified in RFC 7231 and RFC 5322.
  • request_url - The URL submitted to the API; this key allows the client to verify that the URL was correctly received
  • request_url_resource_type - The resource type of the submitted URL, either memento or original-resource; if the value is original-resource, then the API is indicating to the client that it created a new Memento for the client
  • robust_links_html - A key containing the HTML of the Robust Links, each in a different subkey:
    • memento_url_as_href - the HTML of the Robust Link with the Memento URL as the default link target
    • original_url_as_href - the HTML of the Robust Link with the original resource URL as the default link target
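A client that needs its own markup can assemble a link from the top-level keys instead of using robust_links_html. The sketch below is one possible way to do so; build_robust_link is a hypothetical helper of ours, and it simply mirrors the attribute layout seen in the example responses above. The HTML returned by the API remains the authoritative form.

```python
from html import escape


def build_robust_link(fields: dict, anchor_text: str, link_to_memento: bool = False) -> str:
    """Assemble an <a> element from the data-* keys of an API response."""
    original = fields["data-originalurl"]
    memento = fields["data-versionurl"]
    version_date = fields["data-versiondate"]
    if link_to_memento:
        # href is the Memento; the original URL travels in data-originalurl.
        href, other_name, other_value = memento, "data-originalurl", original
    else:
        # href is the original resource; the Memento travels in data-versionurl.
        href, other_name, other_value = original, "data-versionurl", memento
    # escape() protects attribute values and anchor text from breaking the markup.
    return (
        f'<a href="{escape(href)}" '
        f'{other_name}="{escape(other_value)}" '
        f'data-versiondate="{escape(version_date)}">{escape(anchor_text)}</a>'
    )


# Values copied from the example response above.
sample_fields = {
    "data-originalurl": "https://abcnews.go.com",
    "data-versiondate": "2020-05-11",
    "data-versionurl": "https://web.archive.org/web/20200511183111/https://abcnews.go.com/",
}
print(build_robust_link(sample_fields, "ABC News"))
```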

Error states

If an error occurs, the Robust Links API responds with a non-200 HTTP status code.

In addition to the non-200 status code, the Robust Links API returns JSON containing information about the error. The JSON below shows an example. Because we cannot predict everything that could go wrong, we do not list all possible errors here.

{
  "input URL": "abc",
  "error": "There was an issue while processing the submitted URL",
  "error string": "MissingSchema(\"Invalid URL 'abc': No schema supplied. Perhaps you meant http://abc?\")",
  "error data": "Traceback (most recent call last):\n  File \"/Volumes/nerfherder External/Unsynced-Projects/robustlinks_api/robustlinks/api/errors.py\", line 16, in handle_errors\n    response, status_code = function_name(input_url, preferences)\n  File \"/Volumes/nerfherder External/Unsynced-Projects/robustlinks_api/robustlinks/utils.py\", line 28, in is_a_memento_response\n    url_response = requests.get(input_url, headers={ 'user-agent': 'LANL/2020.04.01' })\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/api.py\", line 76, in get\n    return request('get', url, params=params, **kwargs)\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/api.py\", line 61, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/sessions.py\", line 516, in request\n    prep = self.prepare_request(req)\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/sessions.py\", line 459, in prepare_request\n    hooks=merge_hooks(request.hooks, self.hooks),\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/models.py\", line 314, in prepare\n    self.prepare_url(url, params)\n  File \"/Users/smj/.virtualenvs/robustlinks_api/lib/python3.7/site-packages/requests/models.py\", line 388, in prepare_url\n    raise MissingSchema(error)\nrequests.exceptions.MissingSchema: Invalid URL 'abc': No schema supplied. Perhaps you meant http://abc?\n"
}

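Every error response provides the following JSON keys, as seen in the example above:

  • input URL - The URL submitted to the API
  • error - A general description of the error
  • error string - The message of the exception raised while processing the request
  • error data - The stack trace associated with that exception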

If you need to report issues on the Robust Links API, include this JSON in your report.
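A client might handle such responses along the lines of the sketch below. The function names summarize_error and fetch_robust_links are our own, the keys read are the ones visible in the example above, and the choice to raise RuntimeError is only one possible design. Defining the functions does not contact the API.

```python
import json
import urllib.error
import urllib.request


def summarize_error(status_code: int, body: bytes) -> str:
    """Condense an error response into one line, tolerating a non-JSON body."""
    try:
        details = json.loads(body)
    except ValueError:
        return f"HTTP {status_code}: (response body was not JSON)"
    if not isinstance(details, dict):
        return f"HTTP {status_code}: (unexpected JSON structure)"
    message = details.get("error", "unknown error")
    return f"HTTP {status_code}: {message} (input URL: {details.get('input URL')})"


def fetch_robust_links(api_url: str) -> dict:
    """Return the parsed JSON on success; raise RuntimeError with a summary on failure."""
    try:
        with urllib.request.urlopen(api_url) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error status codes; the body is still readable.
        raise RuntimeError(summarize_error(err.code, err.read())) from err
```

Keeping the full JSON body available, for example by logging it, makes it easier to include in an issue report.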


4. Discouraged Use Cases

We intend for machine clients to employ the Robust Links API to create Robust Links and return the corresponding HTML. Because of the functionality provided by this API, users may be tempted to employ it for purposes we did not intend. In this section, we outline some of the potential use cases we want to discourage when using the API.

Creating New Mementos

If a machine client submits an original resource URL, the Robust Links API creates a Memento. This is a convenience feature provided to help users quickly automate the replacement of existing links. If a machine client wishes to create new Mementos from original resources explicitly, then tools like ArchiveNow perform this action with more features and without the additional overhead.

Acquiring Memento Information

The Robust Links API provides the memento-datetime and original resource URL (in data-originalurl) associated with the Memento URL. It also identifies a resource as a Memento or original resource. If a machine client needs only this information, then the Robust Links API adds unnecessary overhead. Instead, a machine client should directly query the Memento by employing the Memento Protocol. The py-memento-client Python library can help developers do this with Python applications.
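For a rough idea of what a direct query involves, the sketch below checks a response for the Memento-Datetime header using only the Python standard library. The function names are hypothetical, and this is not how py-memento-client works internally; it covers only the header check and none of the rest of the Memento Protocol.

```python
import urllib.request
from typing import Mapping, Optional


def memento_datetime(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Memento-Datetime header value, or None if it is absent."""
    return headers.get("Memento-Datetime")


def fetch_memento_datetime(url: str) -> Optional[str]:
    """Issue an HTTP HEAD request and inspect the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # response.headers matches header names case-insensitively.
        return memento_datetime(response.headers)
```

A return value of None suggests an original resource, or an archive that does not support the Memento Protocol.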