ResourceSync Framework Specification
DO NOT USE, SEE
CURRENT ResourceSync SPECIFICATIONS
Open Archives Initiative
ResourceSync Framework Specification (ANSI/NISO Z39.99-2014)
21 April 2014
Abstract
This ResourceSync specification describes a synchronization framework for the web
consisting of various capabilities that allow third-party systems to remain synchronized
with a server's evolving resources. The capabilities may be combined in a modular manner
to meet local or community requirements. This specification also describes how a server
should advertise the synchronization capabilities it supports and how third-party systems
may discover this information. The specification repurposes the document formats defined
by the Sitemap protocol and introduces extensions for them.
Status of this Document
This document is an HTML version of ANSI/NISO Z39.99-2014, an American National
Standard developed by the National Information Standards Organization.
Approved by the American National Standards Institute (ANSI) April 21, 2014.
The front matter, authorship and acknowledgements of the PDF version are
provided in Appendix C.
This specification is one of several documents comprising the
ResourceSync Framework Specifications.
Table of Contents
1. Introduction
  1.1 Purpose and Scope
  1.2 Motivating Examples
  1.3 Walkthrough
2. Normative References
3. Definitions
4. Namespace Prefix Bindings
5. Synchronization Processes
  5.1 Source Perspective
  5.2 Destination Perspective
  5.3 Summary
6. Framework Organization
  6.1 Structure
  6.2 Navigation
  6.3 Discovery
    6.3.1 Overview
    6.3.2 ResourceSync Well-Known URI
    6.3.3 Links
    6.3.4 robots.txt
7. Sitemap Document Formats
8. Describing the Source
9. Advertising Capabilities
10. Describing Resources
  10.1 Resource List
  10.2 Resource List Index
11. Packaging Resources
  11.1 Resource Dump
    11.1.1 Resource Dump Manifest
12. Describing Changes
  12.1 Change List
  12.2 Change List Index
13. Packaging Changes
  13.1 Change Dump
    13.1.1 Change Dump Manifest
14. Linking to Related Resources
  14.1 Overview
  14.2 Mirrored Content
  14.3 Alternate Representations
  14.4 Patching Content
  14.5 Resources and Metadata about Resources
  14.6 Prior Versions of Resources
  14.7 Collection Membership
  14.8 Republishing Resources
Appendix A. (normative) Time Attribute Requirements
Appendix B. Bibliography
Appendix C. Front Matter, Authorship, Acknowledgements
Appendix D. Change Log
1.
Introduction
1.1
Purpose and Scope
The web is highly dynamic, with resources continuously being created,
updated, and deleted. As a result, using resources from a remote
server involves the challenge of remaining in step with its changing
content. In many cases, there is no need to reflect a
server's evolving content perfectly and, therefore, well-established
resource discovery techniques, such as crawling, suffice as an
updating mechanism. However, there are significant use cases that
require low latency and high accuracy in reflecting a remote server's
changing content. These requirements have typically been addressed by
ad-hoc technical approaches implemented within a small group of
collaborating systems. There have been no widely adopted, web-based
approaches.
This ResourceSync specification introduces a range
of easy-to-implement capabilities that a server may support in order
to enable remote systems to remain more tightly in step with its
evolving resources. It also describes how a server should advertise the
capabilities it supports. Remote systems may inspect this information
to determine how best to remain aligned with the evolving data.
Each capability provides a different synchronization functionality,
such as a list of the server's resources or its recently changed
resources, including what the nature of the change was: create,
update, or delete. All capabilities are implemented on the basis of
the document formats introduced by the Sitemap protocol. Capabilities may be
combined to achieve varying levels of functionality and hence meet
different local or community requirements. This modularity provides
flexibility and makes ResourceSync suitable for a broad range of use
cases.
1.2
Motivating Examples
Many projects and services have synchronization needs and have
implemented ad hoc solutions. ResourceSync provides a standard
synchronization method that will reduce implementation effort and
facilitate easier reuse of resources. This section describes motivating
examples with differing needs and complexities.
Consider first the case of a website for a small museum collection.
The website may contain just a few dozen static webpages. The
maintainer may create a Resource List of these
webpages and expose it to services that leverage ResourceSync.
When building services over Linked Data, it is often desirable to
maintain a local copy of data for improved access and availability.
Harvesting may be enabled by publishing a Resource List for the dataset.
In many cases, resource representations exposed as Linked Data are small and
so retrieving them via individual HTTP GET requests is slow because of the
large number of round trips for a small amount of content.
Publishing a Resource Dump that points to content packaged and described in
ZIP files makes this more efficient for the client and less burdensome for
the server. Continued synchronization is enabled by recurrently publishing an
up-to-date Resource List or Resource Dump, or, more efficiently, by publishing
a Change List that provides information about resource changes only.
The arXiv.org collection of scientific
articles propagates resource changes to a set of mirror sites and
interacting services on a daily basis. As of July 2013, the collection
contains about 2.6 million resources and there are about 1,600 changes
(creates, updates) per day. The mirroring system, in operation since 1994,
uses HTTP with custom change descriptions and occasionally rsync to verify
the copies and to cope with any errors in the incremental updates. The
approach assumes a tight connection between arXiv.org and its mirrors.
It would be desirable to have a solution that allows any
third-party system to accurately synchronize with arXiv.org using
commodity software. arXiv.org could publish both metadata records and
full-text content as separate web resources, each with its own URI.
Use of ResourceSync capabilities, including Resource Lists, Resource Dumps,
Change Lists, and Change Dumps, allows both mirrors and new parties
to remain accurately in sync with the collection. This would extend
the openly available metadata sharing capabilities provided by arXiv.org,
currently implemented via OAI-PMH, to full-text
sharing in a web-friendly fashion.
1.3
Walkthrough
Let's assume a Source, http://example.com/, that exposes changing content that others
would like to remain synchronized with. A first step towards making this easy for
Destinations is for the Source to publish a Resource List
that conveys the URIs of resources available for synchronization. This Resource List is
expressed as a Sitemap. As shown in Example 1, the Source conveys
the URI of each resource as the value of the <loc> child element of a
<url> element. Note the <rs:md> child element of the <urlset>
root element, which expresses that the Sitemap implements ResourceSync's Resource List
capability. It also conveys that the Resource List reflects the state of the Source's
resources at the datetime provided in the at attribute. This datetime
allows a Destination to quickly determine whether it has previously processed this
specific Resource List.
Example 1: A Resource List
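The reading of a Resource List described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the sample document, its resource URIs, and its datetime are made-up values, parse_resource_list is a hypothetical helper name, and the two namespace URIs are assumed to be those of the Sitemap protocol and of the ResourceSync rs prefix.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used for lookups (both URIs are assumptions here).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Illustrative Resource List; the URIs and the datetime are sample values.
SAMPLE_RESOURCE_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"/>
  <url><loc>http://example.com/res1</loc></url>
  <url><loc>http://example.com/res2</loc></url>
</urlset>
"""


def parse_resource_list(xml_text):
    """Return (capability, at, [resource URIs]) for a Resource List document."""
    root = ET.fromstring(xml_text)
    md = root.find("rs:md", NS)  # document-level rs:md under the root element
    capability = md.get("capability") if md is not None else None
    at = md.get("at") if md is not None else None
    uris = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return capability, at, uris


capability, at, uris = parse_resource_list(SAMPLE_RESOURCE_LIST)
print(capability, at, uris)
```

A Destination could store the returned at value and skip a Resource List whose at value it has already processed.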
The Source can provide additional information in the Resource List to
help the Destination optimize the process of collecting
content and verifying its accuracy. For example,
when the Source expresses the datetime of the most recent modification
for a resource, a Destination can determine whether or not it already
holds the current version, minimizing the number of HTTP requests it
needs to issue in order to remain up-to-date.
Example 2 shows this information
conveyed using Sitemap's <lastmod> element.
When the Source also conveys a hash for a specific bitstream, a Destination
can verify whether the process of obtaining it was successful.
The example shows this information conveyed using the hash attribute on the
<rs:md> element.
In addition, the Source can provide links to related resources using the
<rs:ln> element. The example shows a link to a mirror
copy of the second listed resource, indicating that the Source would prefer
a Destination to obtain the resource from the mirror.
Example 2: A Resource List with additional information
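The two optimizations described above, skipping resources whose modification time has not advanced and verifying a retrieved bitstream against a hash, can be sketched as follows. The function names are hypothetical, and the hash value syntax (space-separated entries of the form algorithm:hexdigest, for example md5:...) is an assumption of this sketch rather than something stated in this section.

```python
import hashlib
from datetime import datetime

# Hash algorithms this sketch can check (an illustrative subset).
_HASHERS = {"md5": hashlib.md5, "sha-1": hashlib.sha1, "sha-256": hashlib.sha256}


def parse_w3c_datetime(value):
    """Parse a complete W3C datetime such as 2013-01-02T13:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def needs_fetch(source_lastmod, local_lastmod):
    """True when the Destination holds no copy or only an older copy."""
    if local_lastmod is None:
        return True
    return parse_w3c_datetime(source_lastmod) > parse_w3c_datetime(local_lastmod)


def hash_matches(content, hash_attr):
    """Check bytes against 'algorithm:hexdigest' entries separated by spaces."""
    verified = False
    for entry in hash_attr.split():
        algorithm, _, expected = entry.partition(":")
        hasher = _HASHERS.get(algorithm.lower())
        if hasher is None:
            continue  # unknown algorithm: this entry cannot be checked
        if hasher(content).hexdigest() != expected.lower():
            return False
        verified = True
    return verified  # False when no entry could be checked at all


print(needs_fetch("2013-01-02T13:00:00Z", "2013-01-01T00:00:00Z"))
print(hash_matches(b"hello", "md5:5d41402abc4b2a76b9719d911017c592"))
```

Returning False when no listed algorithm is supported is a conservative design choice of this sketch; a Destination could equally treat that case as "unverified" and proceed.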
In order to describe its changing content in a more timely manner, the
Source can increase the frequency at which it publishes an up-to-date
Resource List. However, changes may be so frequent or the size of the content
collection so vast that regularly updating a complete Resource List may
be impractical. In such cases, the Source can implement an additional
capability that communicates information about changes
only. To this end, ResourceSync introduces Change Lists. A Change List
enumerates resource changes, along with the nature of the
change (create, update, or delete) and the time that the change occurred.
A Destination can recurrently obtain a Change List from the Source,
inspect the listed changes to discover those it has already acted upon,
and process the remaining ones. Changes in a Change List are provided
in forward chronological order, making it straightforward for a
Destination to determine which changes it already processed. In addition,
a Change List also contains datetimes that convey the start time and
the end time of the temporal interval covered by the Change List.
These times convey that all resource changes that
occurred during the interval are described in the Change List.
(ResourceSync does not specify for how long Change Lists must
continue to be available once they have been produced. The
longer that Change Lists are maintained by the Source, the
better the odds are for a Destination to catch up on changes
it missed because it was offline, for example.)
Example 3 shows a Change List.
The value of the capability attribute of the <rs:md> child element of
<urlset> makes it clear that, this time,
the Sitemap is a Change List and not a Resource List. The from and until
attributes convey the temporal interval covered
by the Change List. The Change List shown below conveys two
resource changes, one an update and the other a deletion, as can be
seen from the value of the change attribute of the <rs:md>
element. The example also shows the use of the <lastmod>
element to convey the time of the changes. Note that these times are used to
order the Change List chronologically.
until="2013-01-03T00:00:00Z"/>
Example 3: A Change List
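A Destination's handling of a Change List, as described above, can be sketched as follows. The sample document is illustrative (its URIs and times are made-up values), the function names are hypothetical, and the change values updated and deleted as well as the namespace URIs are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",  # assumed rs namespace
}

# Illustrative Change List with one update and one deletion.
SAMPLE_CHANGE_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T00:00:00Z" until="2013-01-03T00:00:00Z"/>
  <url>
    <loc>http://example.com/res2.pdf</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
  </url>
  <url>
    <loc>http://example.com/res3.tiff</loc>
    <lastmod>2013-01-02T18:00:00Z</lastmod>
    <rs:md change="deleted"/>
  </url>
</urlset>
"""


def _dt(value):
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def unprocessed_changes(xml_text, last_processed):
    """Return (uri, change, time) tuples for changes after last_processed."""
    root = ET.fromstring(xml_text)
    pending = []
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS).strip()
        when = url.findtext("sm:lastmod", namespaces=NS).strip()
        change = url.find("rs:md", NS).get("change")
        if _dt(when) > _dt(last_processed):
            pending.append((uri, change, when))
    return pending


def apply_changes(local_copies, pending):
    """Drop deleted resources; return URIs that must be fetched again."""
    to_fetch = []
    for uri, change, _when in pending:
        if change == "deleted":
            local_copies.pop(uri, None)
        else:
            to_fetch.append(uri)
    return to_fetch


pending = unprocessed_changes(SAMPLE_CHANGE_LIST, "2013-01-02T15:00:00Z")
print(pending)
```

Because changes are listed in forward chronological order, a Destination only needs to remember the time of the last change it processed in order to resume.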
A Destination can issue HTTP GET requests against each resource URI listed in a Resource List. For
large Resource Lists, issuing all of these requests may be cumbersome. Therefore, ResourceSync introduces a
capability that a Source can use to
make packaged content available. A Resource Dump, implemented as a Sitemap, contains pointers to packaged content.
Each content package referenced in a Resource Dump is a ZIP file that contains the Source's bitstreams along with a Resource Dump Manifest
that describes each. The Resource Dump Manifest itself is also implemented as a Sitemap.
A Destination can retrieve a Resource Dump, obtain content packages by dereferencing the contained pointers, and unpack the retrieved packages.
Since the Resource Dump Manifest also lists the URI the Source associates with each bitstream, a Destination is able to achieve
the same result as obtaining the data by dereferencing the URIs listed in a Resource List.
Example 4 shows a Resource Dump that points at a single content package. Dereferencing the URI of that package leads to a ZIP file
that contains the Resource Dump Manifest shown in Example 5. It indicates that the Source's ZIP file contains two bitstreams.
The path attribute of the <rs:md> element conveys
the file path of the bitstream in the ZIP file (the relative file system path where the bitstream
would reside if the ZIP were unpacked), whereas the <loc>
element conveys the URI associated with the bitstream at the Source.
An additional capability, the Change Dump, provides a functionality similar to a Resource Dump but pertains to
packaging bitstreams of resources that have changed during a temporal interval,
instead of packaging a snapshot of resource bitstreams at a specific moment in time.
Example 4: A Resource Dump
Example 5: A Resource Dump Manifest detailing the content of a ZIP file
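Unpacking a content package guided by its Resource Dump Manifest can be sketched as follows. The package is built in memory so that the sketch is self-contained; the manifest file name manifest.xml, the capability value resourcedump-manifest, the sample paths and URIs, the namespace URIs, and the function names are all assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",  # assumed rs namespace
}

# Illustrative manifest describing two bitstreams in the package.
SAMPLE_MANIFEST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcedump-manifest" at="2013-01-03T09:00:00Z"/>
  <url><loc>http://example.com/res1</loc><rs:md path="/resources/res1"/></url>
  <url><loc>http://example.com/res2</loc><rs:md path="/resources/res2"/></url>
</urlset>
"""


def build_sample_package():
    """Create an in-memory ZIP holding the manifest and two bitstreams."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("manifest.xml", SAMPLE_MANIFEST)
        package.writestr("resources/res1", b"first bitstream")
        package.writestr("resources/res2", b"second bitstream")
    return buffer.getvalue()


def read_package(zip_bytes, manifest_name="manifest.xml"):
    """Map each Source URI in the manifest to the bytes stored in the ZIP."""
    bitstreams = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        root = ET.fromstring(package.read(manifest_name))
        for url in root.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS).strip()
            path = url.find("rs:md", NS).get("path")
            # ZIP member names are relative, so drop any leading slash.
            bitstreams[uri] = package.read(path.lstrip("/"))
    return bitstreams


bitstreams = read_package(build_sample_package())
print(sorted(bitstreams))
```

The resulting mapping from URI to content is what a Destination would otherwise obtain by dereferencing each URI listed in a Resource List.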
ResourceSync also introduces a Capability List, which is a way for the
Source to describe the capabilities it supports for one set of resources.
Example 6 shows an example of such a description.
It indicates that the Source supports the Resource List, Resource Dump, and
Change List capabilities and it lists their respective URIs. Note the
inclusion of an <rs:ln> child element of <urlset>
that links by means of a describedby relation
to a description of the set of resources covered by the Capability List.
Because these capabilities are conveyed in the same Capability List, they
uniformly apply to this set of resources. For example, if a given resource
appears in the Resource List then it must also appear in a Resource Dump
and changes to the resource must be reported in the Change List.
Example 6: A Capability List enumerating the ResourceSync capabilities a Source supports for a set of its resources
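Reading a Capability List into a lookup table can be sketched as follows. The sample document, all of its URIs, the function name, and the namespace URIs are illustrative assumptions; only the capability names and the describedby and up relations come from the surrounding text.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",  # assumed rs namespace
}

# Illustrative Capability List; every URI below is a sample value.
SAMPLE_CAPABILITY_LIST = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="describedby" href="http://example.com/info_about_set1.xml"/>
  <rs:ln rel="up" href="http://example.com/.well-known/resourcesync"/>
  <rs:md capability="capabilitylist"/>
  <url><loc>http://example.com/dataset1/resourcelist.xml</loc>
       <rs:md capability="resourcelist"/></url>
  <url><loc>http://example.com/dataset1/resourcedump.xml</loc>
       <rs:md capability="resourcedump"/></url>
  <url><loc>http://example.com/dataset1/changelist.xml</loc>
       <rs:md capability="changelist"/></url>
</urlset>
"""


def parse_capability_list(xml_text):
    """Return ({capability: URI}, {link relation: href})."""
    root = ET.fromstring(xml_text)
    links = {ln.get("rel"): ln.get("href") for ln in root.findall("rs:ln", NS)}
    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        if md is not None and md.get("capability"):
            uri = url.findtext("sm:loc", namespaces=NS).strip()
            capabilities[md.get("capability")] = uri
    return capabilities, links


capabilities, links = parse_capability_list(SAMPLE_CAPABILITY_LIST)
print(capabilities)
print(links)
```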
There are three ways by which a Destination can discover whether and how a Source
supports ResourceSync: a Source-wide approach, a resource-specific approach, and
an approach that leverages existing practice for discovering Sitemaps.
The Source-wide approach leverages the well-known URI specification
and consists of the Source making a Source Description, like the one shown in
Example 7, available at /.well-known/resourcesync. The Source Description
enumerates the Capability Lists a Source offers, one Capability List per set of
resources. If a Source only has one set of resources and hence only one
Capability List, the mandatory Source Description contains only one pointer.
The resource-specific discovery approach
consists of a Source providing a link in an HTML document or in an HTTP Link header
that points at a Capability List that covers the resource that provides the link. Note,
in Example 6, the inclusion of an <rs:ln> child
element of <urlset> that links by means of an up relation
to the Source Description, allowing for navigation from a Capability List to a Source
Description. Yet another approach follows the established practice for discovering
Sitemaps via a Source's robots.txt file. Since a Resource List is a
Sitemap, it can be made discoverable by including its URI in the
robots.txt file as the value of the Sitemap directive. A navigational up
link included in the Resource List allows discovery of a Capability List pertaining
to the set of resources covered by that Resource List, and a further up
link in the Capability List leads to the Source Description.
Example 7: A Source Description with a pointer to the Capability List for the single set of resources offered by a Source
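The robots.txt based discovery approach mentioned above can be sketched as follows. The sample robots.txt content and its URI are made-up values, and sitemap_uris is a hypothetical helper; the sketch only extracts candidate Sitemap URIs and does not check whether a given Sitemap is in fact a Resource List.

```python
def sitemap_uris(robots_txt):
    """Return the values of all Sitemap directives found in robots.txt text."""
    uris = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        name, sep, value = line.partition(":")
        # Directive names are matched case-insensitively.
        if sep and name.strip().lower() == "sitemap":
            uris.append(value.strip())
    return uris


SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.com/dataset1/resourcelist.xml
"""

found = sitemap_uris(SAMPLE_ROBOTS_TXT)
print(found)
```

A Destination would then retrieve each returned document and follow its up links to reach the Capability List and the Source Description.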
In some cases, there is a need to split the documents described so far into parts.
For example, the Sitemap protocol currently prescribes a maximum of 50,000 resources
per Sitemap, and a Source may have more resources than that subject to synchronization.
The ResourceSync framework follows these community-defined limits and hence,
in such cases, a Source publishes multiple Resource Lists as well as a Resource List Index
that points to each of them. The Resource List Index is expressed using
Sitemap's <sitemapindex> document format.
Example 8 shows a Resource List Index that points at two Resource
Lists.
Example 8: A Resource List Index expressed using the <sitemapindex> document format
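Splitting a large set of resource URIs into several Resource Lists and producing an index over them can be sketched as follows. The function names, the generated URIs, and the namespace URIs are assumptions of this sketch; only the 50,000 limit and the use of the sitemapindex format come from the text above.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"  # assumed rs namespace
MAX_PER_LIST = 50000  # Sitemap protocol limit cited above

# Serialize Sitemap elements without a prefix and rs elements as rs:...
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)


def split_into_lists(uris, size=MAX_PER_LIST):
    """Partition the URIs into consecutive chunks of at most `size` items."""
    return [uris[i:i + size] for i in range(0, len(uris), size)]


def build_resource_list_index(list_uris, at):
    """Serialize a sitemapindex document pointing at each Resource List."""
    root = ET.Element(f"{{{SM}}}sitemapindex")
    ET.SubElement(root, f"{{{RS}}}md", {"capability": "resourcelist", "at": at})
    for list_uri in list_uris:
        sitemap = ET.SubElement(root, f"{{{SM}}}sitemap")
        ET.SubElement(sitemap, f"{{{SM}}}loc").text = list_uri
    return ET.tostring(root, encoding="unicode")


uris = [f"http://example.com/res{i}" for i in range(120001)]
parts = split_into_lists(uris)
list_uris = [f"http://example.com/resourcelist{n}.xml"
             for n in range(1, len(parts) + 1)]
index_xml = build_resource_list_index(list_uris, "2013-01-03T09:00:00Z")
print(len(parts), "Resource Lists")
```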
2.
Normative References
The following documents contain provisions that are required for implementing
this standard. All standards are subject to revision; the most current version
of these standards should be used.
[ALE] Snell, J. Atom Link Extensions. Internet Draft. Internet Engineering Task Force (IETF), June 8, 2012.
[IANA MIME] MIME Media Types [registry website]. Internet Assigned Numbers Authority (IANA).
[IANA Relation] Link Relations [registry website]. Internet Assigned Numbers Authority (IANA).
[Memento] Van de Sompel, H., M. L. Nelson, and R. D. Sanderson. HTTP framework for time-based access to resource states -- Memento. Internet Draft. Internet Engineering Task Force (IETF), October 1, 2013.
[RFC 2616] Fielding, R. et al. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616. Internet Engineering Task Force (IETF), June 1999.
[RFC 4287] Nottingham, M., and R. Sayre, eds. The Atom Syndication Format. RFC 4287. Internet Engineering Task Force (IETF), December 2005.
[RFC 5988] Nottingham, M. Web Linking. RFC 5988. Internet Engineering Task Force (IETF), October 2010.
[RFC 6249] Bryan, A. et al. Metalink/HTTP: Mirrors and Hashes. RFC 6249. Internet Engineering Task Force (IETF), June 2011.
[RFC 6906] Wilde, E. The 'profile' Link Relation Type. RFC 6906. Internet Engineering Task Force (IETF), March 2013.
[Sitemaps] Sitemaps XML Format. sitemaps.org, last updated February 27, 2008.
[W3C Datetime] Wolf, Misha, and Charles Wicksteed. Date and Time Formats. W3C Note. World Wide Web Consortium, August 27, 1998.
[ZIP] .ZIP File Format Specification. Application Note. Version 6.3.3. PKWARE Inc., September 1, 2012.
3.
Definitions
The following terms, as used in this standard, have the meanings indicated.
Term | Definition
Source | A server that hosts resources subject to synchronization
Destination | A system that synchronizes itself with the Source's resources
set of resources | A collection of resources that is made available for synchronization by a Source. A Source may expose one or more such collections and support distinct ResourceSync capabilities for each. Individual resources may be included in more than one set of resources
This specification uses the terms resource, representation, request, response,
content negotiation, client, and server as described in
Architecture of the World Wide Web.
4.
Namespace Prefix Bindings
Throughout this document, the following namespace prefix bindings
are used:
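Prefix | Namespace URI | Description
(none) | http://www.sitemaps.org/schemas/sitemap/0.9 | Sitemap XML elements defined in the Sitemap protocol
rs | http://www.openarchives.org/rs/terms/ | Namespace for elements introduced in this specification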
5.
Synchronization Processes
Section 1.3 provides a concrete walkthrough of some
capabilities that a Source may implement and describes how a
Destination may use those capabilities to remain synchronized with
the Source's changing data. This section provides a high-level
overview of the various ResourceSync capabilities and shows how
these fit into processes at a Destination designed to keep it in
step with changes.
5.1
Source Perspective
From the perspective of a Source, the ResourceSync capabilities
that may be supported to enable Destination processes to remain in
sync with its changing data are summarized as follows:
Describing Content -- In order to describe its data, a
Source may maintain an up-to-date Resource List. A
basic Resource List minimally provides the URIs of resources
that the Source makes available for synchronization. However,
additional information may be added to the Resource List to optimize
the Destination's process of obtaining the Source's resources,
including the most recent modification time of resources and fixity
information such as content-based checksum or hash and length.
Figure 1 shows a Source
publishing up-to-date Resource Lists at times t2 and t4. At t4,
too many resources need to be listed to fit in a single
Resource List and hence multiple Resource Lists are published and
grouped in a Resource List Index.
Packaging Content -- In order to make its data available for
download, a Source may recurrently make an up-to-date Resource Dump
of its content available. A Resource Dump points at one or more packages,
each of which contains bitstreams associated with resources hosted by the Source.
Each package also contains a Resource Dump Manifest that provides
metadata about the bitstreams contained in the package, and must minimally
include their associated URI and their file path in the ZIP file.
Figure 1 shows a Source publishing
up-to-date Resource Dumps at times t1 and t3. At time t3, multiple
Resource Dumps are published and grouped in a Resource Dump Index.
Describing Changes -- In order to achieve lower
synchronization latency and/or to improve transfer efficiency,
a Source may publish a Change List that provides information about
changes to its resources. It is up to the Source to decide the temporal
interval covered by a Change List, for example, all the changes
that occurred during the previous hour, the current day, or since
the most recent publication of a Resource List. For each resource
change, a Change List must minimally convey the URI of the changed
resource as well as the datetime and nature of the change (create,
update, delete). Since a Change List is organized on the basis of
changes, it may list the same resource multiple times, once per change.
Figure 2 shows three Change Lists.
The first Change List covers resource changes that occurred between
t1 and t3, the second between t3 and t5, and the third between t5 and t7.
Since too many changes occurred between t5 and t7 to fit in a single
Change List, multiple Change Lists are published and grouped in a
Change List Index.
Packaging Changes -- In order to make content changes available for download,
a Source may publish a Change Dump. A Change Dump points at one or more packages,
each of which contains bitstreams that correspond to the state of resources after they changed.
Each package also contains a Change Dump Manifest that provides metadata about the
bitstreams provided in the Change Dump.
For each bitstream, the Change Dump Manifest must minimally include the associated URI, the
datetime when the change that resulted in the bitstream occurred, the nature of the
change (create, update, delete) and, where appropriate, the file path of the bitstream
in the ZIP file.
It is up to a Source to decide the temporal interval covered by a
Change Dump, for example, covering all the resource changes that occurred during the
previous hour, the current day, or since the most recent publication
of a Resource Dump. Since a Change Dump is organized on the basis of changes,
the package(s) it points at may contain multiple bitstreams associated with any given
resource, one per change.
Figure 2 shows three Change Dumps. The first Change Dump covers resource changes
that occurred between t2 and t4, the second between t4 and t6, and the third
between t6 and t8. During the time period between t6 and t8, multiple
Change Dumps are published and grouped in a Change Dump Index.
Linking to Related Resources -- There are several reasons to
provide additional links from a resource subject to synchronization to
related resources, including:
Alternate Content Transfer -- The default mechanisms by which a Destination obtains
content for a resource are to issue an HTTP GET against its URI found
in a Resource List or Change List, or to unpack packages obtained via a
Resource Dump or Change Dump. However, additional approaches may also be supported.
For example, a Source may prefer, for synchronization purposes, that
content be obtained from a mirror server and hence from a different
URI. Also, a Source may allow obtaining only the changes that a
resource underwent, instead of the entire changed resource.
This may be desirable when the resource size is considerable and/or the
frequency of changes high. Such an Alternate Content Transfer approach
is expressed by means of a link from the resource to another resource that
makes the content available in an alternate way. It is possible that certain
Destinations do not recognize a specific Alternate Content Transfer approach,
in which case ignoring the link and dereferencing the resource's URI remains
the fallback approach.
Resources and Metadata about Resources -- Cases exist where both resources and
metadata about resources must be synchronized, for example, a collection of scientific
publications and metadata describing each. From the ResourceSync perspective, both the
resource and the metadata about it are regarded as resources with distinct URIs that
are subject to synchronization. Their inter-relationship is expressed by means of links
with appropriate relation types.
Prior Versions of Resources -- In some cases a Destination requires a copy of
each version of a resource, not just the most recent one.
A Source may support discovery and access to prior resource versions through links.
Three approaches are provided, one based on linking to
resource versions, and two that leverage features of the Memento protocol for
time-based access to resource states.
Figure 1: ResourceSync Source perspective of resource description
Figure 2: ResourceSync Source perspective of change description
5.2
Destination Perspective
From the perspective of a Destination, three key processes are enabled by the
ResourceSync capabilities; Figure 3 provides an overview:
Baseline Synchronization -- In order to become synchronized with
a Source, the Destination must make an initial copy of the Source's data.
A Destination may obtain the Resource List that conveys the URIs of
the Source's resources, and subsequently dereference those URIs one by one.
A Destination may also obtain a Resource Dump that conveys the URIs of one
or more content packages each of which contains bitstreams associated
with the Source's resources. A Destination may dereference those URIs
and subsequently unpack the retrieved content packages, guided by the
contained Resource Dump Manifest.
Incremental Synchronization -- A Destination may remain in
sync with a Source by repeatedly performing a Baseline Synchronization.
To increase efficiency and decrease latency, a
Source may communicate information about changes to its resources via Change Lists.
This allows a Destination to obtain up-to-date content by dereferencing the
URIs of newly created and updated resources listed in the Change List.
It also allows a Destination to remove its copies of deleted resources, if needed.
A Source may also make a Change Dump available that points at one or more packages,
each of which contains bitstreams that correspond to the state of resources after they changed.
In this case the Destination first obtains the Change Dump, then obtains the
package(s) by dereferencing the URI(s) listed in the Change Dump, and
subsequently unpacks those, guided by the contained Change Dump Manifest.
Audit -- In order to verify whether it is in sync
with the Source, a Destination must be able to check that the content
it obtained matches the current resources hosted by the Source, both in coverage
and in accuracy. This requires an up-to-date list of resources hosted by the
Source, which may be compiled on the basis of a Resource List and Change Lists.
It also requires these lists to contain metadata per resource that characterizes
its most recent state, such as last modification time, length, and content-based hash.
Figure 3: ResourceSync Destination perspective
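The Audit process described above can be sketched as follows, using plain dictionaries in place of parsed Resource Lists and Change Lists. The data structures, the function names, and the sample values are hypothetical; the change value deleted is an assumption of this sketch.

```python
def compile_source_state(resource_list, change_lists):
    """Apply Change Lists, in chronological order, on top of a Resource List.

    resource_list: dict mapping URI -> metadata (e.g. lastmod, length, hash)
    change_lists:  iterable of lists of (uri, change, metadata) tuples
    """
    state = dict(resource_list)
    for changes in change_lists:
        for uri, change, metadata in changes:
            if change == "deleted":
                state.pop(uri, None)
            else:  # created or updated
                state[uri] = metadata
    return state


def audit(local_state, source_state):
    """Report coverage problems (missing, extra) and accuracy problems (stale)."""
    local, source = set(local_state), set(source_state)
    return {
        "missing": sorted(source - local),
        "extra": sorted(local - source),
        "stale": sorted(u for u in local & source
                        if local_state[u] != source_state[u]),
    }


baseline = {
    "http://example.com/res1": {"hash": "md5:aaa"},
    "http://example.com/res2": {"hash": "md5:bbb"},
}
changes = [[
    ("http://example.com/res2", "updated", {"hash": "md5:ccc"}),
    ("http://example.com/res3", "created", {"hash": "md5:ddd"}),
    ("http://example.com/res1", "deleted", None),
]]
report = audit(baseline, compile_source_state(baseline, changes))
print(report)
```

Here the Destination's local state equals the baseline, so the report lists res3 as missing, res1 as extra, and res2 as stale.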
5.3
Summary
Table 1 provides a summary of Section 5. The table lists Destination processes
as columns and Source capabilities as rows, with cells indicating the
applicability of a capability for a given process.
Destination Processes (columns): Baseline Synchronization, Incremental Synchronization, Audit
Source Capabilities (rows):
Describing the Source
Advertising Capabilities
Describing Resources: Resource List
Packaging Resources: Resource Dump
Describing Changes: Change List
Packaging Changes: Change Dump
Linking to Related Resources: Mirrored Content, Alternate Representations, Patching Content, Resources and Metadata about Resources, Prior Versions of Resources, Collection Membership, Republishing Resources
Table 1: Source capabilities versus Destination processes
6.
Framework Organization
6.1
Structure
All capabilities in the ResourceSync framework are implemented on the basis of the
<urlset> and <sitemapindex> Sitemap document formats.
Figure 4 depicts the overall structure of the set of documents that is used:
At the top of the picture is the mandatory Source Description. It is a
Destination's typical entry point to learn about a Source's ResourceSync
implementation. The Source Description enumerates all Capability Lists
offered by the Source, one Capability List per set of resources.
If the Source only offers one set of resources, the Source
Description contains a single pointer.
A Source Description is expressed as a <urlset> document and, per Capability List,
a <url> element is introduced. The <loc> child element of <url>
contains the URI of the Capability List, and the capabilitylist value for
the capability attribute of the <rs:md> child element of <url>
makes clear that the URI is that of a Capability List.
A Capability List enumerates all capabilities supported for a set of the Source's resources.
The capabilities defined in this ResourceSync specification are
Resource List, Change List, Resource Dump, and Change Dump. Additional capabilities may be defined
in other specifications.
A Capability List is expressed as a <urlset> document and, for each supported capability,
a <url> element is introduced. The <loc> child element of <url>
contains the URI of the document that implements a capability, and the type of capability is expressed
by means of the value of the capability attribute of the <rs:md> child element of
<url>, e.g., resourcelist for a Resource List.
A Resource List and a Change List point at resources. A representation of a resource
may be obtained by dereferencing its URI, listed as the value of the
<loc> child element of the <url> element for the resource.
A Resource Dump and a Change Dump point at packages, each of which contains bitstreams
associated with resources, as well as a
Manifest that describes the bitstreams provided in the package.
The Manifest contained in a package of bitstreams is expressed as a
<urlset> document. For each bitstream contained in the package, that document
contains a <url> element; the <loc> child element of <url> provides
the URI that corresponds to the bitstream, whereas the path attribute of the
<rs:md> child element of <url> provides the path of the bitstream in the package.
If a single document suffices to express a Source Description, a Resource List, a Change List etc.,
then the <urlset> document format is used.
If multiple documents are required, each is expressed using the
<urlset> document format, and a <sitemapindex>
document is introduced as an index to point at all individual <urlset> documents.
As a result, the URI of, for example, a Resource List provided in a Capability List may be
either that of a <urlset> or a <sitemapindex> document.
The <urlset> or <sitemapindex> documents used for a specific capability
(e.g., Resource List) have the same value for the capability attribute
(e.g., resourcelist).
The Resource List branch of Figure 4 is fully compatible with the existing
Sitemap specification, whereas the other branches are extensions introduced to support resource synchronization that
leverage the Sitemap document formats.
Figure 4: ResourceSync framework structure
6.2
Navigation
The following mechanisms are introduced to support navigating the document
hierarchy described in Section 6.1; they are illustrated in Figure 5 and Figure 6:
A link for upward navigation is provided by means of an <rs:ln>
child element of the <urlset> or <sitemapindex>
element of a document. This pointer has up as the value for the rel
attribute, and the URI of the document that sits higher
in the hierarchy is provided as the value of the href
attribute. Following consecutive up
links eventually leads to the Source Description.
A link for navigation from a document to the index under which it resides, if one exists, is
provided by means of an <rs:ln> child element of the <urlset>
element of the document. This pointer has index as the value for the rel
attribute, and the URI of the index document
is provided as the value of the href attribute.
A link for downward navigation is provided by the content of a
element:
the URI of the document that sits lower in the hierarchy is provided as the value of its
child element, and its type is conveyed as the value of the
capability
attribute of the
child element.
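The three navigation mechanisms above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name navigation_links and the sample document are hypothetical, and the namespace URIs are assumed to be the ones bound in Section 4.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match the prefix bindings of Section 4.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def navigation_links(xml_text):
    """Collect the upward, index, and downward links of one document."""
    root = ET.fromstring(xml_text)
    links = {"up": None, "index": None, "down": []}
    # Upward and index links are rs:ln children of the root element.
    for ln in root.findall("rs:ln", NS):
        if ln.get("rel") in ("up", "index"):
            links[ln.get("rel")] = ln.get("href")
    # Downward links: loc carries the URI, rs:md capability carries the type.
    for entry in root.findall("sm:url", NS) + root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        md = entry.find("rs:md", NS)
        capability = md.get("capability") if md is not None else None
        links["down"].append((loc.strip() if loc else None, capability))
    return links

# Hypothetical Capability List used only to exercise the function.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up" href="http://example.com/resourcesync_description.xml"/>
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
</urlset>"""

links = navigation_links(SAMPLE)
print(links)
```

A Destination could call such a function repeatedly on the document found at the up link until no further up link is present, at which point it has reached the Source Description.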
Figure 5: ResourceSync upwards navigation
Figure 6: ResourceSync downwards navigation
6.3
Discovery
6.3.1
Overview
ResourceSync provides three ways for a Destination to discover
whether and how a Source supports ResourceSync: a Source-wide
approach detailed in Section 6.3.2, a resource-specific approach detailed in
Section 6.3.3, and an approach that leverages the existing practice of Sitemap
discovery via the robots.txt file described in Section 6.3.4.
All approaches are summarized in Figure 7.
Figure 7: Discovery of Source Description and Capability List
6.3.2
ResourceSync Well-Known URI
A Source must publish a Source Description, such as the one shown in
Example 7, and it should be published at the well-known URI [RFC 5785]
/.well-known/resourcesync
as defined here.
The Source Description document enumerates a Source's Capability Lists
and as such is an appropriate entry
point for Destinations interested in understanding a Source's capabilities.
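As a small illustration of the Source-wide approach, the following Python sketch derives the well-known URI from any URI hosted by a Source. The function name is hypothetical, and the sketch assumes the Source Description is published at the recommended location, which the specification recommends but does not require.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/resourcesync"

def source_description_uri(any_uri_at_source):
    """Derive the well-known Source Description URI for a Source's host."""
    parts = urlsplit(any_uri_at_source)
    # Keep scheme and authority; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))

candidate = source_description_uri("http://example.com/dataset1/res1?version=2")
print(candidate)
```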
6.3.3
Links
A Capability List may be made discoverable by means of links provided either
in an HTML document [HTML Links, XHTML Links] or in an
HTTP Link header [RFC 5988].
In order to include a discovery link in an HTML document, a
<link> element is introduced in the <head>
of the document that points to a Capability List.
This <link> must have a rel attribute
with a value of resourcesync.
The Capability List that is made discoverable in this way must pertain to
the resource that provides the link. This means that
the resource must be covered by the capabilities listed in the linked Capability List.
Example 9 shows the structure of a webpage that contains a
link to a Capability List.
As shown in Example 6, the Source Description can
be discovered from the Capability List by following the
link provided in the <rs:ln> element with the relation type up.
<html>
  <head>
    <link rel="resourcesync"
          href="http://www.example.com/dataset1/capabilitylist.xml"/>
    ...
  </head>
  <body>...</body>
</html>
Example 9: Discovery by means of an HTML link
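A Destination might extract such a discovery link with Python's standard html.parser module, as in the sketch below. The class name and the sample page are hypothetical; for brevity the sketch accepts link elements anywhere in the page rather than only inside the head.

```python
from html.parser import HTMLParser

class CapabilityLinkFinder(HTMLParser):
    """Collect href values of link elements whose rel includes 'resourcesync'."""

    def __init__(self):
        super().__init__()
        self.capability_lists = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated relation types.
        rels = (attributes.get("rel") or "").lower().split()
        if "resourcesync" in rels and attributes.get("href"):
            self.capability_lists.append(attributes["href"])

# Hypothetical page modeled on the structure described above.
PAGE = """<html>
  <head>
    <link rel="stylesheet" href="/style.css"/>
    <link rel="resourcesync"
          href="http://www.example.com/dataset1/capabilitylist.xml"/>
  </head>
  <body>...</body>
</html>"""

finder = CapabilityLinkFinder()
finder.feed(PAGE)
print(finder.capability_lists)
```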
A Capability List may also be made discoverable by means of an HTTP
Link header that may be included with a representation of a resource of any
content-type. In order to do so, a link is introduced in the HTTP Link header.
The target of this link is the URI of a Capability List and the value of
its
rel
attribute is
resourcesync
. The Capability
List that is made discoverable in this way must pertain to the resource that
provides the link. This means that the resource must be covered by the
capabilities listed in the linked Capability List.
Example 10 shows an excerpt of an HTTP response header that illustrates this approach.
As shown in Example 6, the Source Description can be
discovered from the Capability List by following the link provided in the
<rs:ln> element with the relation type up.
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link:
rel="resourcesync"
...
Example 10: Discovery by means of an HTTP link
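The following Python sketch shows one way a Destination could pick the Capability List out of a Link header value. It is a simplified parser written for illustration only: it does not cover every syntax permitted by RFC 5988, and the function name and the header value (including both URIs) are hypothetical.

```python
import re

def capability_lists_from_link_header(header_value):
    """Return the targets of links whose rel includes 'resourcesync'."""
    targets = []
    # Each link is "<target>" followed by its ";"-separated parameters.
    for match in re.finditer(r"<([^>]*)>([^<]*)", header_value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";,]+)"?', params, re.IGNORECASE)
        if rel and "resourcesync" in rel.group(1).lower().split():
            targets.append(target)
    return targets

# Hypothetical header value carrying two links.
HEADER = ('<http://example.com/dataset1/capabilitylist.xml>; rel="resourcesync", '
          '<http://example.com/about>; rel="describedby"')

found = capability_lists_from_link_header(HEADER)
print(found)
```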
6.3.4
robots.txt
A Resource List is a Sitemap and hence may be made discoverable via the
established approach of adding a Sitemap directive to a
Source's robots.txt file that has the URI of the Resource
List as its value. If a Source supports multiple sets of resources,
multiple directives may be added, one for each Resource List associated
with a specific set of resources. In case a Source supports both regular
Sitemaps and ResourceSync Sitemaps (Resource Lists), they may be
made discoverable, again, by including multiple Sitemap
directives as shown in Example 11.
Once a Resource List for a set of resources has been discovered
in this manner, the corresponding Capability List can be discovered
by following a link with the
up
relation type provided
in the Resource List. Next, the Source Description can be discovered
by following yet another link with the
up
relation type provided in the Capability List.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
Example 11: A
robots.txt
file that points at a Resource List
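Reading the Sitemap directives from a robots.txt file can be sketched as follows. The function name is hypothetical, and the sample input repeats the content of Example 11. Whether a discovered URI is a Resource List or a regular Sitemap still has to be determined by retrieving the document and inspecting it.

```python
def sitemap_uris(robots_txt):
    """Return the values of all Sitemap directives in a robots.txt body."""
    uris = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        name, separator, value = line.partition(":")
        if separator and name.strip().lower() == "sitemap":
            uris.append(value.strip())
    return uris

ROBOTS_TXT = """User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://example.com/dataset1/resourcelist.xml
"""

discovered = sitemap_uris(ROBOTS_TXT)
print(discovered)
```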
7.
Sitemap Document Formats
In order to convey information pertaining to resources in the ResourceSync framework,
the Sitemap (root element <urlset>) and Sitemap index (root
element <sitemapindex>) document formats introduced by the
Sitemap protocol are used for a variety of purposes.
The <sitemapindex> document format is used when it is necessary to
group multiple documents of the <urlset> format.
The ResourceSync framework follows community-defined limits for when to
publish multiple documents of the <urlset> format.
At the time of publication of this specification, the limit is 50,000 items
per document and a document size of 50 MB.
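The item limit can be illustrated with a short Python sketch that splits a long list of URIs into groups small enough for individual documents. The names are hypothetical, and the sketch checks only the 50,000 item limit; a real implementation would also have to respect the 50 MB size limit when serializing each document.

```python
MAX_ITEMS = 50_000  # item limit per document stated above

def split_into_documents(uris, max_items=MAX_ITEMS):
    """Split a list of URIs into chunks that each fit in one document."""
    return [uris[i:i + max_items] for i in range(0, len(uris), max_items)]

def needs_index(uris, max_items=MAX_ITEMS):
    """An index document is needed once a single document cannot hold all items."""
    return len(uris) > max_items

# Hypothetical resource URIs, slightly more than two full documents.
all_uris = [f"http://example.com/res{n}" for n in range(100_001)]
chunks = split_into_documents(all_uris)
print(len(chunks), [len(chunk) for chunk in chunks], needs_index(all_uris))
```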
The document formats, as well as their ResourceSync extension elements, are
shown in Table 2. The <rs:md> and <rs:ln>
elements are introduced to express metadata and links, respectively.
Both are in the ResourceSync XML Namespace and may have attributes.
The attributes of these elements defined by ResourceSync are listed in
Table 3 and detailed below. As shown in
the examples, these attributes must not have an XML Namespace prefix.
The <rs:ln> element as well as several of the
ResourceSync attributes are based upon other specifications and in those
cases inherit the semantics defined there; the "Specification" column of
Table 3 refers to those specifications.
Communities may introduce additional attributes when needed but must use
an XML Namespace other than that of ResourceSync and must appropriately use
namespace prefixes for those attributes.
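Sitemap                    Sitemap Index
<urlset>                   <sitemapindex>
  <rs:md />                  <rs:md />
  <rs:ln />                  <rs:ln />
  <url>                      <sitemap>
    <loc />                    <loc />
    <lastmod />                <lastmod />
    <changefreq />             <rs:md />
    <rs:md />                  <rs:ln />
    <rs:ln />                </sitemap>
  </url>                     <sitemap>
  <url>                        ...
    ...                      </sitemap>
  </url>                   </sitemapindex>
</urlset>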
Table 2: The Sitemap document formats including the ResourceSync extensions
The overall structure of the ResourceSync documents is as follows:
<urlset> or <sitemapindex> -- These
elements are the root elements of ResourceSync documents; this specification
adds one mandatory and one optional child element to the child elements of
the Sitemap document formats:
<rs:md> -- In this context, the element conveys information about the document itself. Its use is
mandatory and it may have the following attributes:
at
-- This attribute is used for Resource Lists, Resource List Indexes,
Resource Dumps, Resource Dump Indexes, and Resource Dump Manifests; it is
not used for Change Lists, Change List Indexes, Change Dumps, Change Dump
Indexes, and Change Dump Manifests. Required use of the attribute is detailed in the
sections describing the respective documents and summarized in
Appendix A.
The
at
attribute conveys the datetime at which the process of taking
a snapshot of resources for their inclusion in the document to which the
attribute pertains started.
It thus provides a guarantee that each resource state represented in the document is
the result of all changes to the resource at least up until the
datetime expressed as the value of the
at
attribute.
The attribute value is expressed as a
W3C Datetime
and the
use of a complete date and time expressed in UTC using the format
YYYY-MM-DDThh:mm:ss[.s]Z
is recommended.
The attribute represents the time of a snapshot
such as t1, t2, t3, and t4 of Figure 1.
capability
-- This attribute is mandatory in all ResourceSync documents. The value of the attribute conveys the nature of the document,
e.g., whether the document is a Resource List, a Change List, a Manifest, etc.
Defined values are resourcelist, changelist, resourcedump, changedump,
resourcedump-manifest, changedump-manifest, capabilitylist, and description.
completed
-- This optional attribute is used for Resource Lists,
Resource List Indexes,
Resource Dumps, Resource Dump Indexes, and Resource Dump Manifests; it is
not used for Change Lists, Change List Indexes, Change Dumps, Change Dump
Indexes, and Change Dump Manifests.
The
completed
attribute conveys the datetime at which the process of taking
a snapshot of resources for their inclusion in the document to which the attribute pertains completed.
The combination of the datetimes provided in the
at
and
completed
attributes
expresses an interval during which resources may have changed beyond the state they had at the datetime
expressed in the
at
attribute.
The attribute value for
completed
is expressed as a
W3C Datetime
and the
use of a complete date and time expressed in UTC using the format
YYYY-MM-DDThh:mm:ss[.s]Z
is recommended.
from
-- This attribute is used for Change Lists, Change List Indexes,
Change Dumps, Change Dump Indexes, and Change Dump Manifests; it is not used
for Resource Lists, Resource List Indexes, Resource Dumps, Resource Dump Indexes,
and Resource Dump Manifests. Required use of the attribute is detailed in
the sections describing the respective documents and summarized in
Appendix A.
The attribute indicates that all changes that occurred to the set of resources at the Source
since the datetime expressed (and up until the datetime expressed in the
until
attribute, if it exists) are included in the document to which the attribute pertains.
The attribute value is expressed as a
W3C Datetime
and the
use of a complete date and time expressed in UTC using the format
YYYY-MM-DDThh:mm:ss[.s]Z
is recommended.
For example, the first Change List in
Figure 2
would have a
from
value of t1, and the second Change List would have a
from
value of t3.
until
-- This optional attribute is used for Change Lists, Change List Indexes, Change Dumps,
Change Dump Indexes, and Change Dump Manifests; it is not used for Resource Lists,
Resource List Indexes, Resource Dumps, Resource Dump Indexes, and Resource Dump Manifests.
The attribute indicates that all changes that occurred to the set of resources at the Source
up until the datetime expressed are included in the document to which the attribute pertains.
When a document carries the
until
attribute, this indicates that
the document will not be updated anymore.
When a change document does not carry the
until
attribute, any
subsequent changes to the corresponding set of resources will cause the document
to be updated.
The attribute value is expressed as a
W3C Datetime
and the
use of a complete date and time expressed in UTC using the format
YYYY-MM-DDThh:mm:ss[.s]Z
is recommended.
For example, the first Change List in
Figure 2
would have an
until
value of t3 and the second
Change List would have an
until
value of t5.
<rs:ln> -- A repeatable element used to support discovery of other
documents by means of a link. Required use of the element is detailed in the sections
that describe the documents in which <rs:ln> is used. It may have
several attributes and the ones defined by ResourceSync are as follows:
href
-- A mandatory attribute to convey the URI of the other document.
rel
-- A mandatory attribute to express a relationship. The following values are explicitly used in this specification:
describedby
-- for linking from a Capability List to a document that describes the set of resources
covered by it, and from a Source Description to a document that describes the Source.
up
-- for linking from a Capability List to the Source Description and from a
document that conveys a capability, such as a Resource List, to the Capability List under which it resides.
index
-- for linking from a document that conveys a capability (e.g., a Resource List)
to a parent index document (e.g., a Resource List Index).
<url> or <sitemap> -- The <urlset> element may have zero or more
<url> child elements, and the <sitemapindex> element has zero or more
<sitemap> child elements. Each such child element is used to convey information about
a resource that plays a role in the ResourceSync framework. They may have the
following child elements:
<loc> -- A mandatory element that conveys the URI of the resource that plays a role in the ResourceSync framework.
<lastmod> -- An element that conveys the last modification time of the resource with the URI provided in
<loc>, expressed as a W3C Datetime as described earlier in this section.
Its use is mandatory in Change Lists and Change Dump Manifests, and optional in other documents.
<changefreq> -- An optional element that provides a hint about the change frequency
of the resource with the URI provided in <loc>. Defined values are
always, hourly, daily, weekly, monthly, yearly, and never.
The value always should be used for resources that change each time they are accessed.
The value never should be used for archived resources.
<rs:md> -- In this context, the element conveys metadata pertaining to the resource with the URI provided in <loc>.
The element is not repeatable, and is mandatory for some documents and optional
for others, as described in the appropriate sections. It may have several
attributes and the ones defined by ResourceSync are as follows:
at
and
completed
-- The semantics and value
of these attributes are as defined earlier in this section.
They are only used for Resource List Indexes, Resource Dumps, and Resource Dump Indexes.
Required use of the attributes is detailed in the sections describing the respective documents
and summarized in Appendix A.
capability
-- This attribute is mandatory in Source Descriptions and Capability Lists.
Its value indicates the nature of the resource identified by the URI
in the <loc> element, e.g., a Resource List, a Change
List, a Change Dump, etc. Values defined in this specification are
resourcelist, changelist, resourcedump, changedump, and capabilitylist.
change
-- The value of the attribute conveys the type of change
that a resource underwent. Values defined in this specification are
created, updated, and deleted
to convey the creation, update, and deletion of a resource, respectively.
This attribute is used in Change Lists (Section 12.1) and
Change Dump Manifests (Section 13.2).
encoding
-- The value of the attribute conveys what content codings have been applied to the resource. The value of the
encoding attribute should be equal to the value of the
Content-Encoding header in the HTTP response as defined in
RFC 2616, Sec. 14.11.
from
and
until
-- The semantics and value
of these attributes are as defined earlier in this section.
They are only used for Change List Indexes, Change Dumps, and Change Dump Indexes.
Required use of the attributes is detailed in the sections describing the respective documents
and summarized in Appendix A.
hash
-- The value of the attribute conveys fixity information for a resource representation returned when the URI in
<loc> is dereferenced.
The attribute value is expressed in the form of a whitespace-delimited list of hash values.
Each hash value is represented by a hex-encoded digest and is preceded by a token that identifies the utilized hash algorithm, e.g.,
md5: or sha-256:.
length
-- The value of the attribute conveys the content length of a resource representation returned when the URI in
<loc> is dereferenced.
The value of the length attribute should be equal to the value of the
Content-Length header in the HTTP response
and must be computed as defined in RFC 2616, Sec. 4.4.
path
-- The attribute is only used in
Resource Dump Manifests (Section 11.2) and
Change Dump Manifests (Section 13.2).
Its value conveys the file path of the bitstream associated with the URI in
<loc> in the ZIP file, that is,
the relative file system path where the bitstream would reside if the ZIP were unpacked.
type
-- The value of the attribute conveys the Media Type of a resource representation returned when the URI in
<loc> is dereferenced.
Registered values are listed in the IANA MIME Media Type registry.
<rs:ln> -- In this context, an optional and repeatable element used to link to resources related to the one with the
URI provided in <loc>, such as
a copy on a mirror site, a prior version of the resource, etc. (see
Linking to Related Resources in Section 5.1).
It may have several attributes and the ones defined by ResourceSync are as follows:
href -- A mandatory attribute to convey the URI of the related resource.
rel -- A mandatory attribute to convey the relationship between the resource with the URI in
<loc> and the one with the URI in href.
Values for the rel attribute are listed in the document
Relation Types Used in the ResourceSync Framework.
The ones featured in this specification are:
contents
-- for a link from an entry in a Resource Dump or Change Dump that points to a bitstream package
to a Resource Dump Manifest or a Change Dump Manifest, respectively, for that bitstream package.
duplicate
-- for a link to a resource's mirror location.
alternate
and
canonical
-- for a link to an alternate representation of a resource.
-- for a link to a resource that details the difference between the
previous and current version of a resource.
describedby
and
describes
-- for a link providing additional information about a resource.
memento
and
timegate
-- for a link to access prior versions of a resource.
collection
-- for a link that expresses collection membership.
via
-- for a link that provides provenance information.
encoding, hash, length, modified, path, type -- Optional
attributes with meanings as described earlier in this section and
pertaining to the related resource.
pri
-- An optional attribute used to express a priority
among links with the same relation type. The attribute value is an integer
between 1 and 999,999, with a lower integer indicating a higher priority
and the absence of the attribute indicating a value of 999,999.
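Several of the attribute conventions above lend themselves to a short Python sketch: producing and parsing the whitespace-delimited hash value, formatting a datetime in the recommended UTC form, and applying the default link priority. All function names are hypothetical, and the choice of md5 and sha-256 simply follows the tokens mentioned in the description of the hash attribute.

```python
import hashlib
from datetime import datetime, timezone

def build_hash_attribute(content):
    """Build a hash attribute value with md5 and sha-256 digests of the content."""
    return " ".join([
        "md5:" + hashlib.md5(content).hexdigest(),
        "sha-256:" + hashlib.sha256(content).hexdigest(),
    ])

def parse_hash_attribute(value):
    """Split a hash attribute value into a mapping of algorithm token to digest."""
    digests = {}
    for token in value.split():
        algorithm, _, digest = token.partition(":")
        digests[algorithm] = digest
    return digests

def format_w3c_utc(moment):
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def link_priority(pri):
    """Return the effective priority of a link; absence means 999,999."""
    return int(pri) if pri is not None else 999_999

hash_value = build_hash_attribute(b"example bitstream")
snapshot_at = format_w3c_utc(datetime(2013, 1, 3, 9, 0, 0, tzinfo=timezone.utc))
print(parse_hash_attribute(hash_value), snapshot_at, link_priority(None))
```

Links with the same relation type could then be ordered with sorted(links, key=lambda ln: link_priority(ln.get("pri"))), so that lower integers, which indicate higher priority, come first.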
Table 3
lists the elements used in ResourceSync
documents and for each shows the attributes defined by ResourceSync
that may be used with them. The "Specification" column refers to the
specification where elements or attributes were introduced that
ResourceSync equivalents are based upon and inherit their semantics from.
A mark in the "Representation" column for an attribute indicates
that it should only be used when a specific representation of a resource is concerned,
whereas a mark in the "Resource" column indicates it is usable
for a resource in general. A
W3C XML Schema
(available at
is provided to validate the elements introduced by ResourceSync.
Relation types other than the ones listed above may be used in the
ResourceSync Framework. Valid relation types must be registered in the
IANA Link Relation Type Registry
or expressed as URIs as specified in
RFC 5988, Sec. 4.2
. The document
Relation Types Used in the ResourceSync Framework
attempts to
provide an up-to-date overview.
Element/Attribute             Specification          Resource   Representation
<urlset> or <sitemapindex>    Sitemap protocol
  <rs:md>                     This specification
    at                        This specification
    capability                This specification
    completed                 This specification
    from                      This specification
    until                     This specification
  <rs:ln>                     RFC4287
    href                      RFC4287
    rel                       RFC4287
<url> or <sitemap>            Sitemap protocol
  <loc>                       Sitemap protocol
  <lastmod>                   Sitemap protocol
  <changefreq>                Sitemap protocol
  <rs:md>                     This specification
    at                        This specification
    capability                This specification
    change                    This specification
    completed                 This specification
    encoding                  RFC2616
    from                      This specification
    hash                      Atom Link Extensions
    length                    RFC4287
    path                      This specification
    type                      RFC4287
    until                     This specification
  <rs:ln>                     This specification
    encoding                  RFC2616
    hash                      Atom Link Extensions
    href                      RFC4287
    length                    RFC4287
    modified                  Atom Link Extensions
    path                      This specification
    pri                       RFC6249
    rel                       RFC4287
    type                      RFC4287
Table 3: Elements and associated attributes defined for the ResourceSync documents
8.
Describing the Source
A Source Description is a mandatory document that enumerates the
Capability Lists offered by a Source. Since a Source has one Capability
List per set of resources that it distinguishes, the Source Description
will enumerate as many Capability Lists as the Source has distinct sets
of resources.
The Source Description is based on the <urlset> format.
It has the <urlset> root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a
capability attribute that has a value of description.
A recommended <rs:ln> child element of <urlset> with the relation type
describedby points to a document that provides information about the Source.
One <url> child element of <urlset>
should be included for each Capability List offered by the Source. This
element does not have attributes, but uses child elements to convey
information about the Capability Lists. The <url>
element has the following child elements:
A mandatory <loc> child element provides the URI of the respective Capability List.
An optional <rs:md> child element must have a capability
attribute with a value of capabilitylist
to convey that the URI points to a Capability List.
A recommended <rs:ln> child element with a describedby relation type
points to a document that describes the set of resources described by the Capability List.
The <lastmod>
elements should be omitted from the Source Description unless the Source updates the
Source Description every time it updates one of the Capability Lists.
Example 12
shows a Source Description where the Source offers three Capability Lists.
/>
Example 12: A Source Description
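The structure described in this section can be sketched by generating a Source Description with Python's xml.etree.ElementTree. This is an illustration under stated assumptions: the function name and all URIs are hypothetical, the namespace URIs are assumed to be those bound in Section 4, and the optional <rs:md> child element is always written.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match the prefix bindings of Section 4.
SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)

def build_source_description(capability_list_uris, about_uri=None):
    """Serialize a Source Description enumerating the given Capability Lists."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    if about_uri:
        # Recommended link to a document that describes the Source.
        ET.SubElement(urlset, f"{{{RS}}}ln", rel="describedby", href=about_uri)
    # Mandatory metadata element identifying the document type.
    ET.SubElement(urlset, f"{{{RS}}}md", capability="description")
    for uri in capability_list_uris:
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = uri
        ET.SubElement(url, f"{{{RS}}}md", capability="capabilitylist")
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical Source offering three sets of resources.
xml_out = build_source_description(
    [f"http://example.com/dataset{n}/capabilitylist.xml" for n in (1, 2, 3)],
    about_uri="http://example.com/info_about_source.xml",
)
print(xml_out)
```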
If a Source needs to or chooses to publish multiple Source Descriptions, it must group them by means of a Source Description Index.
9.
Advertising Capabilities
A Capability List is a document that enumerates all capabilities supported by a
Source for a specific set of resources. The Source defines which resources are
part of the set of resources described by the Capability List.
If there is more than one such set, then the Source must distinguish them with
different Capability Lists. The choice of which resources are part of which set
may derive from a variety of criteria, including media type, collection membership,
change frequency, subject of the resource and many others.
A Capability List points at the capability documents for its set of resources:
Resource List (
Section 10.1
),
Resource Dump (
Section 11.1
),
Change List (
Section 12.1
), and
Change Dump (
Section 13.1
).
A Capability List must only contain one entry per capability.
Capabilities that are conveyed in the same Capability List uniformly apply to
the set of resources covered by that Capability List. For example, if a
Capability List enumerates a Resource List, a Resource Dump, and a Change List,
then a given resource that appears in a Resource List must also appear in a
Resource Dump, and changes to the resource must be conveyed in the Change List.
The Capability List is based on the <urlset> format.
It has the <urlset> root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a
capability attribute that has a value of capabilitylist.
A mandatory <rs:ln> child element of <urlset> with the relation type up
points to the Source Description document that enumerates all Capability Lists offered by the Source.
A recommended <rs:ln> child element of <urlset> with the relation type describedby
points to a document that provides information about the set of resources covered by the Capability List.
One <url> child element of <urlset>
per capability offered by the Source. This element does not have attributes, but uses
child elements to convey information about the capabilities. The <url>
element has the following child elements:
A mandatory <loc> child element provides the URI of the respective capability document.
A mandatory <rs:md> child element must have a capability
attribute to convey the type of the respective capability.
The <lastmod>
elements should be omitted from the Capability List unless the Source updates the Capability List
every time it updates one of the capability documents.
Example 13
shows a Capability List where the Source offers four capabilities: a Resource List, a Resource Dump, a Change List,
and a Change Dump. A Destination cannot determine from the Capability List whether a Source provides, for example, a Resource List Index or a single Resource List.
The capability document must be downloaded to make this determination: a document with a
<sitemapindex> root element is an index, whereas
a document with a <urlset> root element is not.
href="http://example.com/resourcesync_description.xml"/>
/>
Example 13: A Capability List
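The determination described above, whether a retrieved capability document is an index, depends only on the root element. A minimal Python sketch follows, with a hypothetical function name and hypothetical sample documents:

```python
import xml.etree.ElementTree as ET

def is_index(xml_text):
    """Return True for a sitemapindex root element, False for a urlset root element."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" part
    if local_name == "sitemapindex":
        return True
    if local_name == "urlset":
        return False
    raise ValueError(f"unexpected root element: {root.tag}")

# Hypothetical minimal documents used only to exercise the function.
SINGLE = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
INDEX = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'

print(is_index(SINGLE), is_index(INDEX))
```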
ResourceSync defines only a small number of capabilities, and enumerating those
does not approach the limits of a single Capability List. Extensions or revisions
of this specification may introduce the use of Capability List Indexes, but Sources
should not generate such structures for the features introduced in this version
of the ResourceSync specification.
10.
Describing Resources
A Source may publish a description of the resources it makes available
for synchronization. This information enables a Destination to make an
initial copy of some or all of those resources, or to update a local
copy to remain synchronized with changes.
10.1
Resource List
A Resource List is introduced to list and describe the resources
that a Source makes available for synchronization.
It presents a snapshot of a Source's resources at a particular point in time.
A Resource List is based on the <urlset> document format
introduced by the Sitemap protocol. It has the <urlset>
root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a
capability attribute that has a value of resourcelist. It must also have an
at attribute that conveys the datetime at which the process of taking a snapshot of resources
for their inclusion in the Resource List started, and it may have a
completed attribute that conveys the datetime at which that process completed.
A mandatory <rs:ln> child element of <urlset>
points to the Capability List with the relation type up.
In case a Resource List Index exists, a recommended <rs:ln>
child element of <urlset> points to it with the relation type index.
One <url> child element of <urlset>
should be included for each resource. This element does not have attributes, but uses
child elements to convey information about the resource. The <url>
element has the following child elements:
A mandatory <loc> child element provides the URI of the resource.
Optional <lastmod> and <changefreq> child
elements with semantics as described in Section 7.
An optional <rs:md> child element provides further metadata about
the resource. It may have the attributes as described in Section 7.
Optional <rs:ln> child elements link to related resources as described in
Section 7, and detailed in Section 14.
Example 14
shows a Resource List with two resources.
The at attribute allows a Destination to determine that neither
of the listed resources has undergone a change between its
last modification datetime (2013-01-02T13:00:00Z and 2013-01-02T14:00:00Z,
respectively) and the datetime that is the value of
the at attribute, 2013-01-03T09:00:00Z.
href="http://example.com/dataset1/capabilitylist.xml"/>
at="2013-01-03T09:00:00Z"
completed="2013-01-03T09:01:00Z"
/>
2013-01-02T13:00:00Z
length
="8876"
type
="text/html"
/>
type="application/pdf"/>
Example 14: A Resource List
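The comparison between last modification datetimes and the at attribute can be sketched in Python. The function names and the resource URIs below are hypothetical, while the datetimes are the ones quoted for Example 14; the parser handles only the recommended UTC format without fractional seconds.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace URIs assumed to match the prefix bindings of Section 4.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

def parse_utc(value):
    """Parse a datetime of the form YYYY-MM-DDThh:mm:ssZ."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def modified_before_snapshot(xml_text):
    """Map each resource URI to whether its lastmod is not later than 'at'."""
    root = ET.fromstring(xml_text)
    at = parse_utc(root.find("rs:md", NS).get("at"))
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod:
            result[loc] = parse_utc(lastmod) <= at
    return result

RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up" href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"
         completed="2013-01-03T09:01:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
</urlset>"""

checks = modified_before_snapshot(RESOURCE_LIST)
print(checks)
```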
10.2
Resource List Index
The ResourceSync framework adopts the community-defined limits for publishing
documents of the <urlset> format and
introduces a Resource List Index for grouping multiple Resource Lists.
The union of the Resource Lists referred to in the Resource List Index represents
the entire set of resources that a Source makes available for synchronization.
This set of resources, regardless of whether it is conveyed in a single Resource
List or in multiple Resource Lists via
a Resource List Index, represents the state of the Source's data at a point in time.
A Resource List Index is based on the <sitemapindex> document
format introduced by the Sitemap protocol. It has the <sitemapindex>
root element and the following structure:
The mandatory <rs:md> child element of <sitemapindex> must have a
capability attribute that has a value of resourcelist. It must also have an
at attribute that conveys the datetime at which the process of taking a snapshot of resources
for their inclusion in the Resource List Index started, and it may have a
completed attribute that conveys the datetime at which that process completed.
A mandatory <rs:ln> child element of <sitemapindex>
points to the Capability List with the relation type up.
One <sitemap> child element of <sitemapindex>
should be included for each Resource List. This element does not have attributes,
but uses child elements to convey information about the Resource List. The <sitemap>
element has the following child elements:
A mandatory <loc> child element provides the URI of the Resource List.
An optional <lastmod> child element with semantics as described in Section 7.
An optional <rs:md> child element with an at
attribute and possibly a completed
attribute to convey the datetime at which the process of taking a snapshot of resources
for their inclusion in the Resource List respectively started and ended.
The Destination can determine whether it has reached a Resource List or
a Resource List Index based on whether the root element is <urlset>
or <sitemapindex> respectively. A Resource List Index that points to three Resource Lists
is shown in Example 15.
at="2013-01-03T09:00:00Z"
completed="2013-01-03T09:10:00Z"/>
Example 15: A Resource List Index
Example 16
shows the content of the Resource List identified by the URI
Structurally, it is identical to the Resource List shown in
Example 14
but it contains an additional <rs:ln> child element of <urlset>
that provides a navigational link with the relation type index
to the parent Resource List Index shown in Example 15.
This link is meant to ease navigation for Destinations and its adoption is therefore recommended.
href="http://example.com/dataset1/resourcelist-index.xml"/>
type="application/pdf"/>
type="image/png"/>
Example 16: A Resource List with a navigational link to its parent Resource List Index
11.
Packaging Resources
In order to provide Destinations with an efficient way to copy a Source's
data using a small number of HTTP requests, a Source may provide packaged
bitstreams for its resources.
11.1
Resource Dump
A Source may publish a
Resource Dump
, which provides links
to packages of the resources' bitstreams. The Resource Dump represents the
Source's state at a point in time. It may be used to transfer resources from
the Source in bulk, rather than the Destination having to make many separate
requests.
The ResourceSync framework specifies the use of the
ZIP file format
as
the packaging format. Communities may define their own packaging format.
A Resource Dump should only point to packages of the same format.
A Resource Dump is based on the <urlset> document format
introduced by the Sitemap protocol. It has the <urlset> root
element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a
capability attribute that has a value of resourcedump. It must also have an
at attribute that conveys the datetime at which the process of taking a snapshot of resources
for their inclusion in the Resource Dump started, and it may have a
completed attribute that conveys the datetime at which that process completed.
A mandatory <rs:ln> child element of <urlset>
points to the Capability List with the relation type up.
In case a Resource Dump Index exists, a recommended <rs:ln>
child element of <urlset> points to it with the relation type index.
One <url> child element of <urlset>
should be included for each bitstream package. This element does not have attributes,
but uses child elements to convey information about the package. The <url>
element has the following child elements:
A mandatory <loc> child element provides the URI of the package.
An optional <lastmod> child element with semantics as described in Section 7.
A recommended <rs:md> child element to convey the Media Type and
the length of the package using the type and length
attributes, respectively. It may also have an at
attribute and possibly a completed
attribute to convey the datetime at which the process
of taking a snapshot of resources
for their inclusion in the package respectively started and ended.
The child element may further have attributes such as hash, as described in Section 7.
An optional <rs:ln> child element with the relation type contents
that points to the Resource Dump Manifest
associated with the bitstream package.
Example 17 shows a Resource Dump that points to three ZIP files.
Included in each <url> element is a pointer to the
Resource Dump Manifest associated with
the package. While this pointer is optional and intended for the Destination's
convenience, if provided, the Source needs to ensure that the referred Manifest
corresponds with the Manifest included in the bitstream package.
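<urlset ...>
  <rs:ln rel="up" href="..."/>
  <rs:md capability="resourcedump"
         at="2013-01-03T09:00:00Z" completed="2013-01-03T09:04:00Z"/>
  <url>
    <loc>...</loc>
    <rs:md type="..." length="4765"
           at="2013-01-03T09:00:00Z" completed="2013-01-03T09:02:00Z"/>
    <rs:ln rel="contents" type="application/xml"
           href="http://example.com/resourcedump_manifest-part1.xml"/>
  </url>
  <url>
    <loc>...</loc>
    <rs:md ... at="2013-01-03T09:01:00Z" completed="2013-01-03T09:03:00Z"/>
    <rs:ln rel="contents" href="..." type="application/xml"/>
  </url>
  <url>
    <loc>...</loc>
    <rs:md ... at="2013-01-03T09:03:00Z" completed="2013-01-03T09:04:00Z"/>
    <rs:ln rel="contents" href="..." type="application/xml"/>
  </url>
</urlset>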
Example 17: A Resource Dump
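As a non-normative illustration, a Destination might enumerate the packages listed in a Resource Dump along the following lines. This is a minimal Python sketch using only the standard library. The function name list_packages is hypothetical, and the rs namespace URI is assumed to be the one bound in Section 4.

```python
import xml.etree.ElementTree as ET

# Namespace bindings (the rs binding is assumed to match Section 4).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def list_packages(resource_dump_xml):
    """Return one dict per <url> entry of a Resource Dump document."""
    root = ET.fromstring(resource_dump_xml)
    packages = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)  # recommended, so it may be absent
        attrs = md.attrib if md is not None else {}
        length = attrs.get("length")
        manifest = None
        for ln in url.findall("rs:ln", NS):
            if ln.get("rel") == "contents":  # optional Manifest pointer
                manifest = ln.get("href")
        packages.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "type": attrs.get("type"),
            "length": int(length) if length is not None else None,
            "manifest": manifest,
        })
    return packages
```

Because both the <rs:md> element and the contents link are not mandatory, the sketch treats each of them as possibly missing rather than failing on their absence.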
If a Source needs to or chooses to publish multiple Resource Dumps, it must group them using a Resource Dump Index in a manner that is similar to what was described in Section 10.2.
11.2
Resource Dump Manifest
Each ZIP package referred to from a Resource Dump must contain a Resource Dump Manifest file that describes the package's constituent bitstreams. The file must be named manifest.xml and must be located at the top level of the ZIP package.
The Resource Dump Manifest is based on the <urlset> format. It has the <urlset> root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a capability attribute with a value of resourcedump-manifest. It must also have an at attribute that conveys the datetime at which the process of taking a snapshot of resources for their inclusion in the ZIP package started, and it may have a completed attribute that conveys the datetime at which that process completed.
A mandatory <rs:ln> child element of <urlset> points to the Capability List with the relation type up.
One <url> child element of <urlset> is included per bitstream. This element does not have attributes, but uses child elements to convey information about the bitstream. The <url> element has the following child elements:
A mandatory <loc> child element provides the URI that the Source associates with the bitstream.
Optional <lastmod> and <changefreq> child elements with semantics as described in Section 7.
A mandatory <rs:md> child element must have a path attribute to convey the location of the bitstream within the package. The value of the attribute is relative to the root of the package and is expressed with a leading slash (/). The use of the type attribute in the <rs:md> element is recommended to help Destinations determine the Media Type of the bitstream. The <rs:md> element may further have the attributes hash and length, as described in Section 7.
Optional <rs:ln> child elements link to related resources as described in Section 7, and detailed in Section 14.
Example 18 shows a Resource Dump Manifest for a ZIP file that contains two bitstreams.
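<urlset ...>
  <rs:ln rel="up" href="..."/>
  <rs:md capability="resourcedump-manifest"
         at="2013-01-03T09:00:00Z" completed="2013-01-03T09:02:00Z"/>
  <url>
    <loc>...</loc>
    <rs:md ... type="text/html" path="/resources/res1"/>
  </url>
  <url>
    <loc>...</loc>
    <rs:md ... type="application/pdf" path="/resources/res2"/>
  </url>
</urlset>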
Example 18: A Resource Dump Manifest
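The relationship between the Manifest and the packaged bitstreams can be sketched as follows. This non-normative Python example reads manifest.xml from the top level of a ZIP package and maps each bitstream URI to its content. The function name read_package is hypothetical, the rs namespace URI is assumed to be the one bound in Section 4, and the sketch assumes that ZIP member names equal the path value without its leading slash.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace bindings (the rs binding is assumed to match Section 4).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def read_package(zip_bytes):
    """Map each bitstream URI to its bytes, guided by manifest.xml."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # The Manifest is named manifest.xml at the top level of the package.
        root = ET.fromstring(zf.read("manifest.xml"))
        bitstreams = {}
        for url in root.findall("sm:url", NS):
            uri = url.findtext("sm:loc", namespaces=NS)
            path = url.find("rs:md", NS).get("path")
            # Assumption: member names carry no leading slash.
            bitstreams[uri] = zf.read(path.lstrip("/"))
        return bitstreams
```

A Destination that also receives hash or length values in <rs:md> could verify each bitstream before storing it; that check is omitted here for brevity.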
12.
Describing Changes
A Source may publish a record of the changes to its resources.
This enables Destinations to efficiently learn about those changes and
hence to synchronize incrementally.
12.1
Change List
A Change List is a document that contains a description of changes to a Source's resources.
It is up to the Source to determine the publication frequency of Change Lists, as well as the temporal interval they cover.
For example, a Source may choose to publish a fixed number of changes per Change List, or all the changes in a period of fixed length,
such as an hour, a day, or a week. All entries in a Change List must be provided in forward chronological order:
the least recently changed resource must be listed at the beginning of the
Change List, while the most recently changed resource must be listed at the end of the document.
If a resource underwent multiple changes in the period covered by a Change List, then it will be listed multiple times, once per change.
A Change List is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a capability attribute that has a value of changelist. It also has the mandatory from and the optional until attributes to convey the temporal interval covered by the Change List. Details about the semantics of these attributes are provided below.
A mandatory <rs:ln> child element of <urlset> points to the Capability List with the relation type up.
In case a Change List Index exists, a recommended <rs:ln> child element of <urlset> points to it with the relation type index.
One <url> child element of <urlset> should be included for each resource change. This element does not have attributes, but uses child elements to convey information about the changed resource. The <url> element has the following child elements:
A mandatory <loc> child element provides the URI of the changed resource.
A mandatory <lastmod> and an optional <changefreq> child element with semantics as described in Section 7.
A mandatory <rs:md> child element must have the attribute change to convey the nature of the change. Its value is created, updated, or deleted. It may further have attributes hash, length, and type, as described in Section 7.
Optional <rs:ln> child elements link to related resources as described in Section 7 and detailed in Section 14.
The temporal interval covered by a Change List is conveyed by means of the from and until attributes of the <rs:md> child element of the <urlset> root element. The from attribute indicates that the Change List includes all changes that occurred to the set of resources at the Source since the datetime expressed as the value of the attribute. If it exists, the until attribute indicates that the Change List includes all changes that occurred to the set of resources at the Source up until the datetime expressed as the value of the attribute. Its use is optional for Change Lists:
When a document carries the until attribute, this indicates that the document will not be updated anymore; the Change List is closed. When a Destination has finished processing a closed Change List, it should consult the enclosing Change List Index (following the link with the index relation type), if applicable, or the Capability List (following the link with the up relation type) to determine the URI of the Change List that reports the changes that occurred after the datetime expressed as the closed Change List's until value.
When a document does not carry the until attribute, this indicates that the document will be updated with further changes; the Change List remains open. A Destination should continue to poll an open Change List to learn about further changes. It does not need to consult the enclosing Change List Index, if applicable, nor the Capability List.
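The two cases above can be sketched as a small decision function. This is a non-normative Python illustration. The function name next_step and the action labels it returns are hypothetical, and the rs namespace URI is assumed to be the one bound in Section 4.

```python
import xml.etree.ElementTree as ET

# Namespace bindings (the rs binding is assumed to match Section 4).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def next_step(changelist_xml):
    """Return (action, target URI) once a Change List has been processed."""
    root = ET.fromstring(changelist_xml)
    md = root.find("rs:md", NS)
    if md.get("until") is None:
        # Open Change List: keep polling the same document.
        return ("poll-again", None)
    # Closed Change List: prefer the index link, else fall back to up.
    links = {ln.get("rel"): ln.get("href") for ln in root.findall("rs:ln", NS)}
    if "index" in links:
        return ("consult-index", links["index"])
    return ("consult-capability-list", links.get("up"))
```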
The from and until attributes help a Destination to determine whether it has or has not fully processed a Change List. The forward chronological order of changes in a Change List, the datetime of a resource change, and the URI of a changed resource help the Destination to determine the first unprocessed change in a not fully processed Change List. The Destination should start processing there; it can retrieve a representation of a changed resource by dereferencing its URI provided in the <loc> child element of the <url> element that conveys the change.
In order for the determination of the first unprocessed change to be accurate,
the combination of the URI of a changed resource and the datetime of its change
should be unique. Hence, a Source should provide change datetime values at a
sufficiently fine granularity.
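One way a Destination might locate the first unprocessed change is sketched below. This is a non-normative Python illustration. The function names are hypothetical, changes are modelled as (datetime, URI) pairs in the forward chronological order of the Change List, and datetimes are assumed to use the UTC form with whole seconds that appears in the examples of this specification.

```python
from datetime import datetime


def parse_dt(value):
    """Parse a datetime such as 2013-01-03T11:00:00Z (assumed form)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def first_unprocessed(changes, last_seen):
    """Return the index of the first change not yet processed.

    changes   -- list of (datetime string, URI) tuples, oldest first
    last_seen -- the (datetime string, URI) of the last processed change,
                 or None if nothing has been processed yet
    """
    if last_seen is None:
        return 0
    if last_seen in changes:
        # The pair is assumed unique, so resume right after it.
        return changes.index(last_seen) + 1
    # Pair not listed here: resume at the first strictly later change.
    seen_dt = parse_dt(last_seen[0])
    for position, (dt, _uri) in enumerate(changes):
        if parse_dt(dt) > seen_dt:
            return position
    return len(changes)
```

If two changes to the same resource shared one datetime, the lookup of the last processed pair would be ambiguous, which is why unique combinations matter.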
Example 19 shows a Change List that indicates that four resource changes occurred since 2013-01-03T00:00:00Z: one creation, two updates, and one deletion. One resource underwent two of these changes and hence is listed twice. The Change List has no until attribute, which indicates that it will report further changes; a Destination should keep polling this Change List.
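<urlset ...>
  <rs:ln rel="up" href="..."/>
  <rs:md capability="changelist" from="2013-01-03T00:00:00Z"/>
  <url>
    <loc>...</loc><lastmod>2013-01-03T11:00:00Z</lastmod>
    <rs:md change="..."/>
  </url>
  <url>
    <loc>...</loc><lastmod>2013-01-03T13:00:00Z</lastmod>
    <rs:md change="..."/>
  </url>
  <url>
    <loc>...</loc><lastmod>2013-01-03T18:00:00Z</lastmod>
    <rs:md change="..."/>
  </url>
  <url>
    <loc>...</loc><lastmod>2013-01-03T21:00:00Z</lastmod>
    <rs:md change="..."/>
  </url>
</urlset>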
Example 19: An open Change List describing four resource changes
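A Destination might act on such a Change List roughly as follows. This is a non-normative Python sketch. The function name apply_changes, the dict used as a local store, and the fetch callable are all hypothetical, and the rs namespace URI is assumed to be the one bound in Section 4.

```python
import xml.etree.ElementTree as ET

# Namespace bindings (the rs binding is assumed to match Section 4).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def apply_changes(changelist_xml, store, fetch):
    """Apply every listed change, in document order, to a local store.

    store -- dict mapping resource URI to its locally held representation
    fetch -- callable that dereferences a URI and returns a representation
    """
    root = ET.fromstring(changelist_xml)
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        change = url.find("rs:md", NS).get("change")
        if change == "deleted":
            store.pop(uri, None)
        elif change in ("created", "updated"):
            store[uri] = fetch(uri)
        else:
            raise ValueError("unexpected change value: %r" % change)
    return store
```

Processing in document order matters because a resource may be listed more than once; the last listed change then determines the final local state.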
12.2
Change List Index
If a Source needs to publish multiple Change Lists, it must group them in a Change List Index.
A Change List Index must enumerate Change Lists in forward chronological order.
A Change List Index is based on the <sitemapindex> document format introduced by the Sitemap protocol. It has the <sitemapindex> root element and the following structure:
The mandatory <rs:md> child element of <sitemapindex> must have a capability attribute that has a value of changelist. It also has the mandatory from and the optional until attributes to convey the temporal interval covered by the Change List Index. The semantics of these attributes are as explained in detail for Change Lists.
A mandatory <rs:ln> child element of <sitemapindex> points to the Capability List with the relation type up.
One <sitemap> child element of <sitemapindex> should be included for each Change List. This element does not have attributes, but uses child elements to convey information about the Change List. The <sitemap> element has the following child elements:
A mandatory <loc> child element provides the URI of the Change List.
An optional <lastmod> child element with semantics as described in Section 7.
A recommended <rs:md> child element with the from and possibly until attributes to convey the temporal interval covered by the Change List. The use of the until attribute is as follows:
If the Change List is closed, the use of until is required.
If the Change List is open, until must not be provided.
The Destination should determine whether it has reached a Change List or a Change List Index based on whether the root element is <urlset> or <sitemapindex>, respectively.
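The root element check can be sketched in a few lines. This is a non-normative Python illustration; the function name document_kind and the labels it returns are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def document_kind(xml_text):
    """Classify a retrieved document by its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified names as {namespace}localname.
    if root.tag == "{%s}urlset" % SITEMAP_NS:
        return "change-list"
    if root.tag == "{%s}sitemapindex" % SITEMAP_NS:
        return "change-list-index"
    raise ValueError("unexpected root element: %s" % root.tag)
```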
A Change List Index that points to three Change Lists is shown in Example 20. Two of those Change Lists are closed, as indicated by the provision of the until attribute, and one is open, as indicated by its absence. One of the closed Change Lists is shown in Example 21. Note that the value of the until attribute for this Change List in the Change List Index is the same as the value of the until attribute in the Change List itself: 2013-01-03T00:00:00Z. The open Change List could be the one shown in Example 19, in which case that list would have an additional link with an index relation type pointing to the Change List Index.
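<sitemapindex ...>
  <rs:md capability="changelist" from="2013-01-01T00:00:00Z"/>
  <sitemap>
    <loc>...</loc>
    <rs:md ... until="2013-01-02T00:00:00Z"/>
  </sitemap>
  ...
</sitemapindex>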
Example 20: A Change List Index
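<urlset ...>
  <rs:ln rel="up" href="..."/>
  <rs:ln rel="index" href="http://example.com/dataset1/changelist.xml"/>
  <rs:md capability="changelist"
         from="2013-01-02T00:00:00Z" until="2013-01-03T00:00:00Z"/>
  ...
</urlset>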
Example 21: A closed Change List pointing back to its Index
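After finishing a closed Change List, a Destination might select the next one from the Change List Index roughly as follows. This is a non-normative Python sketch. The function name next_changelist is hypothetical, the rs namespace URI is assumed to be the one bound in Section 4, and the sketch assumes that all datetimes use the same UTC form so that they can be compared as strings.

```python
import xml.etree.ElementTree as ET

# Namespace bindings (the rs binding is assumed to match Section 4).
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def next_changelist(index_xml, closed_until):
    """Return the URI of the Change List covering changes after closed_until."""
    root = ET.fromstring(index_xml)
    # Entries are enumerated in forward chronological order.
    for sitemap in root.findall("sm:sitemap", NS):
        md = sitemap.find("rs:md", NS)
        if md is None:
            continue  # <rs:md> is only recommended; skip entries without it
        until = md.get("until")
        # An open list has no until; a closed one must end after closed_until.
        if until is None or until > closed_until:
            return sitemap.findtext("sm:loc", namespaces=NS)
    return None
```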
13.
Packaging Changes
In order to reduce the number of requests required to obtain resource changes,
a Source may provide packaged bitstreams for changed resources.
13.1
Change Dump
To make content changes available for download, a Source may publish Change Dumps. A Change Dump is a document that points to packages containing bitstreams for the Source's changed resources. The ResourceSync framework specifies the use of the ZIP file format as the packaging format. Communities may define their own packaging format. A Change Dump should only point to packages of the same format.
It is up to the Source to determine the publication frequency of these packages,
as well as the temporal interval they cover.
For example, a Source may choose to publish a fixed number of changes per package, or all the
changes in a period of fixed length, such as an hour, a day, or a week.
If a resource underwent multiple changes in the period covered by a package,
then the package will contain multiple bitstreams for the resource, one per change.
As new packages are published, new entries are added to the Change Dump that points at them.
All entries in a Change Dump should be provided in forward chronological order: the least recently published package is listed at the beginning of the Change Dump, and the most recently published package is listed at the end of the document.
A Change Dump is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
The mandatory <rs:md> child element of <urlset> must have a capability attribute that has a value of changedump. It also has the mandatory from and the optional until attributes to convey the temporal interval covered by the Change Dump. The semantics of these attributes are as explained in detail for Change Lists.
A mandatory <rs:ln> child element of <urlset> points to the Capability List with the relation type up.
In case a Change Dump Index exists, a recommended <rs:ln> child element of <urlset> points to it with the relation type index.
One <url> child element of <urlset> should be included for each bitstream package. This element does not have attributes, but uses child elements to convey information about the package. The <url> element has the following child elements:
A mandatory <loc> child element provides the URI of the package.
An optional