RELAX NG Specification
RELAX NG Specification
Committee Specification�3 December 2001
This version:
Committee Specification: 3 December 2001
Previous versions:
Committee Specification: 11 August 2001
Editors:
James Clark�
[email protected]
, MURATA Makoto�
[email protected]
Copyright � The Organization for the Advancement of
Structured Information Standards [OASIS] 2001. All Rights
Reserved.
This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to OASIS, except as needed for the
purpose of developing OASIS specifications, in which case the
procedures for copyrights defined in the OASIS Intellectual Property
Rights document must be followed, or as required to translate it into
languages other than English.
The limited permissions granted above are perpetual and will not
be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided
on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE
USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Abstract
This is the definitive specification of RELAX NG, a simple
schema language for XML, based on
[RELAX]
and
[TREX]
. A RELAX NG schema specifies a pattern for the
structure and content of an XML document. A RELAX NG schema is itself
an XML document.
Status of this Document
This Committee Specification was approved for publication by the
OASIS RELAX NG technical committee. It is a stable document which
represents the consensus of the committee. Comments on this document
may be sent to
[email protected]
A list of known errors in this document is available at
Table of Contents
Introduction
Data model
2.1
Example
Full syntax
3.1
Example
Simplification
4.1
Annotations
4.2
Whitespace
4.3
datatypeLibrary
attribute
4.4
type
attribute of
value
element
4.5
href
attribute
4.6
externalRef
element
4.7
include
element
4.8
name
attribute of
element
and
attribute
elements
4.9
ns
attribute
4.10
QNames
4.11
div
element
4.12
Number of child elements
4.13
mixed
element
4.14
optional
element
4.15
zeroOrMore
element
4.16
Constraints
4.17
combine
attribute
4.18
grammar
element
4.19
define
and
ref
elements
4.20
notAllowed
element
4.21
empty
element
Simple syntax
5.1
Example
Semantics
6.1
Name classes
6.2
Patterns
6.2.1
choice
pattern
6.2.2
group
pattern
6.2.3
empty
pattern
6.2.4
text
pattern
6.2.5
oneOrMore
pattern
6.2.6
interleave
pattern
6.2.7
element
and
attribute
pattern
6.2.8
data
and
value
pattern
6.2.9
Built-in datatype library
6.2.10
list
pattern
6.3
Validity
6.4
Example
Restrictions
7.1
Contextual restrictions
7.1.1
attribute
pattern
7.1.2
oneOrMore
pattern
7.1.3
list
pattern
7.1.4
except
in
data
pattern
7.1.5
start
element
7.2
String sequences
7.3
Restrictions on attributes
7.4
Restrictions on
interleave
Conformance
Appendixes
RELAX NG schema for RELAX NG
Changes since version 0.9
RELAX NG TC (Non-Normative)
References
1. Introduction
This document specifies
when an XML document is a correct RELAX NG
schema
when an XML document is valid with respect to a
correct RELAX NG schema
An XML document that is being validated with respect to a RELAX NG
schema is referred to as an instance.
The structure of this document is as follows.
Section 2
describes the data model, which is the
abstraction of an XML document used throughout the rest of the
document.
Section 3
describes the syntax of a
RELAX NG schema; any correct RELAX NG schema must conform to this
syntax.
Section 4
describes a sequence of
transformations that are applied to simplify a RELAX NG schema;
applying the transformations also involves checking certain
restrictions that must be satisfied by a correct RELAX NG
schema.
Section 5
describes the syntax that
results from applying the transformations; this simple syntax is a
subset of the full syntax.
Section 6
describes the
semantics of a correct RELAX NG schema that uses the simple syntax;
the semantics specify when an element is valid with respect to a RELAX
NG schema.
Section 7
describes restrictions in
terms of the simple syntax; a correct RELAX NG schema must be such
that, after transformation into the simple form, it satisfies these
restrictions. Finally,
Section 8
describes
conformance requirements for RELAX NG validators.
A tutorial is available separately (see
[Tutorial]
).
2. Data model
RELAX NG deals with XML documents representing both schemas and
instances through an abstract data model. XML documents representing
schemas and instances must be well-formed in conformance with
[XML 1.0]
and must conform to the constraints of
[XML Namespaces]
An XML document is represented by an element. An element consists
of
a name
a context
a set of attributes
an ordered sequence of zero or more children; each
child is either an element or a non-empty string; the sequence never contains
two consecutive strings
A name consists of
a string representing the namespace URI; the empty
string has special significance, representing the absence of any
namespace
a string representing the local name; this string matches the NCName
production of
[XML Namespaces]
A context consists of
a base URI
a namespace map; this maps prefixes to namespace URIs,
and also may specify a default namespace URI (as declared
by the
xmlns
attribute)
An attribute consists of
a name
a string representing the value
A string consists of a sequence of zero or more characters,
where a character is as defined in
[XML 1.0]
The element for an XML document is constructed from an instance
of the
[XML Infoset]
as follows. We use the notation
] to refer to the value of the
property of an information item. An
element is constructed from a document information item by
constructing an element from the [document element]. An element is
constructed from an element information item by constructing the name
from the [namespace name] and [local name], the context from the [base
URI] and [in-scope namespaces], the attributes from the [attributes],
and the children from the [children]. The attributes of an element
are constructed from the unordered set of attribute information items
by constructing an attribute for each attribute information item. The
children of an element are constructed from the list of child
information items first by removing information items other than
element information items and character information items, and then by
constructing an element for each element information item in the list
and a string for each maximal sequence of character information items.
An attribute is constructed from an attribute information item by
constructing the name from the [namespace name] and [local name], and
the value from the [normalized value]. When constructing the name of
an element or attribute from the [namespace name] and [local name], if
the [namespace name] property is not present, then the name is
constructed from an empty string and the [local name]. A string is
constructed from a sequence of character information items by
constructing a character from the [character code] of each character
information item.
It is possible for there to be multiple distinct infosets for a
single XML document. This is because XML parsers are not required to
process all DTD declarations or expand all external parsed general
entities. Amongst these multiple infosets, there is exactly one
infoset for which [all declarations processed] is true and which does
not contain any unexpanded entity reference information items. This
is the infoset that is the basis for defining the RELAX NG data
model.
2.1. Example
Suppose the document
is as
follows:
The element representing this document has
a name which has
the empty string as the namespace URI, representing
the absence of any namespace
foo
as the local
name
a context which has
as the base
URI
a namespace map which
maps the prefix
xml
to the
namespace URI
(the
xml
prefix is implicitly declared
by every XML document)
specifies the empty string as the default namespace
URI
an empty set of attributes
a sequence of children consisting
of an element which has
a name which has
as the
namespace URI
bar1
as the local
name
a context which has
as the base
URI
a namespace map which
maps the prefix
pre1
to the
namespace URI
maps the prefix
xml
to the
namespace URI
specifies the empty string as the default namespace
URI
an empty set of attributes
an empty sequence of children
followed by an element which has
a name which has
as the
namespace URI
bar2
as the local
name
a context which has
as the base
URI
a namespace map which
maps the prefix
pre2
to the
namespace URI
maps the prefix
xml
to the
namespace URI
specifies the empty string as the default namespace
URI
an empty set of attributes
an empty sequence of children
3. Full syntax
The following grammar summarizes the syntax of RELAX NG.
Although we use a notation based on the XML representation of an RELAX
NG schema as a sequence of characters, the grammar must be understood
as operating at the data model level. For example, although the
syntax uses
, an instance or
schema can use
instead,
because they both represent the same element at the data model level.
All elements shown in the grammar are qualified with the namespace
URI:
The symbols QName and NCName are defined in
[XML Namespaces]
. The anyURI symbol has the same meaning as the
anyURI datatype of
[W3C XML Schema Datatypes]
: it indicates a
string that, after escaping of disallowed values as described in
Section 5.4 of
[XLink]
, is a URI reference as defined
in
[RFC 2396]
(as modified by
[RFC 2732]
). The symbol string matches any string.
In addition to the attributes shown explicitly, any element can
have an
ns
attribute and any element can have a
datatypeLibrary
attribute. The
ns
attribute can have any value. The value of the
datatypeLibrary
attribute must match the anyURI
symbol as described in the previous paragraph; in addition, it must
not use the relative form of URI reference and must not have a
fragment identifier; as an exception to this, the value may be the
empty string.
Any element can also have foreign attributes in addition to the
attributes shown in the grammar. A foreign attribute is an attribute
with a name whose namespace URI is neither the empty string nor the
RELAX NG namespace URI. Any element that cannot have string children
(that is, any element other than
value
param
and
name
) may have foreign child elements in addition
to the child elements shown in the grammar. A foreign element is an
element with a name whose namespace URI is not the RELAX NG namespace
URI. There are no constraints on the relative position of foreign
child elements with respect to other child elements.
Any element can also have as children strings that consist
entirely of whitespace characters, where a whitespace character is one
of #x20, #x9, #xD or #xA. There are no constraints on the relative
position of whitespace string children with respect to child
elements.
Leading and trailing whitespace is allowed for value of each
name
type
and
combine
attribute and for the content of each
name
element.
pattern
��::=��
QName
pattern
pattern
QName
pattern
patternpattern
name="
NCName
/>
NCName
/>
NCName
string
type="
NCName
param
* [
exceptPattern
anyURI
/>
param
��::=��
name="
NCName
string
exceptPattern
��::=��
grammarContent
��::=��
start
define
anyURI
includeContent
includeContent
��::=��
start
define
start
��::=��
method
pattern
define
��::=��
NCName
combine="
method
pattern
method
��::=��
choice
interleave
nameClass
��::=��
exceptNameClass
��::=��
3.1. Example
Here is an example of a schema in the full syntax for the
document in
Section 2.1
xmlns:a="http://relaxng.org/ns/annotation/1.0"
xmlns:ex1="http://www.example.com/n1"
xmlns:ex2="http://www.example.com/n2">
4. Simplification
The full syntax given in the previous section is transformed
into a simpler syntax by applying the following transformation rules
in order. The effect must be as if each rule was applied to all
elements in the schema before the next rule is applied. A
transformation rule may also specify constraints that must be
satisfied by a correct schema. The transformation rules are applied
at the data model level. Before the transformations are applied, the
schema is parsed into an instance of the data model.
4.1. Annotations
Foreign attributes and elements are removed.
Note
It is safe to remove
xml:base
attributes at this stage because
xml:base
attributes are used in determining the [base URI] of an element
information item, which is in turn used to construct the base URI of
the context of an element. Thus, after a document has been parsed
into an instance of the data model,
xml:base
attributes can be discarded.
4.2. Whitespace
For each element other than
value
and
param
, each child that is a string containing only
whitespace characters is removed.
Leading and trailing whitespace characters are removed from the
value of each
name
type
and
combine
attribute and from the content of each
name
element.
4.3.
datatypeLibrary
attribute
The value of each
datatypeLibary
attribute is
transformed by escaping disallowed characters as specified in Section
5.4 of
[XLink]
For any
data
or
value
element that does not have a
datatypeLibrary
attribute, a
datatypeLibrary
attribute is
added. The value of the added
datatypeLibrary
attribute is the value of the
datatypeLibrary
attribute of the nearest ancestor element that has a
datatypeLibrary
attribute, or the empty string if
there is no such ancestor. Then, any
datatypeLibrary
attribute that is on an element other than
data
or
value
is removed.
4.4.
type
attribute of
value
element
For any
value
element that does not have a
type
attribute, a
type
attribute
is added with value
token
and the value of the
datatypeLibrary
attribute is changed to the empty
string.
4.5.
href
attribute
The value of the
href
attribute on an
externalRef
or
include
element
is first transformed by escaping disallowed characters as specified in
Section 5.4 of
[XLink]
. The URI reference is then
resolved into an absolute form as described in section 5.2 of
[RFC 2396]
using the base URI from the context of the element
that bears the
href
attribute.
The value of the
href
attribute will be used
to construct an element (as specified in
Section 2
). This must be done as follows. The URI
reference consists of the URI itself and an optional fragment
identifier. The resource identified by the URI is retrieved. The
result is a MIME entity: a sequence of bytes labeled with a MIME
media type. The media type determines how an element is constructed
from the MIME entity and optional fragment identifier. When the media
type is
application/xml
or
text/xml
, the MIME entity must be parsed as an XML
document in accordance with the applicable RFC (at the term of writing
[RFC 3023]
) and an element constructed from the result
of the parse as specified in
Section 2
. In
particular, the
charset
parameter must be handled
as specified by the RFC. This specification does not define the
handling of media types other than
application/xml
and
text/xml
. The
href
attribute
must not include a fragment identifier unless the registration of the
media type of the resource identified by the attribute defines the
interpretation of fragment identifiers for that media type.
Note
[RFC 3023]
does not define the
interpretation of fragment identifiers for
application/xml
or
text/xml
4.6.
externalRef
element
An
externalRef
element is transformed as
follows. An element is constructed using the URI reference that is
the value of
href
attribute as specified in
Section 4.5
. This element must match the syntax for pattern. The
element is transformed by recursively applying the rules from this
subsection and from previous subsections of this section. This must
not result in a loop. In other words, the transformation of the
referenced element must not require the dereferencing of an
externalRef
attribute with an
href
attribute with the same value.
Any
ns
attribute on the
externalRef
element is transferred to the
referenced element if the referenced element does not already have an
ns
attribute. The
externalRef
element is then replaced by the referenced element.
4.7.
include
element
An
include
element is transformed as follows.
An element is constructed using the URI reference that is the value of
href
attribute as specified in
Section 4.5
. This element must be a
grammar
element, matching the syntax for grammar.
This
grammar
element is transformed by
recursively applying the rules from this subsection and from previous
subsections of this section. This must not result in a loop. In other
words, the transformation of the
grammar
element
must not require the dereferencing of an
include
attribute with an
href
attribute with the same
value.
Define the
components
of an element to
be the children of the element together with the components of any
div
child elements. If the
include
element has a
start
component, then the
grammar
element must have a
start
component. If the
include
element has a
start
component, then all
start
components are removed from the
grammar
element. If the
include
element has a
define
component, then the
grammar
element must have a
define
component with the same name. For every
define
component of the
include
element, all
define
components with the same name
are removed from the
grammar
element.
The
include
element is transformed into a
div
element. The attributes of the
div
element are the attributes of the
include
element other than the
href
attribute. The children of the
div
element are the
grammar
element (after the removal of the
start
and
define
components described by the preceding
paragraph) followed by the children of the
include
element. The
grammar
element is then renamed to
div
4.8.
name
attribute of
element
and
attribute
elements
The
name
attribute on an
element
or
attribute
element is
transformed into a
name
child element.
If an
attribute
element has a
name
attribute but no
ns
attribute, then an
ns=""
attribute is added to the
name
child element.
4.9.
ns
attribute
For any
name
nsName
or
value
element that does not have an
ns
attribute, an
ns
attribute is
added. The value of the added
ns
attribute is the
value of the
ns
attribute of the nearest ancestor
element that has an
ns
attribute, or the empty
string if there is no such ancestor. Then, any
ns
attribute that is on an element other than
name
nsName
or
value
is
removed.
Note
The value of the
ns
attribute is
not
transformed either by escaping
disallowed characters, or in any other way, because the value of the
ns
attribute is compared against namespace URIs in
the instance, which are not subject to any
transformation.
Note
Since
include
and
externalRef
elements are resolved after
datatypeLibrary
attributes are added but before
ns
attributes are added,
ns
attributes are inherited into external schemas but
datatypeLibrary
attributes are not.
4.10. QNames
For any
name
element containing a prefix, the
prefix is removed and an
ns
attribute is added
replacing any existing
ns
attribute. The value of
the added
ns
attribute is the value to which the
namespace map of the context of the
name
element
maps the prefix. The context must have a mapping for the
prefix.
4.11.
div
element
Each
div
element is replaced by its
children.
4.12. Number of child elements
define
oneOrMore
zeroOrMore
optional
list
or
mixed
element is transformed so that it has exactly
one child element. If it has more than one child element, then its
child elements are wrapped in a
group
element. Similarly, an
element
element is transformed so
that it has exactly two child elements, the first being a name class
and the second being a pattern. If it has more than two child elements,
then the child elements other than the first are wrapped in a
group
element.
except
element is transformed
so that it has exactly one child element. If it has more
than one child element, then its child elements are wrapped
in a
choice
element.
If an
attribute
element has only one child
element (a name class), then a
text
element is
added.
choice
group
or
interleave
element is transformed so that it has
exactly two child elements. If it has one child element, then it is
replaced by its child element. If it has more than two child
elements, then the first two child elements are combined into a new
element with the same name as the parent element and with the first
two child elements as its children. For example,
p1
p2
p3
is transformed to
p1
p2
p3
This reduces the number of child elements by one. The
transformation is applied repeatedly until there are exactly two child
elements.
4.13.
mixed
element
mixed
element is transformed into an
interleaving with a
text
element:
is transformed into
4.14.
optional
element
An
optional
element is transformed into
a choice with
empty
is transformed into
4.15.
zeroOrMore
element
zeroOrMore
element is transformed into a choice
between
oneOrMore
and
empty
is transformed into
4.16. Constraints
In this rule, no transformation is performed, but various
constraints are checked.
Note
The constraints in this section, unlike the constraints
specified in
Section 7
, can be checked without
resolving any
ref
elements, and are accordingly
applied even to patterns that will disappear during later stages of
simplification because they are not reachable (see
Section 4.19
) or because of
notAllowed
(see
Section 4.20
).
An
except
element that is a child of an
anyName
element must not have any
anyName
descendant elements. An
except
element that is a child of an
nsName
element must not have any
nsName
or
anyName
descendant
elements.
name
element that occurs as the first child
of an
attribute
element or as the descendant of the
first child of an
attribute
element and that has an
ns
attribute with value equal to the empty string
must not have content equal to
xmlns
name
or
nsName
element
that occurs as the first child of an
attribute
element or as the descendant of the first child of an
attribute
element must not have an
ns
attribute with value
Note
The
[XML Infoset]
defines the namespace URI of
namespace declaration attributes to be
data
or
value
element
must be correct in its use of datatypes. Specifically, the
type
attribute must identify a datatype within the
datatype library identified by the value of the
datatypeLibrary
attribute. For a
data
element, the parameter list must be one that
is allowed by the datatype (see
Section 6.2.8
).
4.17.
combine
attribute
For each
grammar
element, all
define
elements with the same name are combined
together. For any name, there must not be more than one
define
element with that name that does not have a
combine
attribute. For any name, if there is a
define
element with that name that has a
combine
attribute with the value
choice
, then there must not also be a
define
element with that name that has a
combine
attribute with the value
interleave
. Thus, for any name, if there is more
than one
define
element with that name, then there
is a unique value for the
combine
attribute for
that name. After determining this unique value, the
combine
attributes are removed. A pair of
definitions
p1
p2
is combined into
p1
p2
where
is the value of the
combine
attribute. Pairs of definitions are
combined until there is exactly one
define
element
for each name.
Similarly, for each
grammar
element all
start
elements are combined together. There must
not be more than one
start
element that does not
have a
combine
attribute. If there is a
start
element that has a
combine
attribute with the value
choice
, there must not
also be a
start
element that has a
combine
attribute with the value
interleave
4.18.
grammar
element
In this rule, the schema is transformed so that its top-level
element is
grammar
and so that it has no other
grammar
elements.
Define the
in-scope grammar
for an
element to be the nearest ancestor
grammar
element. A
ref
element
refers to
define
element if the value of their
name
attributes is the same and their in-scope
grammars are the same. A
parentRef
element
refers to
define
element
if the value of their
name
attributes is the same
and the in-scope grammar of the in-scope grammar of the
parentRef
element is the same as the in-scope
grammar of the
define
element. Every
ref
or
parentRef
element must
refer to a
define
element. A
grammar
must have a
start
child
element.
First, transform the top-level pattern
into
Next, rename
define
elements so that no two
define
elements anywhere in the schema have the
same name. To rename a
define
element, change the
value of its
name
attribute and change the value of
the
name
attribute of all
ref
and
parentRef
elements that refer to that
define
element. Next, move all
define
elements to be children of the top-level
grammar
element, replace each nested
grammar
element by the child of its
start
element and rename each
parentRef
element to
ref
4.19.
define
and
ref
elements
In this rule, the grammar is transformed so that every
element
element is the child of a
define
element, and the child of every
define
element is an
element
element.
First, remove any
define
element that is not
reachable
. A
define
element
is reachable if there is reachable
ref
element
referring to it. A
ref
element is reachable if it
is the descendant of the
start
element or of a
reachable
define
element. Now, for
each
element
element that is not the child of a
define
element, add a
define
element to the
grammar
element, and replace the
element
element by a
ref
element
referring to the added
define
element. The value of
the
name
attribute of the added
define
element must be different from value of the
name
attribute of all other
define
elements. The child of the added
define
element is the
element
element.
Define a
ref
element to be
expandable
if it refers to a
define
element whose child is not an
element
element. For each
ref
element that is expandable and is a descendant of a
start
element or an
element
element, expand it by replacing the
ref
element by
the child of the
define
element to which it refers and
then recursively expanding any expandable
ref
elements in this replacement. This must not result in a loop.
In other words expanding the replacement of a
ref
element having a
name
with
value
must not require the expansion of
ref
element also having a
name
with value
. Finally, remove any
define
element whose child is not an
element
element.
4.20.
notAllowed
element
In this rule, the grammar is transformed so that a
notAllowed
element occurs only as the child of
start
or
element
element. An
attribute
list
group
interleave
or
oneOrMore
element that has a
notAllowed
child element is transformed into a
notAllowed
element. A
choice
element that has two
notAllowed
child elements is
transformed into a
notAllowed
element. A
choice
element that has one
notAllowed
child element is transformed into its
other child element. An
except
element that has a
notAllowed
child element is removed.
The preceding transformations are applied
repeatedly until none of them is applicable any more.
Any
define
element that is no longer reachable
is removed.
4.21.
empty
element
In this rule, the grammar is transformed so that an
empty
element does not occur as a child of a
group
interleave
, or
oneOrMore
element or as the second child of
choice
element. A
group
interleave
or
choice
element
that has two
empty
child elements is transformed
into an
empty
element. A
group
or
interleave
element that has one
empty
child element is transformed into its other
child element. A
choice
element whose
second child element is an
empty
element is
transformed by interchanging its two child elements. A
oneOrMore
element that has an
empty
child element is transformed into an
empty
element. The preceding transformations are applied
repeatedly until none of them is applicable any more.
5. Simple syntax
After applying all the rules in
Section 4
, the schema will match the following
grammar:
grammar
��::=��
define
define
��::=��
NCName
top
top
��::=��
pattern
pattern
��::=��
nonEmptyPattern
nonEmptyPattern
��::=��
type="
NCName
datatypeLibrary="
anyURI
param
* [
exceptPattern
anyURI
type="
NCName
ns="
string
stringpattern
pattern
name="
NCName
/>
nonEmptyPattern
nonEmptyPattern
nonEmptyPattern
param
��::=��
name="
NCName
string
exceptPattern
��::=��
nameClass
��::=��
string
exceptNameClass
string
NCName
nameClass
exceptNameClass
��::=��
With this grammar, no elements or attributes are allowed other
than those explicitly shown.
5.1. Example
The following is an example of how the schema in
Section 3.1
can be transformed into the simple
syntax:
Note
Strictly speaking, the result of simplification is an
instance of the data model rather than an XML document. For
convenience, we use an XML document to represent an instance of the
data model.
6. Semantics
In this section, we define the semantics of a correct RELAX NG
schema that has been transformed into the simple syntax. The
semantics of a RELAX NG schema consist of a specification of what XML
documents are valid with respect to that schema. The semantics are
described formally. The formalism uses axioms and inference rules.
Axioms are propositions that are provable unconditionally. An
inference rule consists of one or more antecedents and exactly one
consequent. An antecedent is either positive or negative. If all the
positive antecedents of an inference rule are provable and none of the
negative antecedents are provable, then the consequent of the
inference rule is provable. An XML document is valid with respect to a
RELAX NG schema if and only if the proposition that it is valid is
provable in the formalism specified in this section.
Note
This kind of formalism is similar to a proof system.
However, a traditional proof system only has positive
antecedents.
The notation for inference rules separates the antecedents from
the consequent by a horizontal line: the antecedents are above the
line; the consequent is below the line. If an antecedent is of the
form not(
), then it is a negative
antecedent; otherwise, it is a positive antecedent. Both axioms and
inferences
rules may use variables. A variable has a name and optionally a
subscript. The name of a variable is italicized. Each variable has a
range that is determined by its name. Axioms and inference rules are
implicitly universally quantified over the variables they contain. We
explain this further below.
The possibility that an inference rule or axiom may contain more
than one occurrence of a particular variable requires that an identity
relation be defined on each kind of object over which a variable can
range. The identity relation for all kinds of object is value-based.
Two objects of a particular kind are identical if the constituents of
the objects are identical. For example, two attributes are considered
the same if they have the same name and the same value. Two characters
are identical if their Unicode character codes are the same.
6.1. Name classes
The main semantic concept for name classes is that of a name
belonging to a name class. A name class is an element that matches the
production nameClass. A name is as defined in
Section 2
: it consists of a namespace URI and a local
name.
We use the following notation:
is a variable
that ranges over names
nc
ranges over name classes
in
nc
asserts that name
is a member of name class
nc
We are now ready for our first axiom, which is called "anyName
1":
(anyName 1)
in
This says for any name
belongs to the name class
in other words
matches any name. Note the
effect of the implicit universal quantification over the variables in
the axiom: this is what makes the axiom apply for any name
Our first inference rule is almost as simple:
(anyName 2)
not(
in
nc
in
This says that for any name
and for any name class
nc
if
does not belong to
nc
then
belongs to
. In other words,
matches any name that does not match
nc
We now need the following additional notation:
ln
ranges over local names; a local name is a string that
matches the NCName production of
[XML Namespaces]
, that is,
a name with no colons
ranges over URIs
name(
ln
constructs a name with URI
and local
name
ln
The remaining axioms and inference rules for name classes are as
follows:
(nsName 1)
name(
ln
) in
/>
(nsName 2)
not(name(
ln
) in
nc
name(
ln
) in
(name)
name(
ln
) in
ln
(name choice 1)
in
nc
in
nc
(name choice 2)
in
nc
in
nc
6.2. Patterns
The axioms and inference rules for patterns use the following
notation:
cx
ranges
over contexts (as defined in
Section 2
ranges over
sets of attributes; a set with a single member
is considered the same as that member
ranges over sequences of
elements and strings; a sequence with a single member is considered
the same as that member; the sequences ranged over by
may contain consecutive strings and may contain strings
that are empty; thus, there are sequences ranged over by
that cannot occur as the children of an
element
ranges
over patterns (elements matching the pattern
production)
cx
|-
=~
asserts that with respect to context
cx
, the
attributes
and the sequence of elements and
strings
matches the pattern
6.2.1.
choice
pattern
The semantics of the
choice
pattern are as follows:
(choice 1)
cx
|-
=~
cx
|-
=~
(choice 2)
cx
|-
=~
cx
|-
=~
6.2.2.
group
pattern
We use the following additional notation:
represents the concatenation of the sequences
and
represents the union of
and
The semantics of the
group
pattern are as follows:
(group)
cx
|-
=~
����
cx
|-
=~
cx
|-
=~
Note
The restriction in
Section 7.3
ensures that the set of attributes constructed in the consequent will
not have multiple attributes with the same name.
6.2.3.
empty
pattern
We use the following additional notation:
( )
represents an empty sequence
{ }
represents an empty set
The semantics of the
empty
pattern are as follows:
(empty)
cx
|- { }; ( ) =~
6.2.4.
text
pattern
We use the following additional notation:
ranges
over strings
The semantics of the
text
pattern are as follows:
(text 1)
cx
|- { }; ( ) =~
(text 2)
cx
|- { };
=~
cx
|- { };
=~
The effect of the above rule is that a
text
element matches zero or more strings.
6.2.5.
oneOrMore
pattern
We use the following additional notation:
disjoint(
asserts that there is no name that is
the name of both an attribute in
and of an attribute in
The semantics of the
oneOrMore
pattern are as follows:
(oneOrMore 1)
cx
|-
=~
cx
|-
=~
(oneOrMore 2)
cx
|-
=~
����
cx
|-
=~
����disjoint(
cx
|-
=~
6.2.6.
interleave
pattern
We use the following additional notation:
interleaves
asserts that
is an interleaving of
and
The semantics of interleaving are defined by the following rules.
(interleaves 1)
( ) interleaves ( ); ( )
(interleaves 2)
interleaves
interleaves
(interleaves 3)
interleaves
interleaves
For example, the interleavings of
and
are
, and
The semantics of the
interleave
pattern are
as follows:
(interleave)
cx
|-
=~
����
cx
|-
=~
����
interleaves
cx
|-
=~
Note
The restriction in
Section 7.3
ensures that the set of attributes constructed in the consequent will
not have multiple attributes with the same name.
6.2.7.
element
and
attribute
pattern
The value of an attribute is always a single string, which may
be empty. Thus, the empty sequence is not a possible attribute value.
On the hand, the children of an element can be an empty sequence and
cannot consist of an empty string. In order to ensure that validation
handles attributes and elements consistently, we introduce a variant
of matching called
weak matching
. Weak
matching is used when matching the pattern for the value of an
attribute or for the attributes and children of an element. We use
the following notation to define weak matching.
""
represents an empty
string
ws
ranges over the empty
sequence and strings that consist entirely of
whitespace
cx
|-
=~
weak
asserts that with respect to context
cx
, the
attributes
and the sequence of elements and
strings
weakly matches the pattern
The semantics of weak matching are as follows:
(weak match 1)
cx
|-
=~
cx
|-
=~
weak
(weak match 2)
cx
|-
; ( ) =~
cx
|-
ws
=~
weak
(weak match 3)
cx
|-
""
=~
cx
|-
; ( ) =~
weak
We use the following additional notation:
attribute(
constructs an attribute with name
and value
element(
cx
constructs an element with name
context
cx
attributes
and mixed sequence
as children
okAsChildren(
asserts that the mixed sequence
can occur as
the children of an element: it does not contain any member that is an
empty string, nor does it contain two consecutive members that are
both strings
deref(
ln
) =
nc
asserts that the grammar contains
ln
The semantics of the
attribute
pattern are as follows:
(attribute)
cx
|- { };
=~
weak
����
in
nc
cx
|- attribute(
); ( ) =~
The semantics of the
element
pattern are as follows:
(element)
cx
|-
=~
weak
����
in
nc
����okAsChildren(
����deref(
ln
) =
nc
cx
|- { };
ws
, element(
cx
),
ws
=~
name="
ln
/>
6.2.8.
data
and
value
pattern
RELAX NG relies on datatype libraries to perform datatyping.
A datatype library is identified by a URI. A datatype within a
datatype library is identified by an NCName. A datatype library
provides two services.
It can determine whether a string is a legal
representation of a datatype. This service accepts a list of zero or
more parameters. For example, a string datatype might have a parameter
specifying the length of a string. The datatype library determines
what parameters are applicable for each datatype.
It can determine whether two strings represent the
same value of a datatype. This service does not have any
parameters.
Both services may make use of the context of a string. For
example, a datatype representing a QName would use the namespace
map.
We use the following additional notation:
datatypeAllows(
ln
params
cx
asserts that in the datatype library identified by URI
, the string
interpreted with
context
cx
is a legal
value of datatype
ln
with parameters
params
datatypeEqual(
ln
cx
cx
asserts that in the datatype library identified by URI
, string
interpreted with
context
cx
represents the same value of
the datatype
ln
as the string
interpreted in the context of
cx
params
ranges over sequences of parameters
cx
within the start-tag of a pattern refers to the context
of the pattern element
context(
cx
constructs a context which is the same as
cx
except that the default namespace is
; if
is the empty string, then there is no default namespace
in the constructed context
The datatypeEqual function must be reflexive, transitive
and symmetric, that is, the following inference rules must hold:
(datatypeEqual reflexive)
datatypeAllows(
ln
params
cx
datatypeEqual(
ln
cx
cx
(datatypeEqual transitive)
datatypeEqual(
ln
cx
cx
����datatypeEqual(
ln
cx
cx
datatypeEqual(
ln
cx
cx
(datatypeEqual symmetric)
datatypeEqual(
ln
cx
cx
datatypeEqual(
ln
cx
cx
The semantics of the
data
and
value
patterns are as follows:
(value)
datatypeEqual(
ln
cx
, context(
cx
))
cx
|- { };
=~
type="
ln
ns="
cx
(data 1)
datatypeAllows(
ln
params
cx
cx
|- { };
=~
datatypeLibrary="
type="
ln
params
(data 2)
datatypeAllows(
ln
params
cx
����not(
cx
|-
=~
cx
|- { };
=~
datatypeLibrary="
type="
ln
params
6.2.9. Built-in datatype library
The empty URI identifies a special built-in datatype library.
This provides two datatypes,
string
and
token
. No parameters are allowed for either of
these datatypes.
asserts that
and
are identical
normalizeWhiteSpace(
returns the string
with leading and trailing whitespace characters removed,
and with each other maximal sequence of whitespace characters
replaced by a single space character
The semantics of the two built-in datatypes are as
follows:
(string allows)
datatypeAllows(
""
"string"
, ( ),
cx
(string equal)
datatypeEqual(
""
"string"
cx
cx
(token allows)
datatypeAllows(
""
"token"
, ( ),
cx
(token equal)
normalizeWhiteSpace(
) = normalizeWhiteSpace(
datatypeEqual(
""
"token"
cx
cx
6.2.10.
list
pattern
We use the following additional notation:
split(
returns a sequence of strings one for each whitespace delimited token
of
; each string in the returned sequence will
be non-empty and will not contain any
whitespace
The semantics of the
list
pattern are as follows:
(list)
cx
|- { }; split(
) =~
cx
|- { };
=~
Note
It is crucial in the above inference rule that the
sequence that is matched against a pattern can contain consecutive
strings.
6.3. Validity
Now we can define when an element is valid with respect to a
schema. We use the following additional notation:
ranges over elements
valid(
asserts that the element
is valid with
respect to the grammar
start() =
asserts that the grammar contains
An element is valid if together with an empty set of attributes
it matches the
start
pattern of the grammar.
(valid)
start() =
����
cx
|- { };
=~
valid(
6.4. Example
Let
be
element( name(
""
"foo"
),
cx
, { },
where
is
and
is
element( name(
"http://www.example.com/n1"
"bar1"
),
cx
, { }, ( ) )
and
is
element( name(
"http://www.example.com/n2"
"bar2"
),
cx
, { }, ( ) )
Assuming appropriate definitions of
cx
cx
and
cx
, this represents the document in
Section 2.1
We now show how
can be shown to
be valid with respect to the schema in
Section 5.1
. The schema is equivalent to the
following propositions:
start() =
name="
foo
/>
deref(
"foo.element"
) =
"foo"
bar1
/>
name="
bar2
/>
deref(
"bar1.element"
) =
"bar1"
deref(
"bar2.element"
) =
"bar2"
Let name class
nc
be
"bar1"
and let
nc
be
"bar2"
Then, by the inference rule (name) in
Section 6.1
, we have
name(
"http://www.example.com/n1"
"bar1"
) in
nc
and
name(
"http://www.example.com/n2"
"bar2"
) in
nc
By the inference rule (empty) in
Section 6.2.3
we have
cx
|- { }; ( ) =~
and
cx
|- { }; ( ) =~
Thus by the inference rule (element) in
Section 6.2.7
, we have
cx
|- { };
=~
name="
bar1
/>
Note that we have chosen
cx
, since any context is allowed.
Likewise, we have
cx
|- { };
=~
name="
bar2
/>
By the inference rule (group) in
Section 6.2.1
, we have
cx
|- { };
=~
bar1
/>
name="
bar2
/>
By the inference rule (element) in
Section 6.2.7
, we have
cx
|- { }; element( name(
""
"foo"
),
cx
, { },
) =~
name="
foo
/>
Here
cx
is an arbitrary
context.
Thus we can apply the inference rule (valid) in
Section 6.3
and obtain
valid(
7. Restrictions
The following constraints are all checked after the grammar has
been transformed to the simple form described in
Section 5
. The purpose of these restrictions is to
catch user errors and to facilitate implementation.
7.1. Contextual restrictions
In this section we describe restrictions on where elements are
allowed in the schema based on the names of the ancestor elements. We
use the concept of a
prohibited path
to
describe these restrictions. A path is a sequence of NCNames separated
by
or
//
An element matches a path
, where
is an
NCName, if and only if the local name of the element is
An element matches a path
where
is an NCName and
is a path, if and only if the local name
of the element is
and the element has a
child that matches
An element matches a path
//
where
is an NCName and
is a path, if and only if the local name
of the element is
and the element has a
descendant that matches
For example, the element
matches the paths
foo
foo/bar
foo//bar
foo//baz
foo/bar/baz
foo/bar//baz
and
foo//bar/baz
but not
foo/baz
or
foobar
A correct RELAX NG schema must be such that, after
transformation to the simple form, it does not contain any element
that matches a prohibited path.
7.1.1.
attribute
pattern
The following paths are prohibited:
attribute//ref
attribute//attribute
7.1.2.
oneOrMore
pattern
The following paths are prohibited:
oneOrMore//group//attribute
oneOrMore//interleave//attribute
7.1.3.
list
pattern
The following paths are prohibited:
list//list
list//ref
list//attribute
list//text
list//interleave
7.1.4.
except
in
data
pattern
The following paths are prohibited:
data/except//attribute
data/except//ref
data/except//text
data/except//list
data/except//group
data/except//interleave
data/except//oneOrMore
data/except//empty
Note
This implies that an
except
element
with a
data
parent can contain only
data
value
and
choice
elements.
7.1.5.
start
element
The following paths are prohibited:
start//attribute
start//data
start//value
start//text
start//list
start//group
start//interleave
start//oneOrMore
start//empty
7.2. String sequences
RELAX NG does not allow a pattern such as:
Nor does it allow a pattern such as:
More generally, if the pattern for the content of an element or
attribute contains
a pattern that can match a child
(that is, an
element
data
value
list
or
text
pattern), and
a pattern that matches a single string (that is, a
data
value
or
list
pattern),
then the two patterns must be alternatives to each other.
This rule does not apply to patterns occurring within a
list
pattern.
To formalize this, we use the concept of a content-type. A
pattern that is allowable as the content of an element has one of
three content-types: empty, complex and simple. We use the following
notation.
empty( )
returns the empty content-type
complex( )
returns the complex content-type
simple( )
returns the simple content-type
ct
ranges over content-types
groupable(
ct
ct
asserts that the content-types
ct
and
ct
are groupable
The empty content-type is groupable with anything. In addition,
the complex content-type is groupable with the complex content-type. The
following rules formalize this.
(group empty 1)
groupable(empty( ),
ct
(group empty 2)
groupable(
ct
, empty( ))
(group complex)
groupable(complex( ), complex( ))
Some patterns have a content-type. We use the following
additional notation.
ct
asserts that pattern
has
content-type
ct
max(
ct
ct
returns the maximum of
ct
and
ct
where the
content-types in increasing order are empty( ),
complex( ), simple( )
The following rules define when a pattern has a content-type and,
if so, what it is.
(value)
type="
ln
ns="
simple( )
(data 1)
datatypeLibrary="
type="
ln
params
simple( )
(data 2)
ct
datatypeLibrary="
type="
ln
params
simple( )
(list)
simple( )
(text)
complex( )
(ref)
name="
ln
/>
complex( )
(empty)
empty( )
(attribute)
ct
empty( )
(group)
ct
����
ct
����groupable(
ct
ct
max(
ct
ct
(interleave)
ct
����
ct
����groupable(
ct
ct
max(
ct
ct
(oneOrMore)
ct
����groupable(
ct
ct
ct
(choice)
ct
����
ct
max(
ct
ct
Note
The antecedent in the (data 2) rule above is in fact
redundant because of the prohibited paths in
Section 7.1.4
Now we can describe the restriction. We use the following
notation.
incorrectSchema()
asserts that the schema is incorrect
All patterns occurring as the content of an element pattern must
have a content-type.
(element)
deref(
ln
) =
nc
����not(
ct
incorrectSchema()
7.3. Restrictions on attributes
Duplicate attributes are not allowed. More precisely, for a
pattern
p1
p2
or
p1
p2
, there must
not be a name that belongs to both the name class of an
attribute
pattern occurring in
p1
and the name class of an
attribute
pattern occurring in
p2
. A pattern
p1
is defined to
occur in
a pattern
p2
if
p1
is
p2
, or
p2
is a
choice
interleave
group
or
oneOrMore
element and
p1
occurs in one or more children of
p2
Attributes using infinite name classes must be repeated. More
precisely, an
attribute
element that has an
anyName
or
nsName
descendant
element must have a
oneOrMore
ancestor
element.
Note
This restriction is necessary for closure under
negation.
7.4. Restrictions on
interleave
For a pattern
p1
p2
there must not be a name that belongs to both the name
class of an
element
pattern referenced by a
ref
pattern occurring in
p1
and the name class of an
element
pattern referenced by a
ref
pattern occurring in
p2
, and
text
pattern must not occur in
both
p1
and
p2
Section 7.3
defines when one
pattern is considered to occur in another pattern.
8. Conformance
A conforming RELAX NG validator must be able to determine for
any XML document whether it is a correct RELAX NG schema. A
conforming RELAX NG validator must be able to determine for any XML
document and for any correct RELAX NG schema whether the document is
valid with respect to the schema.
However, the requirements in the preceding paragraph do not
apply if the schema uses a datatype library that the validator does
not support. A conforming RELAX NG validator is only required to
support the built-in datatype library described in
Section 6.2.9
. A validator that claims conformance to
RELAX NG should document which datatype libraries it supports. The
requirements in the preceding paragraph also do not apply if the
schema includes
externalRef
or
include
elements and the validator is unable to
retrieve the resource identified by the URI or is unable to construct
an element from the retrieved resource. A validator that claims
conformance to RELAX NG should document its capabilities for handling
URI references.
A. RELAX NG schema for RELAX NG
xmlns="http://relaxng.org/ns/structure/1.0">
B. Changes since version 0.9
The changes in this version relative to version 0.9
are as follows:
in the namespace URI,
0.9
has been
changed to
1.0
data/except//empty
has been added
as a prohibited path (see
Section 7.1.4
start//empty
has been added
as a prohibited path (see
Section 7.1.5
Section 4.12
now specifies how a
list
element with more than one child element is
transformed
Section 4.20
now specifies how a
notAllowed
element occurring in an
except
element is transformed
although a relative URI is not allowed as the value of
the
ns
and
datatypeLibrary
attributes, an empty string is allowed (see
Section 3
the removal of unreachable definitions in
Section 4.19
is now correctly specified
Section 4.20
now specifies that
define
elements that are no longer reachable are
removed
Section 4.16
has been added; the
restrictions on the contents of
except
in name
classes that are now specified in the newly added section were
previously specified in a subsection of
Section 7.1
, which has been
removed
the treatment of element and attribute values that
consist only of whitespace has been refined (see
Section 6.2.7
and
Section 6.2.8
attributes with infinite name classes are now required
to be repeated (see
Section 7.3
restrictions have been imposed on
interleave
(see
Section 7.4
);
list//interleave
has been added as a prohibited path (see
Section 7.1.3
some of the prohibited paths in
Section 7.1
have been corrected to use
ref
rather than
element
an error in the inference rule (text 1) in
Section 6.2.4
has been corrected
the value of the
ns
attribute is
now unconstrained (see
Section 3
C. RELAX NG TC (Non-Normative)
This specification was prepared and approved for publication by
the RELAX NG TC. The current members of the TC are:
Fabio Arciniegas
James Clark
Mike Fitzgerald
KAWAGUCHI Kohsuke
Josh Lubell
MURATA Makoto
Norman Walsh
David Webber
References
Normative
RFC 2396
T. Berners-Lee, R. Fielding, L. Masinter.
RFC 2396:
Uniform Resource Identifiers (URI): Generic
Syntax
IETF (Internet Engineering Task Force). 1998.
RFC 2732
R. Hinden, B. Carpenter, L. Masinter.
RFC 2732: Format for Literal IPv6 Addresses in URL's
IETF (Internet Engineering Task Force), 1999.
RFC 3023
M. Murata,
S. St.Laurent, D. Kohn.
RFC 3023: XML Media
Types
. IETF (Internet Engineering Task Force),
2001.
XLink
Steve DeRose, Eve Maler
and David Orchard, editors.
XML Linking
Language (XLink) Version 1.0
W3C (World Wide Web Consortium), 2001.
XML 1.0
Tim Bray,
Jean Paoli, and
C. M. Sperberg-McQueen, Eve Maler, editors.
Extensible Markup
Language (XML) 1.0 Second Edition
W3C (World Wide Web Consortium), 2000.
XML Infoset
John Cowan, Richard Tobin,
editors.
XML
Information Set
W3C (World Wide Web Consortium), 2001.
XML Namespaces
Tim Bray,
Dave Hollander,
and Andrew Layman, editors.
Namespaces in
XML
W3C (World Wide Web Consortium), 1999.
Non-Normative
RELAX
MURATA Makoto.
RELAX (Regular
Language description for XML)
. INSTAC
(Information Technology Research and Standardization Center), 2001.
TREX
James Clark.
TREX - Tree Regular Expressions for XML
Thai Open Source Software Center, 2001.
Tutorial
James Clark,
Makoto MURATA, editors.
RELAX
NG Tutorial
. OASIS, 2001.
W3C XML Schema Datatypes
Paul V. Biron, Ashok Malhotra, editors.
XML Schema Part 2: Datatypes
W3C (World Wide Web Consortium), 2001.
XML Schema Formal
Allen Brown,
Matthew Fuchs, Jonathan Robie, Philip Wadler, editors.
XML Schema: Formal Description
W3C (World Wide Web Consortium), 2001.