SPARQL 1.1 Query Language

This section defines the correct behavior for evaluation of graph patterns and solution modifiers, given a query string and an RDF dataset. It does not imply a SPARQL implementation must use the process defined here.

The outcome of executing a SPARQL query is defined by a series of steps, starting from the SPARQL query as a string, turning that string into an abstract syntax form, then turning the abstract syntax into a SPARQL abstract query comprising operators from the SPARQL algebra. This abstract query is then evaluated on an RDF dataset.

18.2 Translation to the SPARQL Algebra

This section defines the process of converting graph patterns and solution modifiers in a SPARQL query string into a SPARQL algebra expression. The process described converts one level of query nesting, as formed by subqueries using the nested SELECT syntax, and is applied recursively to subqueries. Each level consists of graph pattern matching and filtering, followed by the application of solution modifiers.

The SPARQL query string is parsed and the abbreviations for IRIs and triple patterns given in section 4 are applied. At this point the abstract syntax tree is composed of:

Patterns	Modifiers	Query Forms	Other
RDF terms	DISTINCT	SELECT	VALUES
Property path expression	REDUCED	CONSTRUCT	SERVICE
Property path patterns	Projection	DESCRIBE
Groups	ORDER BY	ASK
OPTIONAL	LIMIT
UNION	OFFSET
GRAPH	Select expressions
BIND
GROUP BY
HAVING
MINUS
FILTER

The result of converting such an abstract syntax tree is a SPARQL query that uses the following symbols in the SPARQL algebra:

Graph Pattern	Solution Modifiers	Property Path
BGP	ToList	PredicatePath
Join	OrderBy	InversePath
LeftJoin	Project	SequencePath
Filter	Distinct	AlternativePath
Union	Reduced	ZeroOrMorePath
Graph	Slice	OneOrMorePath
Extend	ToMultiSet	ZeroOrOnePath
Minus		NegatedPropertySet
Group
Aggregation
AggregateJoin

Slice is the combination of OFFSET and LIMIT.

ToList is used where conversion from the results of graph pattern matching to sequences occurs.

ToMultiSet is used where conversion from a solution sequence to a multiset occurs.

18.2.1 Variable Scope

We define a variable to be in-scope if there is a way for the variable to be in the domain of a solution mapping at that point in the execution of the SPARQL algebra for the query. The definition below provides a way of determining this from the abstract syntax of a query.

Note that a subquery with a projection can hide variables; use of a variable in FILTER, or in MINUS does not cause a variable to be in-scope outside of those forms.

Let P, P1, P2 be graph patterns and E, E1,...En be expressions. A variable v is in-scope if:

Syntax Form	In-scope variables
Basic Graph Pattern (BGP)	`v` occurs in the BGP
Path	`v` occurs in the path
Group `{ P1 P2 ... }`	`v` is in-scope if it is in-scope in one or more of P1, P2, ...
`GRAPH term { P }`	`v` is `term` or `v` is in-scope in P
`{ P1 } UNION { P2 }`	`v` is in-scope in P1 or in-scope in P2
`OPTIONAL {P}`	`v` is in-scope in P
`SERVICE term {P}`	`v` is `term` or `v` is in-scope in P
`BIND (expr AS v)`	`v` is in-scope
`SELECT .. v .. { P }`	`v` is in-scope
`SELECT ... (expr AS v)`	`v` is in-scope
`GROUP BY (expr AS v)`	`v` is in-scope
`SELECT * { P }`	`v` is in-scope in `P`
`VALUES v { values }`	`v` is in-scope
`VALUES varlist { values }`	`v` is in-scope if `v` is in `varlist`

The variable v must not be in-scope at the point of the (expr AS v) form. The scoping for (expr AS v) applies immediately in SELECT expressions.

BIND (expr AS v) requires that the variable v is not in-scope from the preceding elements in the group graph pattern in which it is used.

In SELECT, the variable v must not be in-scope in the graph pattern of the SELECT clause, nor used in another select expression earlier in the clause.

18.2.2 Converting Graph Patterns

This section describes the process for translating a SPARQL graph pattern into a SPARQL algebra expression. This process is applied to the group graph pattern (the unit between {...} delimiters) forming the WHERE clause of a query, and recursively to each syntactic element within the group graph pattern. The result of the translation is a SPARQL algebra expression.

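In summary, the steps are applied as follows:

Expand syntax forms
Collect FILTER elements
Translate property path expressions
Translate property path patterns
Translate basic graph patterns
Translate the remaining graph patterns in the group
Add the filters of the group
Simplify the algebra expression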

We write

translate(graph pattern)

for the algorithm described here to translate graph patterns.

The working group notes that in SPARQL 1.0, the point at which the simplification step is applied leads to ambiguous transformation of queries involving a doubly nested filter and pattern in an optional:

OPTIONAL { { ... FILTER ( ... ?x ... ) } }..

This is illustrated by two non-normative test cases:

Applying the simplification step after all the translation of graph patterns is the preferred reading.

18.2.2.1 Expand Syntax Forms

Expand abbreviations for IRIs and triple patterns given in section 4.

18.2.2.2 Collect `FILTER` Elements

FILTER expressions apply to the whole group graph pattern in which they appear. The algebra operators to perform filtering are added to the group after translation of each group element. We collect the filters together here and remove them from the group, then apply them to the whole translated group graph pattern.

In this step, we also translate graph patterns within FILTER expressions EXISTS and NOT EXISTS.

Let FS := empty set

For each form FILTER(expr) in the group graph pattern:
    In expr, replace NOT EXISTS{P} with fn:not(exists(translate(P))) 
    In expr, replace EXISTS{P} with exists(translate(P))
    FS := FS ∪ {expr}
    End

The set of filter expressions FS is used later.

18.2.2.3 Translate Property Path Expressions

The following table gives the translation of property path expressions from SPARQL syntax to terms in the SPARQL algebra. This applies to all elements of a property path expression recursively.

The next step after this one translates certain forms to triple patterns, and these are converted later to basic graph patterns by adjacency (without intervening group pattern delimiters { and }) or other syntax forms. Overall, SPARQL syntax property paths of just an IRI become triple patterns and these are aggregated into basic graph patterns.

Notes:

The order of forms IRI and ^IRI in negated property sets is not relevant.

We introduce the following symbols:

link
inv
alt
seq
ZeroOrMorePath
OneOrMorePath
ZeroOrOnePath
NPS (for NegatedPropertySet)

Syntax Form (path)	Algebra (path)
`iri`	`link(iri)`
`^path`	`inv(path)`
`!(:iri₁|...|:iri_n)`	`NPS({:iri₁ ... :iri_n})`
`!(^:iri₁|...|^:iri_n)`	`inv(NPS({:iri₁ ... :iri_n}))`
`!(:iri₁|...|:iri_i|^:iri_i+1|...|^:iri_m)`	`alt(NPS({:iri₁ ...:iri_i}), inv(NPS({:iri_i+1, ..., :iri_m})) )`
`path1 / path2`	`seq(path1, path2)`
`path1 | path2`	`alt(path1, path2)`
`path*`	`ZeroOrMorePath(path)`
`path+`	`OneOrMorePath(path)`
`path?`	`ZeroOrOnePath(path)`

18.2.2.4 Translate Property Path Patterns

The previous step translated property path expressions. This step translates property path patterns, each of which is a subject end point, a property path expression and an object end point, into triple patterns, or wraps them in a general algebra operation for path evaluation.

Notes:

X and Y are RDF terms or variables.
?V is a fresh variable.
P and Q are path expressions.
These are only applied to property path patterns, not within property path expressions.
Translations earlier in the table are applied in preference to the last translation.
The final translation simply wraps any remaining property path expression to use a common form Path(...).

Algebra (path)	Translation
`X link(iri) Y`	`X iri Y`
`X inv(iri) Y`	`Y iri X`
`X seq(P, Q) Y`	`X P ?V . ?V Q Y`
`X P Y`	`Path(X, P, Y)`

Examples of the whole path translation process (?_V is a fresh variable):

?s :p/:q ?o

?s :p ?_V .
?_V :q ?o

?s :p* ?o

Path(?s, ZeroOrMorePath(link(:p)), ?o)

:list rdf:rest*/rdf:first ?member

Path(:list, ZeroOrMorePath(link(rdf:rest)), ?_V) .
?_V rdf:first ?member
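
The translation table and the examples above can be sketched in Python. This is only an illustrative sketch: the tuple encoding of algebra paths (`("link", iri)`, `("inv", path)`, `("seq", p, q)`), the helper names `fresh_var` and `translate_path_pattern`, and the `?_V0`-style naming of fresh variables are assumptions made for this example and are not defined by this specification. The `inv(iri)` rule is assumed to apply when the inverted path is a single `link(iri)`.

```python
import itertools

_counter = itertools.count()

def fresh_var():
    """Return a variable name not used elsewhere (hypothetical naming scheme)."""
    return f"?_V{next(_counter)}"

def translate_path_pattern(x, path, y):
    """Translate one property path pattern into a list of algebra elements.

    Triple patterns are returned as 3-tuples (s, p, o). Anything that cannot
    be reduced to triples is wrapped as ("Path", x, path, y).
    """
    kind = path[0]
    if kind == "link":                           # X link(iri) Y  ->  X iri Y
        return [(x, path[1], y)]
    if kind == "inv" and path[1][0] == "link":   # X inv(iri) Y   ->  Y iri X
        return [(y, path[1][1], x)]
    if kind == "seq":                            # X seq(P, Q) Y  ->  X P ?V . ?V Q Y
        v = fresh_var()
        return (translate_path_pattern(x, path[1], v)
                + translate_path_pattern(v, path[2], y))
    return [("Path", x, path, y)]                # final rule: wrap what remains

# :list rdf:rest*/rdf:first ?member
elements = translate_path_pattern(
    ":list",
    ("seq", ("ZeroOrMorePath", ("link", "rdf:rest")), ("link", "rdf:first")),
    "?member",
)
```

Earlier rules are tried before the final wrapping rule, mirroring the note that translations earlier in the table are applied in preference to the last one.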

18.2.2.5 Translate Basic Graph Patterns

After translating property paths, any adjacent triple patterns are collected together to form a basic graph pattern BGP(triples).

18.2.2.6 Translate Graph Patterns

Next, we translate each remaining graph pattern form, recursively applying the translation process.

If the form is GroupOrUnionGraphPattern

Let A := undefined
          
For each element G in the GroupOrUnionGraphPattern
    If A is undefined
        A := Translate(G)
    Else
        A := Union(A, Translate(G))
    End

The result is A

If the form is GraphGraphPattern

If the form is GRAPH IRI GroupGraphPattern
    The result is Graph(IRI, Translate(GroupGraphPattern))

If the form is GRAPH Var GroupGraphPattern
    The result is Graph(Var, Translate(GroupGraphPattern))

If the form is GroupGraphPattern:

Let FS := the empty set
Let G := the empty pattern, a basic graph pattern which is the empty set.

For each element E in the GroupGraphPattern

    If E is of the form OPTIONAL{P} 
        Let A := Translate(P)
        If A is of the form Filter(F, A2)
            G := LeftJoin(G, A2, F)
        Else 
            G := LeftJoin(G, A, true)
            End
        End

    If E is of the form MINUS{P}
        G := Minus(G, Translate(P))
        End

    If E is of the form BIND(expr AS var)
        G := Extend(G, var, expr)
        End

    If E is any other form 
        Let A := Translate(E)
        G := Join(G, A)
        End

   End
   
The result is G.

If the form is InlineData

The result is a multiset of solution mappings 'data'.

data is formed by creating one solution mapping per row of values, binding each variable to the value in the corresponding position in the list of variables (or the single variable), and omitting a binding if the BindingValue is the word UNDEF.

If the form is SubSelect

The result is ToMultiSet(Translate(SubSelect))

18.2.2.7 Filters of Group

After the group has been translated, the filter expressions are added so they will apply to the whole of the rest of the group:

If FS is not empty
    Let G := output of preceding step
    Let X := Conjunction of expressions in FS
    G := Filter(X, G)
    End
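
The GroupGraphPattern case of 18.2.2.6 and the filter step above can be sketched together in Python. This is a minimal sketch under stated assumptions: group elements are encoded as hypothetical tuples such as `("optional", elements)`, `("minus", elements)`, `("bind", var, expr)` and `("filter", expr)`; any other element (for example `("BGP", triples)`) is assumed to be already translated; expressions are kept as opaque strings; and the translation of EXISTS and NOT EXISTS is omitted.

```python
Z = ("BGP", ())   # the empty basic graph pattern

def translate_group(elements):
    """Translate a list of group elements into a nested-tuple algebra expression."""
    filters = [e[1] for e in elements if e[0] == "filter"]   # the collection FS
    g = Z
    for e in elements:
        kind = e[0]
        if kind == "filter":
            continue                          # applied to the whole group below
        if kind == "optional":
            a = translate_group(e[1])
            if a[0] == "Filter":              # Filter(F, A2) -> LeftJoin(G, A2, F)
                g = ("LeftJoin", g, a[2], a[1])
            else:
                g = ("LeftJoin", g, a, True)
        elif kind == "minus":
            g = ("Minus", g, translate_group(e[1]))
        elif kind == "bind":                  # ("bind", var, expr)
            g = ("Extend", g, e[1], e[2])
        elif kind == "group":
            g = ("Join", g, translate_group(e[1]))
        else:                                 # already-translated element
            g = ("Join", g, e)
    if filters:
        expr = filters[0]
        for f in filters[1:]:
            expr = ("&&", expr, f)            # conjunction of the expressions in FS
        g = ("Filter", expr, g)
    return g
```

The output still contains joins with the empty pattern `Z`; removing them is the job of the simplification step.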

18.2.2.8 Simplification step

Some groups of one graph pattern become Join(Z, A), where Z is the empty basic graph pattern (which is the empty set). These can be replaced by A. The empty graph pattern Z is the identity for Join:

Replace Join(Z, A) by A
Replace Join(A, Z) by A
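
The two replacement rules can be sketched as a recursive rewrite in Python. The nested-tuple encoding of algebra expressions and the function name `simplify` are assumptions made for this illustration only.

```python
Z = ("BGP", ())   # the empty basic graph pattern

def simplify(expr):
    """Recursively replace Join(Z, A) and Join(A, Z) by A."""
    if not isinstance(expr, tuple) or not expr:
        return expr                     # operator names, variables, constants
    if expr[0] == "BGP":
        return expr                     # leave the triples of a BGP untouched
    expr = tuple(simplify(part) for part in expr)
    if expr[0] == "Join":
        if expr[1] == Z:
            return expr[2]
        if expr[2] == Z:
            return expr[1]
    return expr
```

Sub-expressions are simplified before the enclosing Join is inspected, so nested joins with the empty pattern collapse in a single pass.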

18.2.3 Examples of Mapped Graph Patterns

The second form of a rewrite example is the first with empty group joins removed by the simplification step.

Example: group with a basic graph pattern consisting of a single triple pattern:

{ ?s ?p ?o }

Join(Z, BGP(?s ?p ?o) )

BGP(?s ?p ?o)

Example: group with a basic graph pattern consisting of two triple patterns:

{ ?s :p1 ?v1 ; :p2 ?v2 }

BGP( ?s :p1 ?v1 . ?s :p2 ?v2 )

Example: group consisting of a union of two basic graph patterns:

{ { ?s :p1 ?v1 } UNION {?s :p2 ?v2 } }

Union(Join(Z, BGP(?s :p1 ?v1)),
Join(Z, BGP(?s :p2 ?v2)) )

Union( BGP(?s :p1 ?v1) , BGP(?s :p2 ?v2) )

Example: group consisting of a union of a union and a basic graph pattern:

{ { ?s :p1 ?v1 } UNION {?s :p2 ?v2 } UNION {?s :p3 ?v3 } }

Union(
    Union( Join(Z, BGP(?s :p1 ?v1)),
           Join(Z, BGP(?s :p2 ?v2))) ,
    Join(Z, BGP(?s :p3 ?v3)) )

Union(
    Union( BGP(?s :p1 ?v1) ,
           BGP(?s :p2 ?v2)) ,
    BGP(?s :p3 ?v3) )

Example: group consisting of a basic graph pattern and an optional graph pattern:

{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } }

LeftJoin(
    Join(Z, BGP(?s :p1 ?v1)),
    Join(Z, BGP(?s :p2 ?v2)),
    true)

LeftJoin(BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true)

Example: group consisting of a basic graph pattern and two optional graph patterns:

{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 } OPTIONAL { ?s :p3 ?v3 } }

LeftJoin(
    LeftJoin(
        BGP(?s :p1 ?v1),
        BGP(?s :p2 ?v2),
        true) ,
    BGP(?s :p3 ?v3),
    true)

Example: group consisting of a basic graph pattern and an optional graph pattern with a filter:

{ ?s :p1 ?v1 OPTIONAL {?s :p2 ?v2 FILTER(?v1<3) } }

LeftJoin(
     Join(Z, BGP(?s :p1 ?v1)),
     Join(Z, BGP(?s :p2 ?v2)),
     (?v1<3) )

LeftJoin(
    BGP(?s :p1 ?v1) ,
    BGP(?s :p2 ?v2) ,
   (?v1<3) )

Example: group consisting of a union graph pattern and an optional graph pattern:

{ {?s :p1 ?v1} UNION {?s :p2 ?v2} OPTIONAL {?s :p3 ?v3} }

LeftJoin(
Union(BGP(?s :p1 ?v1),
BGP(?s :p2 ?v2)) ,
BGP(?s :p3 ?v3) ,
true )

Example: group consisting of a basic graph pattern, a filter and an optional graph pattern:

{ ?s :p1 ?v1 FILTER (?v1 < 3 ) OPTIONAL {?s :p2 ?v2} }

Filter( ?v1 < 3 ,
LeftJoin( BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true)
)

Example: Pattern involving BIND:

{ ?s :p ?v . BIND (2*?v AS ?v2) ?s :p1 ?v2 }

Join(
Extend( BGP(?s :p ?v), ?v2, 2*?v) ,
BGP(?s :p1 ?v2) )

Example: Pattern involving BIND:

{ ?s :p ?v . {} BIND (2*?v AS ?v2) }

Join(
BGP(?s :p ?v) ,
Extend({}, ?v2, 2*?v)
)

Example: Pattern involving MINUS:

{ ?s :p ?v . MINUS {?s :p1 ?v2 } }

Minus(
BGP(?s :p ?v) ,
BGP(?s :p1 ?v2))

Example: Pattern involving a subquery:

{ ?s :p ?o . {SELECT DISTINCT ?o {?o ?p ?z} } }

Join(
   BGP(?s :p ?o) ,
   ToMultiSet(
     Distinct(Project(BGP(?o ?p ?z), {?o})) )
   )

18.2.4 Converting Groups, Aggregates, HAVING, final VALUES clause and SELECT Expressions

In this step, we process clauses on the query level in the following order:

Grouping
Aggregates
HAVING
VALUES
Select expressions

18.2.4.1 Grouping and Aggregation

Step: GROUP BY

If the GROUP BY keyword is used, or there is implicit grouping due to the use of aggregates in the projection, then grouping is performed by the Group function. It divides the solution set into groups of one or more solutions, with the same overall cardinality. In case of implicit grouping, a fixed constant (1) is used to group all solutions into a single group.

Step: Aggregates

The aggregation step is applied as a transformation on the query level, replacing aggregate expressions in the query level with Aggregation() algebraic expressions.

The transformation for query levels that use any aggregates is given below:

Let A := the empty sequence
Let Q := the query level being evaluated
Let P := the algebra translation of the GroupGraphPattern of the query level
Let E := [], a list of pairs of the form (variable, expression)

If Q contains GROUP BY exprlist
   Let G := Group(exprlist, P)
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
   Let G := Group((1), P)
Else
   skip the rest of the aggregate step
   End

Global i := 1   # Initially 1 for each query processed

For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q
  For each unaggregated variable V in X
      Replace V with Sample(V)
      End
  For each aggregate R(args ; scalarvals) now in X
      # note scalarvals may be omitted, then it's equivalent to the empty set
      A_i := Aggregation(args, R, scalarvals, G)
      Replace R(...) with agg_i in Q
      i := i + 1
      End
  End

For each variable V appearing outside of an aggregate
   A_i := Aggregation(V, Sample, {}, G)
   E := E append (V, agg_i)
   i := i + 1
   End

A := A_1, ..., A_i-1
P := AggregateJoin(A)

Note: agg_i is a temporary variable. E is then used in 18.2.4.4 for the processing of select expressions.

Example:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (SUM(?val) AS ?sum) (COUNT(?a) AS ?count)
WHERE {
  ?a rdf:value ?val .
} GROUP BY ?a

The SUM expression becomes agg₁, and the COUNT expression becomes agg₂.

Let G := Group((?a), BGP(?a rdf:value ?val))
A₁ = Aggregation((?val), Sum, {}, G)
A₂ = Aggregation((?a), Count, {}, G)
A := (A₁, A₂)
Let P := AggregateJoin(A)
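
The effect of this example can be sketched in Python. This is an informal illustration, not the definition of Group, Aggregation or AggregateJoin: the input solutions are hypothetical, solution mappings are modelled as dictionaries, and the helper names `group` and `aggregate_join` are assumptions made for this sketch.

```python
from collections import defaultdict

def group(key_vars, omega):
    """Partition solutions by the values of the grouping variables."""
    groups = defaultdict(list)
    for mu in omega:
        groups[tuple(mu.get(v) for v in key_vars)].append(mu)
    return dict(groups)

def aggregate_join(g):
    """One solution per group, binding agg1 (Sum of ?val) and agg2 (Count of ?a)."""
    return [
        {
            "agg1": sum(mu["?val"] for mu in members),
            "agg2": sum(1 for mu in members if "?a" in mu),
        }
        for members in g.values()
    ]

# Hypothetical solutions of BGP(?a rdf:value ?val)
omega = [
    {"?a": ":x", "?val": 1},
    {"?a": ":x", "?val": 2},
    {"?a": ":y", "?val": 5},
]
P = aggregate_join(group(["?a"], omega))
```

Each group contributes exactly one solution mapping, whose only bindings at this point are the temporary variables agg1 and agg2.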

18.2.4.2 HAVING

The HAVING expression is evaluated using the same rules as FILTER(). Note that, due to the logical position in which the HAVING clause is evaluated, expressions projected by the SELECT clause are not visible to the HAVING clause.

Let Q := the query level being evaluated
Let P := the algebra translation of the query level so far

For each HAVING(E) in Q
    P := Filter(E, P)
    End

18.2.4.3 VALUES

If the query has a trailing VALUES clause:

Let P := the algebra translation of the query level so far
P := Join(P, ToMultiSet(data))
  where data is a solution sequence formed from the VALUES clause

The translation of the data is the same as for inline data.

18.2.4.4 SELECT Expressions

Step: Select expressions

We have two forms of the abstract syntax to consider:

SELECT selItem ... { pattern }
SELECT * { pattern }

Let X := algebra from earlier steps
Let VS := list of all variables visible in the pattern,
           so restricted by sub-SELECT projected variables and GROUP BY variables.
           Not visible: only in filter, exists/not exists, masked by a subselect, 
                        non-projected GROUP variables, only in the right hand side of MINUS

Let PV := {}, a set of variable names
Note, E is a list of pairs of the form (variable, expression), defined in section 18.2.4
  
If "SELECT *"
    PV := VS

If  "SELECT selItem ...:"  
    For each selItem:
        If selItem is a variable
            PV := PV ∪ { variable }
        End
        If selItem is (expr AS variable)
            variable must not appear in VS nor in PV; if it does then generate a syntax error and stop
            PV := PV ∪ { variable }
            E := E append (variable, expr) 
        End
    End

For each pair (var, expr) in E
    X := Extend(X, var, expr)
    End
  
Result is X  
The set PV is used later for projection.

The syntax error arises for use of a variable as the named target of AS (e.g. ... AS ?x) when the variable is used inside the WHERE clause of the SELECT or if already used as the target of AS in this SELECT expression.

18.2.5 Converting Solution Modifiers

Solution modifiers apply to the processing of a SPARQL query after pattern matching. The solution modifiers are applied to a query in the following order:

Order by
Projection
Distinct
Reduced
Offset
Limit

Step: ToList

ToList turns a multiset into a sequence with the same elements and cardinality. There is no implied ordering to the sequence; duplicates need not be adjacent.

Let M := ToList(Pattern)

18.2.5.1 ORDER BY

If the query string has an ORDER BY clause

M := OrderBy(M, list of order comparators)

18.2.5.2 Projection

The set of projection variables, PV, was calculated in the processing of SELECT expressions.

M := Project(M, PV)

where PV is the set of variables mentioned in the SELECT clause or all named variables that are in-scope in the query if SELECT * is used.

18.2.5.3 DISTINCT

If the query contains DISTINCT,

M := Distinct(M)

18.2.5.4 REDUCED

If the query contains REDUCED,

M := Reduced(M)

18.2.5.5 OFFSET and LIMIT

If the query contains "OFFSET start" or "LIMIT length"

M := Slice(M, start, length)

start defaults to 0

length defaults to (size(M)-start).
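
The defaults for start and length can be sketched in Python, assuming the solution sequence M is modelled as a list. The function name `slice_solutions` is a hypothetical helper introduced for this illustration.

```python
def slice_solutions(m, start=0, length=None):
    """Return the sub-sequence of m selected by OFFSET start and LIMIT length."""
    if length is None:
        # No LIMIT given: take everything from start to the end of the sequence.
        length = max(len(m) - start, 0)
    return m[start:start + length]
```

With only OFFSET given, the slice runs to the end of the sequence; with only LIMIT given, it starts at position 0.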

18.2.5.6 Final Algebra Expression

The overall abstract query is M.

18.3 Basic Graph Patterns

When matching graph patterns, the possible solutions form a multiset [multiset], also known as a bag. A multiset is an unordered collection of elements in which each element may appear more than once. It is described by a set of elements and a cardinality function giving the number of occurrences of each element from the set in the multiset.

Write μ for solution mappings.

Write μ₀ for the mapping such that dom(μ₀) is the empty set.

Write Ω₀ for the multiset consisting of exactly the empty mapping μ₀, with cardinality 1. This is the join identity.

Write μ(?x->t) for the solution mapping that maps variable x to RDF term t : { (x, t) }

Write Ω(?x->t) for the multiset consisting of exactly μ(?x->t), that is, { { (x, t) } } with cardinality 1.

Definition: Compatible Mappings

Two solution mappings μ₁ and μ₂ are compatible if, for every variable v in dom(μ₁) and in dom(μ₂), μ₁(v) = μ₂(v).

Here, μ₁(v) = μ₂(v) means that μ₁(v) and μ₂(v) are the same RDF term.

If μ₁ and μ₂ are compatible then μ₁ ∪ μ₂ is also a mapping. Write merge(μ₁, μ₂) for μ₁ ∪ μ₂

Write card[Ω](μ) for the cardinality of solution mapping μ in a multiset of mappings Ω.
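
Compatibility and merge can be sketched in Python, assuming a solution mapping is modelled as a dictionary from variable names to RDF terms and that equality of Python values stands in for "the same RDF term". The function names are illustrative only.

```python
def compatible(mu1, mu2):
    """True if mu1 and mu2 agree on every variable they both bind."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def merge(mu1, mu2):
    """Return mu1 ∪ mu2; only defined for compatible mappings."""
    if not compatible(mu1, mu2):
        raise ValueError("mappings are not compatible")
    return {**mu1, **mu2}
```

Mappings with disjoint domains are always compatible, which is why the empty mapping μ₀ is compatible with every solution mapping.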

18.3.1 SPARQL Basic Graph Pattern Matching

A basic graph pattern is matched against the active graph for that part of the query. Basic graph patterns can be instantiated by replacing both variables and blank nodes by terms, giving two notions of instance. Blank nodes are replaced using an RDF instance mapping, σ, from blank nodes to RDF terms; variables are replaced by a solution mapping from query variables to RDF terms.

Definition: Pattern Instance Mapping

A Pattern Instance Mapping, P, is the combination of an RDF instance mapping, σ, and solution mapping, μ. P(x) = μ(σ(x))

For a BGP 'x', P(x) denotes the result of replacing blank nodes b in x for which σ is defined with σ(b) and all variables v in x for which μ is defined with μ(v).

Any pattern instance mapping defines a unique solution mapping and a unique RDF instance mapping obtained by restricting it to query variables and blank nodes respectively.

Definition: Basic Graph Pattern Matching

Let BGP be a basic graph pattern and let G be an RDF graph.

μ is a solution for BGP from G when there is a pattern instance mapping P such that P(BGP) is a subgraph of G and μ is the restriction of P to the query variables in BGP.

card[Ω](μ) = the number of distinct RDF instance mappings, σ, such that P = μ(σ) is a pattern instance mapping and P(BGP) is a subgraph of G.

If a basic graph pattern is the empty set, then the solution is Ω₀.
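
Subgraph matching for a basic graph pattern can be sketched in Python. This sketch makes simplifying assumptions: the graph is a Python set of 3-tuples, variables are strings starting with `?`, and the pattern contains no blank nodes, so no RDF instance mapping σ is needed and every solution has cardinality 1. The names `is_var` and `match_bgp` are hypothetical.

```python
def is_var(term):
    """Variables are modelled as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")

def match_bgp(bgp, graph):
    """Return the solution mappings (dicts) for bgp against graph."""
    solutions = [{}]                      # start from the empty mapping
    for pattern in bgp:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate = dict(mu)
                for p_term, g_term in zip(pattern, triple):
                    if is_var(p_term):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions
```

For the empty pattern the loop body never runs and the result is the single empty mapping, in line with the statement that the solution is Ω₀.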

18.3.2 Treatment of Blank Nodes

This definition allows the solution mapping to bind a variable in a basic graph pattern, BGP, to a blank node in G. Since SPARQL treats blank node identifiers in a results format document (SPARQL Query Results XML Format, SPARQL 1.1 Query Results JSON Format and SPARQL 1.1 Query Results CSV and TSV Formats) as scoped to the document, they cannot be understood as identifying nodes in the active graph of the dataset. If DS is the dataset of a query, pattern solutions are therefore understood to be not from the active graph of DS itself, but from an RDF graph, called the scoping graph, which is graph-equivalent to the active graph of DS but shares no blank nodes with DS or with BGP. The same scoping graph is used for all solutions to a single query. The scoping graph is purely a theoretical construct; in practice, the effect is obtained simply by the document scope conventions for blank node identifiers.

Since RDF blank nodes allow infinitely many redundant solutions for many patterns, there can be infinitely many pattern solutions (obtained by replacing blank nodes by different blank nodes). It is necessary, therefore, to somehow delimit the solutions for a basic graph pattern. SPARQL uses the subgraph match criterion to determine the solutions of a basic graph pattern. There is one solution for each distinct pattern instance mapping from the basic graph pattern to a subset of the active graph.

This is optimized for ease of computation rather than redundancy elimination. It allows query results to contain redundancies even when the active graph of the dataset is lean, and it allows logically equivalent datasets to yield different query results.

18.4 Property Path Patterns

This section defines the evaluation of property path patterns. A property path pattern is a subject endpoint (an RDF term or a variable), a property path expression and an object endpoint. The translation of property path expressions converts some forms to other SPARQL expressions, such as converting property paths of length one to triple patterns, which in turn are combined into basic graph patterns. This leaves property path operators ZeroOrOnePath, ZeroOrMorePath, OneOrMorePath and NegatedPropertySets and also path expressions contained within these operators.

All remaining property path expressions are present in the algebra in the form Path(X, path, Y) for endpoints X and Y. For example, the syntax (:p/:q)* is a ZeroOrMorePath expression involving a sequence property path, becoming the algebra expression ZeroOrMorePath(seq(link(:p), link(:q))).

Notation

Write

eval(Path(X, PP, Y))

for the evaluation of the property path patterns. This produces a multiset of solution mappings μ, each solution mapping having a binding for variables used (each of X and Y can be a variable). Some operators only produce a set of solution mappings.

Write

Var(x₁, x₂, ..., x_n) = { x_i | i in 1...n and x_i is a variable }

for the variables in x₁, x₂, ..., x_n.

Write

`x:term`	when `x` is an RDF term
`x:var`	when `x` is a variable
`x:path`	when `x` is a path expression

All evaluation is carried out by matching the active graph at that point in the overall query evaluation. We omit explicitly including the active graph in each definition for clarity.

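The evaluation of a predicate property path pattern, Path(X, link(iri), Y), depends on whether X and Y are variables or RDF terms. If both X and Y are variables: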

eval(Path(X:var, link(iri), Y:var)) = { (X, xn) (Y, yn) | xn and yn are RDF terms and triple (xn iri yn) is in the active graph }

If X is a variable and Y an RDF term:

eval(Path(X:var, link(iri), Y:term)) = { (X, xn) | xn is an RDF term and triple (xn iri Y) is in the active graph }

If X is an RDF term and Y is a variable:

eval(Path(X:term, link(iri), Y:var)) = { (Y, yn) | yn is an RDF term and triple (X iri yn) is in the active graph }

If both X and Y are RDF terms:

eval(Path(X:term, link(iri), Y:term)) = { μ₀ } = { { } } = Ω₀ if triple (X iri Y) is in the active graph

eval(Path(X:term, link(iri), Y:term)) = { } if triple (X iri Y) is not in the active graph

Informally, evaluating a Predicate Property Path is the same as executing a subquery SELECT * { X P Y } at that point in the query evaluation.

Definition: Evaluation of Sequence Property Path

Let P and Q be property path expressions. Let V be a fresh variable.

A = Join( eval(Path(X, P, V)), eval(Path(V, Q, Y)) )

eval(Path(X, seq(P,Q), Y)) = Project(A, Var(X,Y))

Informally, this is the same as:

SELECT * { X P _:a . _:a Q Y }

using the fact that a blank node _:a acts like a variable (under simple entailment) except it does not appear in the results from SELECT *.

Informally, the evaluation of an alternative property path, alt(P, Q), is the same as:

SELECT * { { X P Y } UNION { X Q Y } }

Definition: Node set of a graph

The node set of a graph G, nodes(G), is:

nodes(G) = { n | n is an RDF term that is used as a subject or object of a triple of G}

Definition: Evaluation of ZeroOrOnePath

eval(Path(X:term, ZeroOrOnePath(P), Y:var)) = { (Y, yn) | yn = X or {(Y, yn)} in eval(Path(X,P,Y)) }

eval(Path(X:var, ZeroOrOnePath(P), Y:term)) = { (X, xn) | xn = Y or {(X, xn)} in eval(Path(X,P,Y)) }

eval(Path(X:term, ZeroOrOnePath(P), Y:term)) = 
    { {} } if X = Y or eval(Path(X,P,Y)) is not empty
    { } otherwise

eval(Path(X:var, ZeroOrOnePath(P), Y:var)) = 
    { (X, xn) (Y, yn) | either (yn in nodes(G) and xn = yn) or {(X,xn), (Y,yn)} in eval(Path(X,P,Y)) }

We define an auxiliary function, ALP, used in the definitions of ZeroOrMorePath and OneOrMorePath. Note that the algorithm given here serves to specify the feature. An implementation is free to implement evaluation by any method that produces the same results for the query overall. The ZeroOrMorePath and OneOrMorePath forms return matches based on distinct nodes connected by the path.

The matching algorithm is based on following all paths, and detecting when a graph node (subject or object) has already been visited on the path.

Informally, this algorithm attempts to extend the multiset of results by one application of path at each step, noting which nodes it has visited for this particular path. If a node has been visited for the path under consideration, it is not a candidate for another step.

Definition: Function ALP

Let eval(x:term, path) be the evaluation of 'path', starting at RDF term x, 
                       and returning a multiset of RDF terms reached 
                       by repeated matches of path.

ALP(x:term, path) = 
    Let V = empty set
    ALP(x:term, path, V)
    result is V

# V is the set of nodes visited

ALP(x:term, path, V:set of RDF terms) =
    if ( x in V ) return 
    add x to V
    X = eval(x,path) 
    For n:term in X
        ALP(n, path, V)
        End
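
The ALP function can be sketched in Python. To keep the sketch short it is restricted to the case where the path is a single link(iri); the graph is assumed to be a Python set of 3-tuples, and the names `step` and `alp` are hypothetical helpers for this illustration.

```python
def step(graph, x, iri):
    """eval(x, path) for path = link(iri): terms reached from x by one match."""
    return [o for (s, p, o) in graph if s == x and p == iri]

def alp(graph, x, iri, visited=None):
    """Collect every node reachable from x by zero or more steps of link(iri)."""
    if visited is None:
        visited = set()          # V, the set of nodes visited
    if x in visited:
        return visited           # already seen: not a candidate for another step
    visited.add(x)
    for n in step(graph, x, iri):
        alp(graph, n, iri, visited)
    return visited
```

Because a node is added to the visited set at most once, the function terminates on cyclic graphs and each reachable node appears once in the result.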

Definition: Evaluation of ZeroOrMorePath

eval(Path(X:term, ZeroOrMorePath(path), vy:var)) =
    { { (vy, n) } | n in ALP(X, path) }

eval(Path(vx:var, ZeroOrMorePath(path), vy:var)) =
    { { (vx, t), (vy, n) } |  t in nodes(G), (vy, n) in eval(Path(t, ZeroOrMorePath(path), vy)) }

eval(Path(vx:var, ZeroOrMorePath(path), y:term)) = 
    eval(Path(y:term, ZeroOrMorePath(inv(path)), vx:var))

eval(Path(x:term, ZeroOrMorePath(path), y:term)) = 
    { { } } if { (vy:var, y) } in eval(Path(x, ZeroOrMorePath(path), vy))
    { } otherwise

Definition: Evaluation of OneOrMorePath

eval(Path(X, OneOrMorePath(path), Y))

# For OneOrMorePath, we take one step of the path then start
# recording nodes for results.

eval(Path(x:term, OneOrMorePath(path), vy:var)) =
    Let X = eval(x, path)
    Let V = the empty multiset
    For n in X
        ALP(n, path, V)
        End
    result is V

eval(Path(vx:var, OneOrMorePath(path), vy:var)) =
   { { (vx, t), (vy, n) } |  t in nodes(G), (vy, n) in eval(Path(t, OneOrMorePath(path), vy)) }

eval(Path(vx:var, OneOrMorePath(path), y:term)) =
   eval(Path(y:term, OneOrMorePath(inv(path)), vx))

eval(Path(x:term, OneOrMorePath(path), y:term)) =
    { { } } if { (vy:var, y) } in eval(Path(x, OneOrMorePath(path), vy))
    { } otherwise
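
The case of a fixed start term and a variable end point can be sketched in Python, again restricted to a path that is a single link(iri) and to a graph modelled as a set of 3-tuples. The helper names are assumptions for this sketch, and the result is given as the set of nodes bound to the end variable rather than as solution mappings.

```python
def step(graph, x, iri):
    """Terms reached from x by one match of link(iri)."""
    return [o for (s, p, o) in graph if s == x and p == iri]

def alp(graph, x, iri, visited):
    """Add to visited every node reachable from x by zero or more steps."""
    if x in visited:
        return
    visited.add(x)
    for n in step(graph, x, iri):
        alp(graph, n, iri, visited)

def one_or_more(graph, x, iri):
    """Nodes reachable from x by at least one step of link(iri)."""
    visited = set()
    for n in step(graph, x, iri):   # take one step first, then record nodes
        alp(graph, n, iri, visited)
    return visited
```

Unlike the zero-or-more case, the start node is only part of the result when a cycle leads back to it.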

Definition: Evaluation of NegatedPropertySet

Write μ' as the extension of a solution mapping:
μ'(μ,x) = μ(x)   if x is a variable
μ'(μ,t) = t   if t is an RDF term

Let x and y be variables or RDF terms, and S a set of IRIs:

eval(Path(x, NPS(S), y)) = { μ | ∃ triple(μ'(μ,x), p, μ'(μ,y)) in G, such that the IRI of p ∉ S }

18.5 SPARQL Algebra

For each remaining symbol in a SPARQL abstract query, we define an operator for evaluation. The SPARQL algebra operators of the same name are used to evaluate SPARQL abstract query nodes as described in the section "Evaluation Semantics". Evaluation of basic graph patterns and property path patterns has been described above.

Definition: Filter

Let Ω be a multiset of solution mappings and expr be an expression. We define:

Filter(expr, Ω, D(G)) = { μ | μ in Ω and expr(μ) is an expression that has an effective boolean value of true }

card[Filter(expr, Ω, D(G))](μ) = card[Ω](μ)

Note that evaluating an exists(pattern) expression uses the dataset and active graph, D(G). See the evaluation of filter.

Definition: Join

Let Ω₁ and Ω₂ be multisets of solution mappings. We define:

Join(Ω₁, Ω₂) = { merge(μ₁, μ₂) | μ₁ in Ω₁ and μ₂ in Ω₂, and μ₁ and μ₂ are compatible }

card[Join(Ω₁, Ω₂)](μ) =
for each merge(μ₁, μ₂), μ₁ in Ω₁ and μ₂ in Ω₂ such that μ = merge(μ₁, μ₂),
sum over (μ₁, μ₂), card[Ω₁](μ₁)*card[Ω₂](μ₂)

It is possible that a solution mapping μ in a Join can arise in different solution mappings, μ₁ and μ₂ in the multisets being joined. The cardinality of μ is the sum of the cardinalities from all possibilities.
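
Join with its cardinality rule can be sketched in Python. The representation is an assumption of this sketch: a multiset of solution mappings is a `collections.Counter` whose keys are frozensets of (variable, term) pairs and whose values are cardinalities. The names `freeze`, `compatible` and `join` are illustrative.

```python
from collections import Counter

def freeze(mu):
    """Turn a dict-style solution mapping into a hashable key."""
    return frozenset(mu.items())

def compatible(m1, m2):
    """True if the two frozen mappings agree on their shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(omega1, omega2):
    """Join two multisets, summing card1 * card2 over all ways to form each result."""
    result = Counter()
    for m1, c1 in omega1.items():
        for m2, c2 in omega2.items():
            if compatible(m1, m2):
                result[m1 | m2] += c1 * c2   # m1 | m2 is merge(m1, m2)
    return result
```

Accumulating with `+=` covers the case described above, where the same merged mapping arises from more than one pair of input mappings.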

Definition: Diff

Let Ω₁ and Ω₂ be multisets of solution mappings and expr be an expression. We define:

Diff(Ω₁, Ω₂, expr) = { μ | μ in Ω₁ such that ∀ μ' in Ω₂, either μ and μ' are not compatible or μ and μ' are compatible and expr(merge(μ, μ')) has an effective boolean value of false }

card[Diff(Ω₁, Ω₂, expr)](μ) = card[Ω₁](μ)

Diff is used internally for the definition of LeftJoin.

Definition: LeftJoin

Let Ω₁ and Ω₂ be multisets of solution mappings and expr be an expression. We define:

LeftJoin(Ω₁, Ω₂, expr) = Filter(expr, Join(Ω₁, Ω₂)) ∪ Diff(Ω₁, Ω₂, expr)

card[LeftJoin(Ω₁, Ω₂, expr)](μ) = card[Filter(expr, Join(Ω₁, Ω₂))](μ) + card[Diff(Ω₁, Ω₂, expr)](μ)

Written in full that is:

LeftJoin(Ω₁, Ω₂, expr) =
    { merge(μ₁, μ₂) | μ₁ in Ω₁ and μ₂ in Ω₂, μ₁ and μ₂ are compatible and expr(merge(μ₁, μ₂)) is true }
∪
    { μ₁ | μ₁ in Ω₁, ∀ μ₂ in Ω₂, μ₁ and μ₂ are not compatible, or Ω₂ is empty }
∪
    { μ₁ | μ₁ in Ω₁, ∃ μ₂ in Ω₂, μ₁ and μ₂ are compatible and expr(merge(μ₁, μ₂)) is false. }

As these are distinct, the cardinality of LeftJoin is cardinality of these individual components of the definition.
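As a non-normative illustration, the definition LeftJoin = Filter(expr, Join) ∪ Diff can be sketched in Python. The sketch assumes mappings are frozensets of (variable, value) pairs, multisets are Counters, and expr is an ordinary Python predicate that never raises an error; none of these choices come from this specification, and error handling is deliberately left out.

```python
from collections import Counter

def compatible(mu1, mu2):
    """Mappings are compatible when they agree on every shared variable."""
    d1, d2 = dict(mu1), dict(mu2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def left_join(omega1, omega2, expr):
    """LeftJoin over Counters keyed by frozenset-of-bindings mappings."""
    result = Counter()
    for mu1, card1 in omega1.items():
        extended = False
        for mu2, card2 in omega2.items():
            if not compatible(mu1, mu2):
                continue
            merged = mu1 | mu2
            if expr(dict(merged)):
                # Contribution of Filter(expr, Join(omega1, omega2)).
                result[merged] += card1 * card2
                extended = True
        if not extended:
            # Contribution of Diff(omega1, omega2, expr): mu1 survives alone.
            result[mu1] += card1
    return result

def m(**bindings):
    """Hypothetical helper that builds a hashable mapping."""
    return frozenset(bindings.items())

left = Counter({m(x=1): 1, m(x=2): 1, m(x=3): 1})
right = Counter({m(x=1, y=10): 1, m(x=2, y=-5): 1})
result = left_join(left, right, lambda mu: mu["y"] > 0)
```

In this example x=1 is extended with y=10, x=2 is kept unextended because the only compatible mapping fails the expression, and x=3 is kept because nothing in the right-hand side is compatible with it.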

Definition: Union

Let Ω₁ and Ω₂ be multisets of solution mappings. We define:

Union(Ω₁, Ω₂) = { μ | μ in Ω₁ or μ in Ω₂ }

card[Union(Ω₁, Ω₂)](μ) = card[Ω₁](μ) + card[Ω₂](μ)

Definition: Minus

Let Ω₁ and Ω₂ be multisets of solution mappings. We define:

Minus(Ω₁, Ω₂) = { μ | μ in Ω₁ such that ∀ μ' in Ω₂, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint }

card[Minus(Ω₁, Ω₂)](μ) = card[Ω₁](μ)

The additional restriction on dom(μ) and dom(μ') is added because otherwise if there is a solution mapping in Ω₂ that has no variables in common with the solution mappings of Ω₁, then Minus(Ω₁, Ω₂) would be empty, regardless of the rest of Ω₂. The empty solution mapping is compatible with every other solution mapping so P MINUS {} would otherwise be empty for any pattern P.
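The effect of the disjoint-domain restriction can be seen in a short non-normative Python sketch. It assumes mappings are frozensets of (variable, value) pairs and multisets are Counters; the function name minus and the helper m are hypothetical.

```python
from collections import Counter

def minus(omega1, omega2):
    """Minus over Counters keyed by frozenset-of-bindings mappings."""
    result = Counter()
    for mu1, card1 in omega1.items():
        d1 = dict(mu1)
        removed = False
        for mu2 in omega2:
            d2 = dict(mu2)
            shared = d1.keys() & d2.keys()
            # mu1 is removed only by a mapping that shares at least one
            # variable with it and agrees on all shared variables.
            if shared and all(d1[v] == d2[v] for v in shared):
                removed = True
                break
        if not removed:
            result[mu1] = card1
    return result

def m(**bindings):
    """Hypothetical helper that builds a hashable mapping."""
    return frozenset(bindings.items())

omega1 = Counter({m(x=1, y=2): 2, m(x=3): 1})
omega2 = Counter({m(x=1): 1, m(z=9): 1})
remaining = minus(omega1, omega2)

# P MINUS {} : the empty mapping shares no variables, so nothing is removed.
unchanged = minus(omega1, Counter({frozenset(): 1}))
```

The mapping {z=9} is compatible with everything in omega1 but shares no variables with it, so it removes nothing; only {x=1} removes a solution.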

Definition: Extend

Let μ be a solution mapping, Ω a multiset of solution mappings, var a variable and expr be an expression, then we define:

Extend(μ, var, expr) = μ ∪ { (var,value) | var not in dom(μ) and value = expr(μ) }

Extend(μ, var, expr) = μ if var not in dom(μ) and expr(μ) is an error

Extend is undefined when var in dom(μ).

Extend(Ω, var, expr) = { Extend(μ, var, expr) | μ in Ω }
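A minimal non-normative sketch of Extend in Python follows. It assumes mappings are plain dicts, a multiset is a list of such dicts, and an expression error is modelled as a Python exception; these are illustrative assumptions, and the names extend_mapping and extend are hypothetical.

```python
def extend_mapping(mu, var, expr):
    """Extend a single mapping with var bound to expr(mu)."""
    if var in mu:
        # The definition leaves this case undefined; the sketch rejects it.
        raise ValueError(f"Extend is undefined: {var} is already bound")
    try:
        value = expr(mu)
    except Exception:
        # Expression error: the mapping is returned without a new binding.
        return dict(mu)
    return {**mu, var: value}

def extend(omega, var, expr):
    """Apply Extend to every mapping of a multiset (here a list)."""
    return [extend_mapping(mu, var, expr) for mu in omega]

omega = [{"x": 2}, {"z": 7}]
extended = extend(omega, "y", lambda mu: mu["x"] * 10)
```

The second mapping has no binding for x, so evaluating the expression fails and the mapping is passed through unchanged.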

Write [ x | C ] for a sequence of elements where C is a condition on x.

Write card[L](x) to be the cardinality of x in L.

Definition: ToList

Let Ω be a multiset of solution mappings. We define:

ToList(Ω) = a sequence of mappings μ in Ω in any order, with card[Ω](μ) occurrences of μ

card[ToList(Ω)](μ) = card[Ω](μ)

Definition: OrderBy

Let Ψ be a sequence of solution mappings. We define:

OrderBy(Ψ, condition) = [ μ | μ in Ψ and the sequence satisfies the ordering condition]

card[OrderBy(Ψ, condition)](μ) = card[Ψ](μ)

Definition: Project

Let Ψ be a sequence of solution mappings and PV a set of variables.

For mapping μ, write Proj(μ, PV) to be the restriction of μ to variables in PV.

Project(Ψ, PV) = [ Proj(Ψ[μ], PV) | μ in Ψ ]

card[Project(Ψ, PV)](μ) = card[Ψ](μ)

The order of Project(Ψ, PV) must preserve any ordering given by OrderBy.

Definition: Distinct

Let Ψ be a sequence of solution mappings. We define:

Distinct(Ψ) = [ μ | μ in Ψ ]

card[Distinct(Ψ)](μ) = 1

The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

Definition: Reduced

Let Ψ be a sequence of solution mappings. We define:

Reduced(Ψ) = [ μ | μ in Ψ ]

card[Reduced(Ψ)](μ) is between 1 and card[Ψ](μ)

The order of Reduced(Ψ) must preserve any ordering given by OrderBy.

The Reduced solution sequence modifier does not guarantee a defined cardinality.

Definition: Slice

Let Ψ be a sequence of solution mappings. We define:

Slice(Ψ, start, length)[i] = Ψ[start+i] for i = 0 to (length-1)
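As a non-normative illustration, Slice can be sketched over a Python list. Truncating at the end of the sequence when start + length exceeds its size is an assumption of this sketch, since the definition above only indexes positions that exist; the name slice_seq is hypothetical.

```python
def slice_seq(psi, start, length):
    """Return psi[start], ..., psi[start + length - 1], where present."""
    return [psi[start + i] for i in range(length) if start + i < len(psi)]

psi = ["mu0", "mu1", "mu2", "mu3", "mu4"]
page = slice_seq(psi, 1, 3)   # corresponds to OFFSET 1 LIMIT 3
tail = slice_seq(psi, 3, 10)  # fewer than 10 elements remain
```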

Definition: ToMultiSet

Let Ψ be a solution sequence. We define:

ToMultiSet(Ψ) = { μ | μ in Ψ }

card[ToMultiSet(Ψ)](μ) = card[Ψ](μ)

ListEval is a function which is used to evaluate a list of expressions against a solution and return a list of the resulting values.

Definition: ToMultiSet

ToMultiSet turns a sequence into a multiset with the same elements and cardinality as the sequence. The order of the sequence has no effect on the resulting multiset, and duplicates are preserved.

Definition: Exists

exists(pattern) is a function that returns true if the pattern evaluates to a non-empty solution sequence, given the current solution mapping and active graph at the time of evaluation; otherwise it returns false.

18.5.1 Aggregate Algebra

Group is a function which groups a solution sequence into multiple solutions, based on some attribute of the solutions.

Definition: Group

Group evaluates a list of expressions against a solution sequence, producing a set of partial functions from keys to solution sequences.

Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ) = ListEval(exprlist, μ') } | μ in Ω }

Definition: ListEval

ListEval((expr₁, ..., expr_n), μ) returns a list (e₁, ..., e_n), where e_i = expr_i(μ) or error.

ListEval retains errors resulting from the evaluation of the list elements.

Note that, although the result of a ListEval can be an error, and errors may be used to group, solutions containing error values are removed at projection time.

ListEval((unbound), μ) = (error), as the evaluation of an unbound expression is an error.
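Group and ListEval can be sketched in Python as follows. This is a non-normative illustration which assumes mappings are dicts, expressions are Python callables, an expression error is a Python exception, and the error value is represented by a sentinel object named ERROR; all of these names are hypothetical.

```python
ERROR = object()  # sentinel standing in for the "error" value

def list_eval(exprlist, mu):
    """Evaluate each expression against mu, retaining errors as ERROR."""
    values = []
    for expr in exprlist:
        try:
            values.append(expr(mu))
        except Exception:
            values.append(ERROR)
    return tuple(values)

def group(exprlist, omega):
    """Partition omega by the key ListEval(exprlist, mu)."""
    groups = {}
    for mu in omega:
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups

omega = [{"x": 1, "y": 2}, {"x": 1, "y": 3}, {"y": 5}]
groups = group([lambda mu: mu["x"]], omega)
```

The third mapping leaves x unbound, so its key is (ERROR,) and it forms a group of its own, matching the statement that errors may be used to group.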

Aggregation is a function which calculates a scalar value as an output of the aggregate expression. It is used in the SELECT clause, the HAVING evaluation process, and in ORDER BY (where required). Aggregation calculates aggregated values over groups of solutions, using set functions.

Definition: Aggregation

Let exprlist be a list of expressions or *, func a set function, scalarvals a set of partial functions (possibly empty) passed from the aggregate in the query, and let { key₁→Ω₁, ..., key_m→Ω_m } be a multiset of partial functions from keys to solution sequences as produced by the grouping step.

Aggregation applies the set function func to the given multiset and produces a single value for each key and partition of solutions for that key.

Aggregation(exprlist, func, scalarvals, { key₁→Ω₁, ..., key_m→Ω_m } )
= { (key, F(Ω)) | key → Ω in { key₁→Ω₁, ..., key_m→Ω_m } }

where
  M(Ω) = { ListEval(exprlist, μ) | μ in Ω }
  F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT
  F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT

Special Case: when COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence, card[Ω], or card[Distinct(Ω)] if the DISTINCT keyword is present.

scalarvals are used to pass values to the underlying set function, bypassing the mechanics of the grouping. For example, the aggregate expression GROUP_CONCAT(?x ; separator="|") has a scalarvals argument of { "separator" → "|" }.

All aggregates may have the DISTINCT keyword as the first token in their argument list. If this keyword is present then the first argument to func is Distinct(M).

Example

Given a solution multiset (Ω) with the following values:

solution	?x	?y	?z
μ₁	1	2	3
μ₂	1	3	4
μ₃	2	5	6

And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY ?x.

We produce G = Group((?x), Ω) = { ( (1), { μ₁, μ₂ } ), ( (2), { μ₃ } ) }

And so Aggregation((?y, ?z), ex:agg, {}, G) =
{ ((1), ex:agg({(2, 3), (3, 4)}, {})), ((2), ex:agg({(5, 6)}, {})) }.
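The example above can be reproduced with a short non-normative Python sketch. It assumes groups are a dict from key tuples to lists of dict mappings, and it uses a hypothetical stand-in for the extension aggregate (ex_agg, which simply returns the sorted list of tuples it receives). Error handling in ListEval is omitted for brevity.

```python
def aggregation(exprlist, func, scalarvals, groups, distinct=False):
    """Apply func to M(omega) for every key -> omega in groups."""
    result = {}
    for key, omega in groups.items():
        # M(omega): one tuple of expression values per solution.
        m = [tuple(expr(mu) for expr in exprlist) for mu in omega]
        if distinct:
            m = list(dict.fromkeys(m))  # drop duplicate tuples, keep order
        result[key] = func(m, scalarvals)
    return result

def ex_agg(m, scalarvals):
    """Hypothetical extension aggregate: returns its input, sorted."""
    return sorted(m)

mu1 = {"x": 1, "y": 2, "z": 3}
mu2 = {"x": 1, "y": 3, "z": 4}
mu3 = {"x": 2, "y": 5, "z": 6}
G = {(1,): [mu1, mu2], (2,): [mu3]}

exprs = [lambda mu: mu["y"], lambda mu: mu["z"]]
aggregated = aggregation(exprs, ex_agg, {}, G)
```

Each key of G is paired with the value of the set function applied to the (?y, ?z) tuples of its group.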

Definition: AggregateJoin

Let S₁, ..., S_n be a list of sets, where each set S_i contains key to (aggregated) value maps as produced by Aggregation.

Let K = { key | key in dom(S_j) for some 1 <= j <= n } be the set of keys, then
AggregateJoin(S₁, ..., S_n) = { agg₁→val₁, ..., agg_n→val_n | key in K and key→val_i in S_i for each 1 <= i <= n }

Flatten is a function which is used to collapse multisets of lists into a multiset, so for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.

Definition: Flatten

The Flatten(M) function takes a multiset of lists, M {(L₁, L₂, ...), ...}, and returns the multiset { x | L in M and x in L }.

18.5.1.1 Set Functions

The set functions which underlie SPARQL aggregates all have a common signature: SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is one or more scalar values that are passed to the set function indirectly via the ( ... ; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is supported by the built-in aggregates in SPARQL Query 1.1 is GROUP_CONCAT, as in GROUP_CONCAT(?x ; separator=", ").

Note that the name "Set Function" is somewhat historical — the arguments to set functions are in fact multisets. The name is retained due to the commonality with SQL Set Functions, which also operate over multisets.

The set functions defined in this document are Count, Sum, Min, Max, Avg, GroupConcat, and Sample — corresponding to the aggregates COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, and SAMPLE. Definitions may be found in the following sections. Systems may choose to expand this set using local extensions, using the same notation as for functions and casts. Note that, unless the ; separator is used this requires the parser to know whether some IRI refers to a function, cast, or aggregate before it can determine if there are any errors in a query where aggregates are used.

18.5.1.2 Count

Count is a SPARQL set function which counts the number of times a given expression has a bound, non-error value within the aggregate group.

Definition: Count

xsd:integer Count(multiset M)

N = Flatten(M)

remove error elements from N

Count(M) = card[N]
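Flatten and Count can be sketched in Python as follows. This non-normative illustration assumes a multiset of lists is a Python list of tuples and that error elements are represented by a hypothetical sentinel object named ERROR.

```python
ERROR = object()  # sentinel standing in for an error element

def flatten(m):
    """Collapse a multiset of lists into a single multiset (a list here)."""
    return [x for lst in m for x in lst]

def count(m):
    """Count the non-error elements of Flatten(m)."""
    return sum(1 for x in flatten(m) if x is not ERROR)

flat = flatten([(1, 2), (3, 4)])
n = count([(1,), (ERROR,), (3,)])
```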

18.5.1.3 Sum

Sum is a SPARQL set function that will return the numeric value obtained by summing the values within the aggregate group. Type promotion happens as per the op:numeric-add function, applied transitively (see definition below), so the value of SUM(?x), in an aggregate group where ?x has values 1 (integer), 2.0e0 (float), and 3.0 (decimal), will be 6.0 (float).

Definition: Sum

numeric Sum(multiset M)

The Sum set function is used by the SUM aggregate in the syntax.

Sum(M) = Sum(ToList(Flatten(M))).

Sum(S) = op:numeric-add(S₁, Sum(S_2..n)) when card[S] > 1
Sum(S) = op:numeric-add(S₁, 0) when card[S] = 1
Sum(S) = "0"^^xsd:integer when card[S] = 0

In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2, op:numeric-add(3, 0))).

18.5.1.4 Avg

The Avg set function calculates the average value for an expression over a group. It is defined in terms of Sum and Count.

Definition: Avg

numeric Avg(multiset M)

Avg(M) = "0"^^xsd:integer, where Count(M) = 0

Avg(M) = Sum(M) / Count(M), where Count(M) > 0

For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.
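The recursive definition of Sum and the definition of Avg can be sketched in Python. This is a non-normative illustration: Python's + and / operators stand in for op:numeric-add and division, so XSD type promotion is not modelled (for instance, Python returns a float for 6 / 3), and the input is assumed to contain no error elements. The names sparql_sum and sparql_avg are hypothetical.

```python
def flatten(m):
    """Collapse a multiset of lists into a single list."""
    return [x for lst in m for x in lst]

def sparql_sum(s):
    """Sum over a sequence, following the three cases of the definition."""
    if len(s) == 0:
        return 0
    if len(s) == 1:
        return s[0] + 0
    return s[0] + sparql_sum(s[1:])

def sparql_avg(m):
    """Avg(M) = Sum(M) / Count(M), or 0 when the group is empty."""
    values = flatten(m)
    if len(values) == 0:
        return 0
    return sparql_sum(values) / len(values)

m = [(1,), (2,), (3,)]
total = sparql_sum(flatten(m))  # 1 + (2 + (3 + 0))
average = sparql_avg(m)
```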

18.5.1.5 Min

Min is a SPARQL set function that returns the minimum value from a group.

It makes use of the SPARQL ORDER BY ordering definition, to allow ordering over arbitrarily typed expressions.

Definition: Min

term Min(multiset M)

Min(M) = Min(ToList(Flatten(M)))

Min({}) = error.

The flattened multiset of values passed as an argument is converted to a sequence S; this sequence is ordered as per the ORDER BY ASC clause.

Min(S) = S₀

18.5.1.6 Max

Max is a SPARQL set function that returns the maximum value from a group.

It makes use of the SPARQL ORDER BY ordering definition, to allow ordering over arbitrarily typed expressions.

Definition: Max

term Max(multiset M)

Max(M) = Max(ToList(Flatten(M)))

Max({}) = error.

The flattened multiset of values passed as an argument is converted to a sequence S; this sequence is ordered as per the ORDER BY DESC clause.

Max(S) = S₀

18.5.1.7 GroupConcat

GroupConcat is a set function which performs a string concatenation across the values of an expression within a group. The order of the strings is not specified. The separator character used in the concatenation may be given with the scalar argument SEPARATOR.

Definition: GroupConcat

literal GroupConcat(multiset M)

If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to be the "space" character, unicode codepoint U+0020.

The multiset of values, M passed as an argument is converted to a sequence S.

GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))

GroupConcat(S, sep) = "", where |S| = 0

GroupConcat(S, sep) = CONCAT("", S₀), where |S| = 1

GroupConcat(S, sep) = CONCAT(S₀, sep, GroupConcat(S_1..n-1, sep)), where |S| > 1

For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".
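The recursion in the GroupConcat definition can be sketched in Python. This non-normative illustration assumes all values are already plain strings and that scalarvals is a dict; the function names are hypothetical. Because the order of the strings is not specified, the sketch simply uses the order in which the values are supplied.

```python
def group_concat(s, sep):
    """Concatenate a sequence of strings, following the three cases."""
    if len(s) == 0:
        return ""
    if len(s) == 1:
        return "" + s[0]
    return s[0] + sep + group_concat(s[1:], sep)

def group_concat_agg(m, scalarvals):
    """Set-function entry point: flatten m, then look up the separator."""
    flat = [x for lst in m for x in lst]
    # The separator defaults to a single space (U+0020) when absent.
    return group_concat(flat, scalarvals.get("separator", " "))

dotted = group_concat_agg([("a",), ("b",), ("c",)], {"separator": "."})
spaced = group_concat_agg([("a",), ("b",)], {})
```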

18.5.1.8 Sample

Sample is a set function which returns an arbitrary value from the multiset passed to it.

Definition: Sample

RDFTerm Sample(multiset M)

Sample(M) = v, where v in Flatten(M)

Sample({}) = error

For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return values. Note that Sample() is not required to be deterministic for a given input; the only restriction is that the output value must be present in the input multiset.

18.6 Evaluation Semantics

We define eval(D(G), algebra expression) as the evaluation of an algebra expression with respect to a dataset D having active graph G. The active graph is initially the default graph.

D : a dataset
D(G) : D a dataset with active graph G (the one patterns match against)
D[i] : The graph with IRI i in dataset D
P, P1, P2 : graph patterns
L : a solution sequence
F : an expression

'substitute' is a filter function in support of the evaluation of EXISTS and NOT EXISTS forms which were translated to exists.

Definition: Substitute

Let μ be a solution mapping.

substitute(pattern, μ) = the pattern formed by replacing every occurrence of a variable v in pattern by μ(v) for each v in dom(μ)
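As a non-normative illustration, substitute can be sketched in Python for the simple case of a basic graph pattern. The sketch assumes a pattern is a list of triples of strings, variables are strings beginning with "?", and μ is a dict keyed by those variable names; this representation is an assumption, not part of this specification.

```python
def substitute(pattern, mu):
    """Replace every variable bound in mu by its value; leave others as is."""
    return [tuple(mu.get(term, term) for term in triple) for triple in pattern]

pattern = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
mu = {"?x": "ex:alice"}
substituted = substitute(pattern, mu)
```

Only ?x is in dom(μ), so ?y and ?n remain variables in the resulting pattern.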

Definition: Evaluation of Exists

Let μ be the current solution mapping for a filter and P a graph pattern:

The value exists(P), given D(G) is true if and only if eval(D(G), substitute(P, μ)) is a non-empty sequence.

Definition: Evaluation of Join

eval(D(G), Join(P1, P2)) = Join(eval(D(G), P1), eval(D(G), P2))

Definition: Evaluation of LeftJoin

eval(D(G), LeftJoin(P1, P2, F)) = LeftJoin(eval(D(G), P1), eval(D(G), P2), F)

Definition: Evaluation of Union

eval(D(G), Union(P1,P2)) = Union(eval(D(G), P1), eval(D(G), P2))

Definition: Evaluation of Graph

if IRI is a graph name in D
eval(D(G), Graph(IRI,P)) = eval(D(D[IRI]), P)

if IRI is not a graph name in D
eval(D(G), Graph(IRI,P)) = the empty multiset

eval(D(G), Graph(var,P)) =
     Let R be the empty multiset
     foreach IRI i in D
        R := Union(R, Join( eval(D(D[i]), P) , Ω(?var->i) ) )
     the result is R

The evaluation of graph uses the SPARQL algebra union operator. The cardinality of a solution mapping is the sum of the cardinalities of that solution mapping in each join operation.
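The loop for Graph(var, P) can be sketched in Python. This non-normative illustration assumes the named graphs of the dataset are given as a dict from IRI strings to sets of triples, mappings are frozensets of (variable, value) pairs, multisets are Counters, and the evaluation of P against a graph is supplied as a callable. The function match_p is a hypothetical stand-in for eval(D(D[i]), P) with P = { ?s ex:p ?o }.

```python
from collections import Counter

def eval_graph_var(named_graphs, var, eval_pattern):
    """Evaluate Graph(var, P) by iterating over every named graph."""
    result = Counter()
    for iri, graph in named_graphs.items():
        for mu, card in eval_pattern(graph).items():
            bindings = dict(mu)
            # Join with the one-element multiset {var -> iri}: mu is kept
            # only if it does not bind var to a different graph name.
            if bindings.get(var, iri) == iri:
                bindings[var] = iri
                # Union adds cardinalities across graphs.
                result[frozenset(bindings.items())] += card
    return result

def match_p(graph):
    """Hypothetical stand-in for eval(D(D[i]), P) with P = { ?s ex:p ?o }."""
    return Counter(
        frozenset({"?s": s, "?o": o}.items())
        for (s, p, o) in graph
        if p == "ex:p"
    )

named_graphs = {
    "ex:g1": {("ex:a", "ex:p", "ex:b")},
    "ex:g2": {("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")},
}
solutions = eval_graph_var(named_graphs, "?g", match_p)
```

The triple shared by both graphs yields two distinct solutions, because each is joined with a different binding for ?g.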

Note that if eval(D(G), A_i) is an error, it is ignored.

Definition: Evaluation of Slice

eval(D(G), Slice(L, start, length)) = Slice(eval(D(G), L), start, length)

18.7 Extending SPARQL Basic Graph Matching

The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless or inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case.

Basic graph patterns stand in the same relation to triple patterns that RDF graphs do to RDF triples, and much of the same terminology can be applied to them. In particular, two basic graph patterns are said to be equivalent if there is a bijection M between the terms of the triple patterns that maps blank nodes to blank nodes and maps variables, literals and IRIs to themselves, such that a triple ( s, p, o ) is in the first pattern if and only if the triple ( M(s), M(p), M(o) ) is in the second. This definition extends that for RDF graph equivalence to basic graph patterns by preserving variable names across equivalent patterns.

An entailment regime specifies

a subset of RDF graphs called well-formed for the regime
an entailment relation between subsets of well-formed graphs and well-formed graphs.

Detailed definitions for querying various entailment regimes can be found in SPARQL 1.1 Entailment Regimes.

Some entailment regimes can categorize some RDF graphs as inconsistent. For example, the RDF graph:

_:x rdf:type xsd:string .
_:x rdf:type xsd:decimal .

is D-inconsistent when D contains the XSD datatypes. The effect of a query on an inconsistent graph is not covered by this specification, but must be specified by the particular SPARQL extension.

An entailment regime E must provide conditions on basic graph pattern evaluation such that for any basic graph pattern BGP, any RDF graph G, and any evaluation that satisfies the conditions, the resulting multiset of solutions is uniquely determined up to RDF graph equivalence. We denote the multiset of solutions from evaluating BGP over G using E with Eval-E(G, BGP).

An entailment regime must further satisfy the following conditions:

1. For any E-consistent active graph AG, the entailment regime E uniquely specifies a scoping graph SG that is E-equivalent to AG.
2. A set of well-formed graphs for E is specified such that, for any basic graph pattern BGP, scoping graph SG, and solution mapping μ in Eval-E(SG, BGP), the graph μ(BGP) is well-formed for E.
3. For any basic graph pattern BGP and scoping graph SG, if μ₁, ..., μ_n in Eval-E(SG, BGP) and BGP₁, ..., BGP_n are basic graph patterns all equivalent to BGP but not sharing any blank nodes with each other or with SG, then

SG E-entails (SG union μ₁(BGP₁) union ... union μ_n(BGP_n))

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following must hold.

4. Entailment regimes should provide conditions to prevent trivial infinite solution multisets as appropriate to the regime.

18.7.1 Notes

(a) SG will often be graph equivalent to AG, but restricting this to E-equivalence allows some forms of normalization, for example elimination of semantic redundancies, to be applied to the source documents before querying.

(b) The construction in condition 3 ensures that any blank nodes introduced by the solution mapping are used in a way which is internally consistent with the way that blank nodes occur in SG. This ensures that blank node identifiers occur in more than one answer in an answer set only when the blank nodes so identified are indeed identical in SG. If the extension does not allow bindings to blank nodes, then this condition can be simplified to the condition:

SG E-entails μ(BGP) for each solution mapping μ.

(c) These conditions do not impose the SPARQL requirement that SG shares no blank nodes with AG or BGP. In particular, it allows SG to actually be AG. This allows query protocols in which blank node identifiers retain their meaning between the query and the source document, or across multiple queries. Such protocols are not supported by the current SPARQL protocol specification, however.

(d) Since conditions 1 to 3 are only necessary conditions on answers, condition 4 allows cases where the set of legal answers can be restricted in various ways.

(e) None of these conditions refer explicitly to instance mappings on blank nodes in BGP. For some entailment regimes, the existential interpretation of blank nodes cannot be fully captured by the existence of a single instance mapping. These conditions allow such regimes to give blank nodes in query patterns a 'fully existential' reading.

It is straightforward to show that SPARQL satisfies these conditions for the case where E is simple entailment, given that the SPARQL condition on SG is that it is graph-equivalent to AG but shares no blank nodes with AG or BGP (which satisfies the first condition). The only condition which is nontrivial is (3).

For every solution mapping μ_i, there is, by definition of basic graph pattern matching, an RDF instance mapping σ_i such that P_i(BGP_i) is a subgraph of SG where P_i is the pattern instance mapping composed of μ_i and σ_i. Since BGP_i and SG have no blank nodes in common, the ranges of σ_i and μ_i contain no blank nodes from BGP_i; therefore, the solution mapping μ_i and the RDF instance mapping σ_i of P_i commute, so P_i(BGP_i) = σ_i(μ_i(BGP_i)). So

P₁(BGP₁) union ... union P_n(BGP_n)
= σ₁(μ₁(BGP₁)) union ... union σ_n(μ_n(BGP_n))
= [ σ₁ + ... + σ_n]( μ₁(BGP₁) union ... union μ_n(BGP_n) )

since the domains of the σ_i RDF instance mappings are all mutually exclusive. Since they are also exclusive from SG,

SG union [ σ₁ + ... + σ_n]( μ₁(BGP₁) union ... union μ_n(BGP_n) )
= [ σ₁ + ... + σ_n](SG union μ₁(BGP₁) union ... union μ_n(BGP_n) )

i.e.

SG union μ₁(BGP₁) union ... union μ_n(BGP_n)

has an instance which is a subgraph of SG, so is simply entailed by SG by the RDF interpolation lemma [RDF-MT].
