The Nuts and Bolts of XQuery

In this document you will learn about the nuts and bolts of Tamino XQuery. It will pave the way for a solid understanding of the whole language.


Expressions and Sequences

An XQuery query is built from expressions. There are several kinds of expressions, and many of them can be nested within one another in a general way. Each XQuery operator and function expects its operands to be of a certain type. Because queries are composed of nested expressions and every operand has a required type, XQuery is a functional, strongly typed language.

Every expression evaluates to a sequence, which is an ordered collection of items. An item is either an atomic value or a node. An atomic value does not contain any other value and is an instance of either a primitive or a derived data type as defined in XML Schema. A node is of one of seven kinds: element, attribute, namespace, text, comment, processing instruction, or document node. Every node has an identity that is established when the node is created and is independent of its value, so two nodes with the same value are still distinct nodes.

A sequence can be empty, contain exactly one item (a singleton sequence), or contain several items. Sequences have the following properties:

  • Sequences are ordered.

    (input()/bib/book/author/first, input()/bib/book/author/last)

    Even if last elements appear before first elements in the document, this sequence contains all first elements followed by all last elements. The comma serves as the concatenation operator on sequences.

    Note:
    In XPath 1.0, node-sets were always kept in forward or reverse document order, depending on the axis.

  • Sequences are always flat.

    (1, 2, ("a", "b", "c"), 3, 4)
    ((1, (2)), (("a", "b", "c")), (3, 4))

    Although you can use nested sequence constructors, the result is always a "flattened" sequence. Any nested sequence items will be arranged in the same order, as if there were no nestings at all. So, both example sequences are equivalent to:

    (1, 2, "a", "b", "c", 3, 4)

  • Sequences may contain duplicates.

    (input()/bib/book/author/first, input()/bib/book/author/last, input()/bib/book/author/first)
    (1, 2, 3, 4, 3, 2, 1)

    Because sequences are ordered rather than set-like, the same item may occur more than once in a sequence. Such duplicates can have the same value or the same node identity.

    Note:
    In XPath 1.0, a node could only appear once in a node set.
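The three properties can be mimicked with a small Python model. This is only an illustration, not Tamino code: a Python tuple stands for a sequence, and the hypothetical helper `seq` plays the role of the comma operator by concatenating its operands and flattening any nested sequences.

```python
from typing import Any


def seq(*operands: Any) -> tuple:
    """Model of the XQuery comma operator: concatenate operands into one flat sequence.

    Hypothetical helper: a tuple stands for a (possibly nested) sequence,
    anything else is treated as a single item.
    """
    result = []
    for operand in operands:
        if isinstance(operand, tuple):
            # A nested sequence contributes its items in order, so the nesting disappears.
            result.extend(seq(*operand))
        else:
            result.append(operand)
    return tuple(result)


# Sequences are flat: both nested constructions give the same flat sequence.
a = seq(1, 2, ("a", "b", "c"), 3, 4)
b = seq((1, (2,)), (("a", "b", "c"),), (3, 4))
print(a == b == (1, 2, "a", "b", "c", 3, 4))  # True

# Sequences are ordered and may contain duplicates.
print(seq(1, 2, 3, 4, 3, 2, 1))  # (1, 2, 3, 4, 3, 2, 1)
```

Because the model only concatenates and never sorts or deduplicates, order and duplicates are preserved exactly as written.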

Remember that every expression in XQuery evaluates to a sequence. Even if we have an XQuery expression such as

let    $x := 5
return $x * 30

that defines a local variable $x and returns its value multiplied by 30, the XQuery expression, strictly speaking, returns a sequence with the single integer value 150.

A let variable is bound to the entire sequence returned by its expression. The values of other variables are more constrained. For example, a for variable is always bound to a single item (which is identical to a singleton sequence):

for    $bib in input()/bib
return $bib
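The difference between the two kinds of binding can be sketched in Python, again with tuples standing in for sequences. The helpers `let_return` and `for_return` are hypothetical names for this illustration only: the let body is evaluated once on the whole sequence, while the for body is evaluated once per item, sees a singleton sequence, and the per-item results are concatenated.

```python
from typing import Callable


def let_return(bound: tuple, body: Callable[[tuple], tuple]) -> tuple:
    """Models 'let $v := bound return body': body is evaluated once on the whole sequence."""
    return body(bound)


def for_return(bound: tuple, body: Callable[[tuple], tuple]) -> tuple:
    """Models 'for $v in bound return body': body runs once per item.

    Each time, $v is a singleton sequence; the per-item results are concatenated.
    """
    result: tuple = ()
    for item in bound:
        result += body((item,))
    return result


# let $x := 5 return $x * 30  ->  a sequence holding the single value 150
print(let_return((5,), lambda x: (x[0] * 30,)))  # (150,)

# Hypothetical stand-ins for three book nodes.
books = ("book1", "book2", "book3")
# With let, count($b) sees all three items; with for, each $b holds a single item.
print(let_return(books, lambda b: (len(b),)))  # (3,)
print(for_return(books, lambda b: (len(b),)))  # (1, 1, 1)
```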

Note:
In XQuery, all keywords are written in lower case. Using mixed or upper case results in a parsing error.

Retrieving Data

In Tamino XQuery, there are two functions that provide access to data stored in a Tamino database. The function input() takes no parameters; it is an implementation-defined way of assigning nodes from a data source to the input sequence against which a query expression is evaluated. In Tamino, input() always provides access to the current collection of a Tamino database, so the input sequence consists of all document nodes of that collection. Similarly, you can use the function collection() to access nodes from a collection other than the default one. The name of the collection is passed as the parameter.

input()
collection("XMP")
input()/bib/book/title
collection("XMP")/bib/book/title

The first input() expression returns the document instances of all doctypes in the current collection. The second input() expression returns a sequence of all title elements that are child nodes of book elements that are child nodes of the bib document element. Each collection() expression corresponds to the input() expression directly above it, provided that the current collection for the input() expressions is XMP.

In XPath 1.0, any expression locates nodes in a single document. However, in XQuery as well as in the previous X-Query language, expressions are evaluated with regard to a collection of documents. More precisely, the input for an expression is a sequence of document nodes in a collection.

Constructors

In XQuery, you can conveniently compose your query result by using constructors, which create new element and attribute nodes within a query expression:

let $a := input()/bib/book/author
return
<index type="author">
  { $a/last }
  { $a/first }
</index>

This XQuery expression compiles a name index from the authors of all books in the current collection. It constructs an element index with an attribute type that indicates the kind of index. The content of index consists of two expressions enclosed in braces, which evaluate to the last and the first element nodes of all author elements, respectively.

To construct an element, it is sufficient to write its start and end tags literally. Any expression that is to be evaluated inside the element content must be enclosed in braces; text outside braces is taken literally.
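Outside XQuery, the same construction has to be spelled out step by step. The Python sketch below uses the standard xml.etree.ElementTree module on a small, made-up bib document to mimic the index constructor: it creates the index element with its type attribute and then appends copies of all last elements followed by all first elements. The copying mirrors the fact that XQuery element constructors copy the nodes produced by their enclosed expressions; the sample data is an assumption for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Made-up sample data in the shape of the bib documents used in this section.
bib = ET.fromstring(
    "<bib>"
    "<book><author><last>Stevens</last><first>W.</first></author></book>"
    "<book><author><last>Abiteboul</last><first>Serge</first></author></book>"
    "</bib>"
)

authors = bib.findall("book/author")             # roughly input()/bib/book/author
index = ET.Element("index", {"type": "author"})  # <index type="author"> ... </index>

# { $a/last }: copies of all last elements, then { $a/first }: all first elements.
for child_name in ("last", "first"):
    for author in authors:
        index.extend([copy.deepcopy(e) for e in author.findall(child_name)])

result = ET.tostring(index, encoding="unicode")
print(result)
# <index type="author"><last>Stevens</last><last>Abiteboul</last><first>W.</first><first>Serge</first></index>
```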

Path Expressions

XQuery uses path expressions to locate nodes in a document tree in much the same way as XPath 1.0 originally defined them:

let $b := input()/bib/book/author
return $b/last

input()/patient//type

The first expression returns the last child element nodes of all author elements. The second expression returns all type elements that are descendants of the patient element. Here, // is the abbreviated syntax for /descendant-or-self::node()/.

The structure of a path expression has changed only slightly compared with XPath 1.0: a path expression consists of a sequence of steps, each of which is either a general step or a location step. A general step is an expression that evaluates to a node sequence, for example the input() function, which delivers the document nodes of the current collection. A general step can only be the first step of a path expression. A location step consists of three parts:

  • An axis, which specifies the relationship between the set of selected nodes and the context node,

  • A node test, which specifies type and/or name of the set of selected nodes, and

  • Zero or more predicates, which further restrict the set of selected nodes.

Axes

XQuery supports a number of axes. An axis originates in the context node and determines the initial node sequence that is further refined by node tests and predicates. In XQuery and XPath 2.0, you can specify a path in either unabbreviated or abbreviated syntax. The following table lists each axis along with its direction (normal document order or reverse document order) and a short description. In the unabbreviated syntax, a double colon '::' follows the name of the axis.

Axis                   Direction               Meaning
ancestor::             reverse                 all ancestor nodes (parent, grandparent, great-grandparent, etc.)
attribute::            implementation-defined  attached attribute nodes
child::                normal                  immediate child nodes (default axis)
descendant::           normal                  all descendant nodes (children, grandchildren, etc.)
descendant-or-self::   normal                  the current node and all its descendant nodes
parent::               reverse                 parent node (or attaching node for attribute and namespace nodes)
self::                 normal                  the current node
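The axes in the table can be modeled on a tiny in-memory tree. The Python sketch below is an illustration under simplifying assumptions (element nodes only, no attribute axis, hypothetical class and function names); each function returns the nodes in the axis direction given in the table, so descendants come in document order and ancestors nearest first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by value
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, name: str) -> "Node":
        """Append a new child element and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def child_axis(n: Node) -> List[Node]:
    return list(n.children)


def descendant_axis(n: Node) -> List[Node]:
    # Pre-order traversal visits descendants in document order (normal direction).
    result: List[Node] = []
    for c in n.children:
        result.append(c)
        result.extend(descendant_axis(c))
    return result


def descendant_or_self_axis(n: Node) -> List[Node]:
    return [n] + descendant_axis(n)


def self_axis(n: Node) -> List[Node]:
    return [n]


def parent_axis(n: Node) -> List[Node]:
    # Empty sequence for the document node, which has no parent.
    return [n.parent] if n.parent is not None else []


def ancestor_axis(n: Node) -> List[Node]:
    # Reverse direction: nearest ancestor first (parent, grandparent, ...).
    result: List[Node] = []
    p = n.parent
    while p is not None:
        result.append(p)
        p = p.parent
    return result


def names(nodes: List[Node]) -> List[str]:
    return [n.name for n in nodes]


# Hypothetical tree: document node / bib / book / (title, author / (last, first))
doc = Node("#document")
book = doc.add("bib").add("book")
title = book.add("title")
author = book.add("author")
last = author.add("last")
author.add("first")

print(names(child_axis(book)))       # ['title', 'author']
print(names(descendant_axis(book)))  # ['title', 'author', 'last', 'first']
print(names(ancestor_axis(last)))    # ['author', 'book', 'bib', '#document']
print(parent_axis(doc))              # []
```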

Tamino also supports the abbreviated notation of path expressions with axes. The following table shows how they correspond to the unabbreviated axes (as defined in the W3C XQuery specification):

Abbreviation   Description
(no axis)      nodes along the child:: axis satisfying node tests and optional predicates
@              nodes along the attribute:: axis satisfying node tests and optional predicates
.              self::node(), which is the current node of any type
..             parent::node(), which is the empty sequence if the current node is the document node; the attaching node if the current node is an attached node (of type attribute or namespace); otherwise the parent node
//             /descendant-or-self::node()/, which is an absolute path at the start of an expression and a relative path elsewhere

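The following pairs of query expressions are thus equivalent:

1. Child axis, abbreviated and unabbreviated:

for $a in input()/bib/book
return $a/title

is equivalent to

for $a in input()/bib/book
return $a/child::title

2. Attribute axis, abbreviated and unabbreviated:

for $a in input()/bib/book
return $a/@*

is equivalent to

for $a in input()/bib/book
return $a/attribute::*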

Node Tests

The node test determines the type and optionally the name of the nodes along the axis direction. For each axis, there is a principal node type: for the attribute axis, it is attribute; for other axes, it is element. You can select a node by applying one of the following node tests. The node is selected if the test evaluates to "true".

NodeTest                            Description
processing-instruction()            a processing instruction node (regardless of name)
processing-instruction('Literal')   a processing instruction node with the name Literal
comment()                           a comment node
text()                              a text node
node()                              a node of any type (regardless of name)
Name                                a node of the principal node type with the specified name
prefix:name                         depending on the axis, an element or attribute node in the specified namespace with the specified local name
prefix:*                            depending on the axis, all element or attribute nodes in the specified namespace
*:name                              depending on the axis, all element or attribute nodes with the specified local name, regardless of namespace
*                                   depending on the axis, all element or attribute nodes
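The name tests in the table compare a node's expanded name (namespace URI plus local name) with the test, and only nodes of the axis's principal node type can match. The Python function below is a hypothetical sketch of that logic, covering name tests only, not kind tests such as text() or comment(). It assumes the prefix bindings are passed in explicitly and that no default element namespace is declared, so an unprefixed name matches only nodes in no namespace.

```python
from typing import Dict, Optional


def name_test(test: str, kind: str, ns: Optional[str], local: str,
              principal: str, prefixes: Dict[str, str]) -> bool:
    """Return True if a node matches a name test.

    kind      -- the node's kind, e.g. "element" or "attribute"
    ns, local -- the node's expanded name (namespace URI or None, local name)
    principal -- principal node type of the axis ("attribute" for attribute::,
                 "element" otherwise)
    prefixes  -- in-scope prefix-to-URI bindings (assumed to be given)
    """
    if kind != principal:  # name tests only select nodes of the principal node type
        return False
    if test == "*":
        return True
    if ":" in test:
        prefix, name = test.split(":", 1)
        uri = prefixes.get(prefix)
        # '*' matches any namespace; an undeclared prefix matches nothing here
        # (in real XQuery it would be a static error).
        ns_ok = prefix == "*" or (uri is not None and uri == ns)
        name_ok = name == "*" or name == local
        return ns_ok and name_ok
    # Unprefixed name: assumes no default element namespace is declared.
    return ns is None and test == local


EX = "urn:example:books"  # hypothetical namespace URI
pfx = {"b": EX}
print(name_test("b:*", "element", EX, "author", "element", pfx))            # True
print(name_test("*:title", "element", "urn:other", "title", "element", pfx))  # True
print(name_test("title", "attribute", None, "title", "element", pfx))       # False
```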

Predicates

The last, optional part of a step consists of one or more predicates, which filter the sequence of selected nodes. Each predicate expression is enclosed in square brackets [ and ]. A selected node is retained if the predicate truth value of the predicate expression is true.

The predicate truth value is derived by applying the following rules, in order:

  1. If the value of the predicate expression is an atomic value of a numeric type, the predicate truth value is true if the value of the predicate expression is equal to the context position, and is false otherwise.

  2. Otherwise, the predicate truth value is the effective boolean value of the predicate expression.

The effective boolean value of an expression, which is the value that the function fn:boolean returns for it, is false if the expression evaluates to any of the following:

  • An empty sequence

  • The boolean value false

  • A zero-length value of type xs:string or xdt:untypedAtomic

  • A numeric value that is equal to zero

Otherwise, fn:boolean returns "true".

The filtered node sequence is ordered according to the direction of the selected axis.
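The two rules for the predicate truth value and the rules for the effective boolean value can be combined into a short Python model. This is a sketch under stated assumptions, not Tamino's implementation: tuples model sequences, Python str models xs:string and xdt:untypedAtomic, book nodes are hypothetical dicts, and, as in the text above, every other value counts as true.

```python
from decimal import Decimal
from typing import Any, Callable, List

NUMERIC = (int, float, Decimal)


def is_numeric(item: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(item, NUMERIC) and not isinstance(item, bool)


def effective_boolean_value(value: tuple) -> bool:
    """Effective boolean value following the rules listed above."""
    if len(value) == 0:
        return False               # empty sequence
    if len(value) == 1:
        item = value[0]
        if isinstance(item, bool):
            return item            # the boolean value itself
        if isinstance(item, str):
            return item != ""      # zero-length string -> false
        if is_numeric(item):
            return item != 0       # numeric zero -> false
    return True                    # otherwise true (simplified, as in the text)


def predicate_truth_value(value: tuple, position: int) -> bool:
    # Rule 1: a single numeric value is compared with the context position.
    if len(value) == 1 and is_numeric(value[0]):
        return value[0] == position
    # Rule 2: otherwise, use the effective boolean value.
    return effective_boolean_value(value)


def apply_predicate(nodes: List[Any], predicate: Callable[[Any, int], tuple]) -> List[Any]:
    """Keep each node whose predicate truth value is true; positions start at 1."""
    return [n for pos, n in enumerate(nodes, start=1)
            if predicate_truth_value(predicate(n, pos), pos)]


# Hypothetical book "nodes", modeled as dicts.
books = [{"title": "TCP/IP Illustrated"}, {"title": "Data on the Web"},
         {"title": "Advanced Programming in the UNIX Environment"}]

# book[2]: a numeric predicate selects by position.
print([b["title"] for b in apply_predicate(books, lambda b, pos: (2,))])  # ['Data on the Web']
# book[position() < 3]: a boolean predicate uses the effective boolean value.
print(len(apply_predicate(books, lambda b, pos: (pos < 3,))))             # 2
```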

Data Types

The XQuery type system is much richer than that of XPath 1.0. It uses the built-in data types defined in XML Schema 1.0. The set of built-in data types consists of primitive types and derived types, which fall roughly into these categories:

  • Boolean values (true and false)

  • Numbers: decimals, floating-point numbers with single and double precision

  • Character Strings

  • Data types for dates, times, and durations (two of which are not yet defined in XML Schema)

  • XML-specific data types such as QName and NOTATION

The derived types are derived from the primitive types. XML Schema Part 2: Datatypes contains a diagram that summarizes the primitive and derived types, all of which are supported by Tamino XQuery.

Expressions and functions expect operands and parameters to be of a certain type. If the required type cannot be provided, type conversion is attempted. The following general methods can be applied:

Atomization

Atomization takes place when an atomic value or a sequence of atomic values is expected. When a given value is atomized, the following cases are distinguished: if the value is an atomic value or the empty sequence, that value is returned; if the value is a single node, the typed value of that node is returned; otherwise, an error is raised.

Atomization is used when processing arithmetic expressions, comparison expressions, function calls and sort expressions.
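The atomization cases can be expressed as a small Python function. In this hypothetical model a node carries nothing but a name and a typed value, and a tuple stands for a sequence. One assumption goes beyond the listed cases: a sequence made up only of atomic values is also returned unchanged, matching the first sentence of this section.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any


@dataclass(eq=False)
class Node:
    """Hypothetical node model: only the name and typed value are kept."""
    name: str
    typed_value: Any


def atomize(value: tuple) -> tuple:
    """Atomization following the cases above; a tuple models a sequence."""
    if all(not isinstance(item, Node) for item in value):
        # Empty sequence or atomic value(s): returned unchanged.
        return value
    if len(value) == 1:
        # A single node: its typed value is returned.
        return (value[0].typed_value,)
    # Any other value (for example several nodes) raises an error.
    raise TypeError("atomization error: expected an atomic value or a single node")


price = Node("price", Decimal("65.95"))  # hypothetical element with a decimal typed value
print(atomize((price,)))         # (Decimal('65.95'),)
print(atomize((42,)))            # (42,)
print(atomize(()))               # ()
# An arithmetic expression such as price * 2 atomizes its operand first:
print(atomize((price,))[0] * 2)  # 131.90
```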

Type Promotion

During the processing of arithmetic expressions and value comparisons, an atomic value can be promoted from one type to another. As a general rule, a value of a derived type can be promoted to a type it is derived from without changing the value. For example, a value of type xs:long can be promoted to xs:decimal, from which it is derived via xs:integer, and retains its original value. Two further promotions between primitive types are possible: a value of type xs:decimal can be promoted to xs:float, where the result is as close as possible to the original value, and a value of type xs:float can be promoted to xs:double, which retains its original value.
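The difference between exact and approximate promotion can be seen numerically. The Python sketch below is an illustration with hypothetical function names, not how Tamino implements promotion: Python's Decimal stands in for xs:decimal, and since Python has no 32-bit float type, the xs:float result is obtained by round-tripping through the IEEE 754 single-precision encoding with the standard struct module.

```python
import struct
from decimal import Decimal


def promote_long_to_decimal(value: int) -> Decimal:
    # Derived type -> type it is derived from: the value is kept exactly.
    return Decimal(value)


def promote_decimal_to_float(value: Decimal) -> float:
    # xs:decimal -> xs:float: the nearest single-precision value.
    # Round-trip through IEEE 754 binary32 because Python has no float32 type.
    # Converting via a double first can, in rare cases, differ by one unit in
    # the last place from rounding the decimal directly.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


big = promote_long_to_decimal(9223372036854775807)  # largest xs:long value
print(big)  # 9223372036854775807

f = promote_decimal_to_float(Decimal("0.1"))
print(f)    # 0.10000000149011612
```

The result f is held in a Python float, which is a double. Every single-precision value is exactly representable as a double, which is why the further promotion from xs:float to xs:double retains the value.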

Functions

Tamino XQuery offers a number of functions that operate on different types of data and perform various tasks. Most of them are defined in the W3C specification XQuery 1.0 and XPath 2.0 Functions and Operators.

let $a := input()/bib/book
return
<p>Currently, there are { count($a) } books stored.</p>

In addition, Tamino XQuery provides further functions that perform full-text operations or deal with special aspects of documents stored in Tamino. These functions belong to the namespace http://namespaces.softwareag.com/tamino/TaminoFunction, which is usually bound to the prefix tf. They do not belong to the standard function namespace http://www.w3.org/2002/08/xquery-functions, which is bound to the prefix fn. Since tf is a predefined namespace prefix, you can use it directly without declaring the namespace.

for    $t in input()/bib/book
where  tf:containsText($t/title, "UNIX")
return $t

for    $a in input()/bib/book
where  $a/title = "TCP/IP Illustrated"
return tf:getCollection($a)

The first query uses a text retrieval function to find all books that contain the word "UNIX" in their title. The second query uses a comparison expression to find all books whose title is equal to the string "TCP/IP Illustrated" and returns, for each of them, the name of the collection in which it is stored.