A Brief Introduction to XPath 2.0

The Presto Server uses XPath 2.0 expressions to work with the results from component mashable information sources and with all the variables used in a mashup script. This topic introduces the simplest XPath capabilities that you can use in a mashup script. It uses the abbreviated (shortcut) XPath syntax.

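XPath receives an XML document from a parser as a tree of nodes. The XPath 2.0 data model defines seven kinds of nodes: document, element, attribute, text, namespace, processing instruction, and comment nodes.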

The most basic expression is a path that traverses the document nodes to a specific node or to a set of nodes, called a sequence or node set. For example:

/document/chapter/section/paragraph

This path points to a <paragraph> element inside <section>, inside <chapter>, inside <document>, which is the root element of the XML document. The slashes (/) separate each step in the path, moving from parent to child.
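Python's standard xml.etree.ElementTree module understands only a small subset of XPath syntax (it is not an XPath 2.0 processor), but that subset is enough to sketch how this path behaves. The sample document is hypothetical. ElementTree evaluates paths relative to an element, so the absolute path is written relative to the <document> element here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE = """
<document>
  <chapter>
    <section id="intro">
      <paragraph>First paragraph.</paragraph>
      <paragraph>Second paragraph.</paragraph>
    </section>
    <section>
      <paragraph>Third paragraph.</paragraph>
    </section>
  </chapter>
  <chapter>
    <section id="details">
      <paragraph>Fourth paragraph.</paragraph>
    </section>
  </chapter>
</document>
"""

root = ET.fromstring(SAMPLE)  # root is the <document> element

# Equivalent of /document/chapter/section/paragraph: each step moves from
# a parent to its children, and matches come back in document order.
paragraphs = root.findall("chapter/section/paragraph")
texts = [p.text for p in paragraphs]
print(texts)
# ['First paragraph.', 'Second paragraph.', 'Third paragraph.', 'Fourth paragraph.']
```

The result is a list of four elements rather than a single node, which is the same sequence behavior described for XPath paths.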

There are several important points to note about this example:

Traversal Order: the path traverses the XML document in document order, because by default each step moves forward from a parent to its children. XPath expressions can also traverse the document in reverse order or skip levels as needed.

Absolute Path: this example is an absolute path because it starts with a slash (/). The initial slash selects the root of the tree: the document node, which is the parent of the outermost element and encloses the entire document.

If you omit the initial slash, the path is relative to the current position in the document, called the current context. Most mashup expressions either use absolute paths or are relative to a context that is defined in the mashup script.

Hierarchy: this example spells out the whole hierarchy to traverse. It would not, for example, match any <paragraph> nodes inside a <section> that is nested inside another <section>.

You can use a double slash (//) to indicate that the expression can skip any number of levels within the tree. You can use it at the beginning of an absolute path or at any step within a path. Both examples shown below find any <paragraph> node at any level:

//paragraph
/document//paragraph

Cardinality: this example most likely results in a set of nodes rather than one specific <paragraph> element. The path matches any <paragraph> in any <section> in any <chapter> in <document>. If no <chapter> contains a <section> with a <paragraph>, the path results in an empty sequence with no nodes.

Elements Only: this path selects <paragraph> elements only. Each selected element still carries its attributes and descendants, but the path itself does not select attribute, text, or any other kinds of nodes.

To match attributes, add a step in the path after the element that the attribute belongs to, and put "@" in front of the attribute name. For example:

/document/chapter/section/@id

This matches the id attribute on any <section> element in any <chapter> in <document>.
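The points above can be checked with Python's xml.etree.ElementTree, which supports only a small XPath subset and is not an XPath 2.0 processor. This sketch uses a hypothetical document with one <section> nested in another to contrast an explicit hierarchy with //, shows that an unmatched path yields an empty result, and reads id values. ElementTree cannot return attribute nodes, so reading the attribute from each matched <section> stands in for the @id step.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a <section> nested inside another <section>.
SAMPLE = """
<document>
  <chapter>
    <section id="outer">
      <paragraph>Top-level paragraph.</paragraph>
      <section id="inner">
        <paragraph>Nested paragraph.</paragraph>
      </section>
    </section>
  </chapter>
</document>
"""

root = ET.fromstring(SAMPLE)  # the <document> element

# Explicit hierarchy (/document/chapter/section/paragraph) misses the nested one.
explicit = [p.text for p in root.findall("chapter/section/paragraph")]
print(explicit)  # ['Top-level paragraph.']

# ".//paragraph" from <document> behaves like //paragraph here, because
# <document> is the outermost element: any number of levels is skipped.
anywhere = [p.text for p in root.findall(".//paragraph")]
print(anywhere)  # ['Top-level paragraph.', 'Nested paragraph.']

# A path that matches nothing gives an empty result, not an error.
missing = root.findall("chapter/appendix/paragraph")
print(missing)  # []

# Stand-in for /document/chapter/section/@id: read the attribute value
# from each matched <section>. The nested section is not a match.
ids = [s.get("id") for s in root.findall("chapter/section")]
print(ids)  # ['outer']
```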

In some XML documents, node names carry a namespace prefix followed by a colon. For example, xs:para has the namespace prefix xs. Namespaces can be confusing, because the prefix is only a shorthand for the actual namespace, which is identified with an xmlns attribute. Frequently, the full namespace is declared on an ancestor element, such as xmlns:xs="http://myNamespace".

The namespace is part of a node's name. Nodes with the same local name but a different namespace, or no namespace at all, are different nodes as far as XPath is concerned, and a path that names one of them does not match the others.

The two path expressions shown below select different title nodes:

/document/chapter/section/a:title
/document/chapter/section/title

The first path selects any section title node in the namespace associated with the a prefix. The second path selects any section title node that has no namespace.

You can use wildcards in XPath expressions to ignore namespaces or to ignore node names:

*:node-name in an XPath expression selects all nodes with the matching name, regardless of what namespace they belong to, including having no namespace.

prefix:* in an XPath expression selects all nodes belonging to the namespace bound to that prefix, regardless of the node name.
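Python's xml.etree.ElementTree also resolves prefixes through a mapping passed with the query instead of the prefixes written in the document, which reflects the point that a prefix is only a shorthand for the namespace URI. This sketch assumes Python 3.8 or later for the {*}name and {uri}* wildcards, which play the roles of *:name and prefix:*. ElementTree is not an XPath 2.0 processor, and the namespace URI and sample data are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the "a" prefix is declared on an ancestor.
A = "http://example.com/ns-a"
SAMPLE = f"""
<document xmlns:a="{A}">
  <chapter>
    <section>
      <a:title>Namespaced title</a:title>
      <title>Plain title</title>
      <a:note>Namespaced note</a:note>
    </section>
  </chapter>
</document>
"""

# Prefixes in a query are looked up here; only the URI has to match.
NS = {"a": A}

root = ET.fromstring(SAMPLE)
section = root.find("chapter/section")

in_ns = [e.text for e in section.findall("a:title", NS)]    # like section/a:title
no_ns = [e.text for e in section.findall("title")]          # like section/title
any_ns = [e.text for e in section.findall("{*}title")]      # like section/*:title
all_in_a = [e.text for e in section.findall(f"{{{A}}}*")]   # like section/a:*

print(in_ns)     # ['Namespaced title']
print(no_ns)     # ['Plain title']
print(any_ns)    # ['Namespaced title', 'Plain title']
print(all_in_a)  # ['Namespaced title', 'Namespaced note']
```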

You can use predicates in XPath expressions to make the match more specific or to filter out nodes. Predicates appear in square brackets within the steps of a path, as in this example:

/document/chapter/section[@id='intro']

The predicate [@id='intro'] makes this expression match only <section> elements with an id attribute value of intro. You can use predicates to test many types of conditions:

Existence: /document/chapter/section[@id] matches any <section> that has an id attribute.

Position: /document/chapter/section[3] matches the third <section> of any <chapter>. /document/chapter/section[last()] matches the last <section> of any <chapter>.

Combining Criteria: you can use the keywords and and or to combine criteria within a predicate. You can also specify multiple predicates on one step. They are applied from left to right, and each predicate filters the result of the one before it.

The example shown below selects all <section> nodes that belong to a <chapter> with a <role> child and a position of 3 or greater in <document>:

/document/chapter[role and position() >= 3]/section

Arithmetic and Logical Operators: predicates can use common comparison operators such as = or >. You can also perform basic arithmetic, such as (2 + 3), in predicates.

Note:

You may need to use XML escaping for operators that contain the < or > characters. See XML Escaping in URLs and Expressions for information.

Comparison Functions: predicates can also contain XPath functions that affect the comparison or define a specific relationship. Common examples include:

not() to negate the expression. This example selects any <section> child of <chapter> that does not have an id attribute set:

/document/chapter/section[not(@id)]

contains() to determine string inclusion. The comparison is case-sensitive. This example selects any <item> node that has a <category> child whose value contains ASST somewhere in the string:

//item[contains(category, 'ASST')]

matches() to determine string inclusion using regular expressions. It also supports case-insensitive comparisons through the "i" flag. For example, this is a case-insensitive comparison:

//item[matches(category, 'asst', 'i')]
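Python's xml.etree.ElementTree handles a few predicate forms natively ([@id], [@id='value'], [child], [n], and [last()]), so this sketch uses those and emulates not(), contains(), matches() with the "i" flag, and a combined [role and position() >= 3] test with ordinary Python filtering. ElementTree is not an XPath 2.0 processor; the documents and values are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: three chapters, the last two with a <role> child.
SAMPLE = """
<document>
  <chapter>
    <section id="intro"/>
    <section/>
    <section id="third"/>
  </chapter>
  <chapter>
    <role>editor</role>
    <section id="solo"/>
  </chapter>
  <chapter>
    <role>author</role>
    <section id="closing"/>
  </chapter>
</document>
"""
root = ET.fromstring(SAMPLE)

# section[@id='intro'] and section[@id]: supported natively.
intro = root.findall("chapter/section[@id='intro']")
with_id = [s.get("id") for s in root.findall("chapter/section[@id]")]

# section[3] and section[last()]: positions are counted within each <chapter>.
third = [s.get("id") for s in root.findall("chapter/section[3]")]
last = [s.get("id") for s in root.findall("chapter/section[last()]")]

# section[not(@id)]: no not() in ElementTree, so filter in Python.
without_id = [s for s in root.findall("chapter/section") if "id" not in s.attrib]

# chapter[role and position() >= 3]/section: XPath positions start at 1.
combined = [
    s.get("id")
    for pos, chapter in enumerate(root.findall("chapter"), start=1)
    if chapter.find("role") is not None and pos >= 3
    for s in chapter.findall("section")
]

# contains() is case-sensitive; matches() with the "i" flag is not.
items = ET.fromstring(
    "<items>"
    "<item><category>ASST-01</category></item>"
    "<item><category>asst-02</category></item>"
    "<item><category>MISC</category></item>"
    "</items>"
)
categories = [item.findtext("category") for item in items.findall("item")]
contains_hits = [c for c in categories if "ASST" in c]
matches_hits = [c for c in categories if re.search("asst", c, re.IGNORECASE)]

print(third, last)    # ['third'] ['third', 'solo', 'closing']
print(combined)       # ['closing']
print(contains_hits)  # ['ASST-01']
print(matches_hits)   # ['ASST-01', 'asst-02']
```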

You can refer to variables in XPath expressions using $variable-name. Variables can also occur at the beginning of paths, in calculations, or in predicates.

Simple arithmetic expressions are also valid XPath expressions. You can use any of the following expressions as XPath:

1 + 1
/order/item/price - 1.00
/order/subtotal * /order/taxrate
($total div 100) + 10
$total idiv 100
/order/qty mod 10

XPath uses div for decimal division and idiv for integer division. Similarly, mod is the modulo operator. You can also use parentheses to group arithmetic expressions and control precedence.
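idiv and mod can behave differently from the integer-division operators of some programming languages when an operand is negative. In the XPath 2.0 functions specification, idiv truncates the fractional part (rounding toward zero), and for integers mod is defined as a - (a idiv b) * b, so its result takes the sign of the dividend. A small Python sketch of those integer rules, contrasted with Python's floor-based // and %:

```python
def xpath_idiv(a: int, b: int) -> int:
    """Integer division as XPath 2.0 idiv defines it for integers:
    the fractional part is truncated, so the result rounds toward zero."""
    if b == 0:
        # XPath raises a dynamic error (err:FOAR0001) in this case.
        raise ZeroDivisionError("idiv by zero")
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b > 0) else -quotient


def xpath_mod(a: int, b: int) -> int:
    """XPath 2.0 mod for integers: a - (a idiv b) * b,
    so the result has the sign of the dividend a."""
    return a - xpath_idiv(a, b) * b


# Positive operands: XPath and Python agree.
print(xpath_idiv(10, 3), 10 // 3)  # 3 3
print(xpath_mod(10, 3), 10 % 3)    # 1 1

# Negative dividend: XPath truncates, Python floors.
print(xpath_idiv(-7, 2), -7 // 2)  # -3 -4
print(xpath_mod(-7, 2), -7 % 2)    # -1 1
```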

As these examples show, you can combine paths and arithmetic in expressions to perform calculations. For the calculation to succeed, the path must result in a numeric value, or in an untyped value (such as the text of an element) that lexically represents a number. If the value cannot be converted to a number, or the calculation is invalid (such as division by zero), the expression does not return a usable number: depending on the operand types, the result is a special value such as NaN (not a number) or INF, or the expression raises an error.

You can also use any XPath 2.0 function in any step of a path or expression. Functions let you express conditions such as last() or not(). The following example selects any <chapter> nodes that do not contain a <role> child:

/document/chapter[not(role)]

In addition, there are many string, numeric and date functions that let you transform the text in elements or attributes. This example returns the text of the first <paragraph> node in each <section>, converted to upper case:

/document/chapter/section/paragraph[1]/upper-case(.)
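Two details are easy to miss here: a positional predicate such as paragraph[1] is evaluated per parent, so it can select several nodes, and a function such as upper-case() returns new values without modifying the source document. This sketch shows both with Python's xml.etree.ElementTree (a limited XPath subset, not XPath 2.0) on a hypothetical document. Python's str.upper() stands in for upper-case(); the two agree on plain ASCII text, but their case mappings may differ for some other characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first section has two paragraphs.
SAMPLE = """
<document>
  <chapter>
    <section><paragraph>alpha</paragraph><paragraph>beta</paragraph></section>
    <section><paragraph>gamma</paragraph></section>
  </chapter>
</document>
"""
root = ET.fromstring(SAMPLE)

# paragraph[1] selects the first <paragraph> of each <section>.
firsts = root.findall("chapter/section/paragraph[1]")

# Converting each selected node's text produces new strings;
# the parsed document keeps its original text.
upper = [p.text.upper() for p in firsts]
print(upper)  # ['ALPHA', 'GAMMA']
```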

You can also use XPath functions to cast node data to other data types. This example casts the value of the first <highQuote> node to xs:decimal:

xs:decimal((//highQuote)[1])

XPath functions also let you retrieve the current date or date and time, or perform date calculations. This example calculates the number of days between a <comment> posted date and the current date.
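In XPath 2.0, subtracting one xs:date from another yields an xs:dayTimeDuration, and days-from-duration() extracts its day component. An expression along the lines of days-from-duration(current-date() - xs:date('2024-02-01')) follows that pattern, where the literal date is only a placeholder for a posted date. The same arithmetic is sketched below in Python with fixed, made-up dates so that the result is reproducible:

```python
from datetime import date

# Made-up dates: in a mashup, the first would come from current-date()
# and the second from the comment's posted date.
today = date(2024, 3, 1)
posted = date(2024, 2, 1)

# Subtracting two dates gives a duration (timedelta); .days is its
# whole-day count, comparable to days-from-duration() on a dayTimeDuration.
elapsed = today - posted
print(elapsed.days)  # 29 (2024 is a leap year)
```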

This introduction covers only the most basic aspects of XPath 2.0. For more information about syntax and the available functions, see the W3C specifications XML Path Language (XPath) 2.0, XQuery 1.0 and XPath 2.0 Functions and Operators, and XQuery 1.0 and XPath 2.0 Data Model.