In addition to the well-known standard and text indexes, Tamino offers the following advanced indexes:
unique keys
multipath indexes
computed indexes
compound indexes
reference indexes
This document discusses the performance impact of these indexes. You should be familiar with the syntax and concepts of these indexes as described in the Tamino XML Schema Reference Guide and the Tamino XML Schema User Guide.
The purpose of indexes is to improve query performance. However, this comes at the cost of higher disk space consumption and additional effort when documents are inserted, modified or deleted. You should therefore carefully consider whether an index is really necessary (that is, whether there are enough queries that can benefit from it) and whether the disadvantages can be tolerated.
From a logical point of view, a unique key is just an assertion: Tamino guarantees that each value of a unique key appears only once within the doctype. Internally, Tamino uses an index for each unique key in order to easily keep track of the already existing values. In addition to their main task of duplicate detection, these indexes are also used during query evaluation. Hence, unique keys can improve query performance.
If a unique key is defined with one component field, a standard index will be created for that field. If there are several fields, a compound index will be created at the root element. The example below shows a schema with unique key definitions, followed by a schema that shows the indexes created by Tamino (note that Tamino does not modify the original schema; the second schema is shown only to illustrate the index creation). Hence, from a performance point of view, a unique key behaves either like a standard index or like a compound index.
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "unique"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "A"> <tsd:logical> <tsd:content>closed</tsd:content> <tsd:unique name = "simple-key"> <tsd:field xpath = "D"></tsd:field> </tsd:unique> <tsd:unique name = "compound-key"> <tsd:field xpath = "B/@b"></tsd:field> <tsd:field xpath = "C"></tsd:field> </tsd:unique> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "A"> <xs:complexType> <xs:sequence> <xs:element name = "B"> <xs:complexType> <xs:simpleContent> <xs:extension base = "xs:string"> <xs:attribute name = "b" type = "xs:string" use = "required"> </xs:attribute> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name = "C" type = "xs:string"></xs:element> <xs:element name = "D" type = "xs:string"></xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "unique"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "A"> <tsd:logical> <tsd:content>closed</tsd:content> <tsd:unique name = "simple-key"> <tsd:field xpath = "D"></tsd:field> </tsd:unique> <tsd:unique name = "compound-key"> <tsd:field xpath = "B/@b"></tsd:field> <tsd:field xpath = "C"></tsd:field> </tsd:unique> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "A"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:standard> <tsd:field xpath = "B/@b"></tsd:field> <tsd:field xpath = "C"></tsd:field> </tsd:standard> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> <xs:complexType> <xs:sequence> <xs:element name = "B"> <xs:complexType> <xs:simpleContent> <xs:extension base = "xs:string"> <xs:attribute name = "b" type = "xs:string" use = "required"> </xs:attribute> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name = "C" type = "xs:string"></xs:element> <xs:element name = "D" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:standard></tsd:standard> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Note:
Although Tamino automatically creates indexes in order to implement unique key constraints, it is recommended to explicitly define the corresponding index in the schema if you rely on the performance improvement. Tamino will detect when an index definition matches a unique key constraint, and only one index will be created. The benefit is that such an explicitly defined index will survive if the unique key constraint is modified or removed.
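The duplicate detection described above can be pictured with a small Python model. This is a conceptual sketch under our own assumptions, not Tamino's internal implementation. Documents are flattened into dictionaries that map a field path to its value, and `UniqueKeyIndex` is a hypothetical class. A key with one field is stored as a plain value (like a standard index), and a key with several fields is stored as a tuple (like a compound index).

```python
class UniqueKeyIndex:
    """Hypothetical model of the index that backs one unique key."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = fields   # field paths, e.g. ["D"] or ["B/@b", "C"]
        self.entries = {}      # key value -> document ID

    def key_of(self, doc):
        # One field: a plain value (standard-index-like).
        # Several fields: a tuple of values (compound-index-like).
        values = tuple(doc[path] for path in self.fields)
        return values[0] if len(values) == 1 else values

    def insert(self, doc_id, doc):
        key = self.key_of(doc)
        if key in self.entries:
            # Duplicate detection: the value already exists in the doctype.
            raise ValueError(f"{self.name}: duplicate value {key!r}")
        self.entries[key] = doc_id

    def lookup(self, key):
        # The same structure can also answer equality predicates in queries.
        return self.entries.get(key)


simple_key = UniqueKeyIndex("simple-key", ["D"])
compound_key = UniqueKeyIndex("compound-key", ["B/@b", "C"])
doc = {"B/@b": "x", "C": "c1", "D": "d1"}   # flattened document: field path -> value
for index in (simple_key, compound_key):
    index.insert(1, doc)
```

Inserting a second document with `D` equal to `"d1"` raises `ValueError`, while the same dictionaries serve lookups during query evaluation.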
A multipath index is an index that covers several paths: if each of those paths had its own index, the corresponding multipath index can be seen as the union of those indexes. As a feature, multipath is an add-on option for other indexes. It can be used with standard, compound, and text indexes. See the respective section in the Tamino XML Schema Reference Guide for detailed rules about creating a multipath index.
The multipath feature supports queries in the following scenarios:
Highly-connected structures: Global elements or attributes with an index are referenced from many places in the schema (which might become a problem because the number of distinct indexes is limited).
Recursive structures: Each occurrence of an element or attribute in a recursive structure is to be indexed.
Arbitrary path sets: Arbitrary path sets can be combined into one multipath index, provided that the rules are met (the paths must have the same index type and the same data type).
The following examples illustrate these scenarios.
This example schema has several types of chapters, each of which has a title which is defined in a global element. The title has a text index, and instead of defining a separate index for each possible path, one common multipath index is defined which is used for any possible path to the Title element.
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "highly-connected"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "Document"> <tsd:logical> <tsd:content>closed</tsd:content> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "Title" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:multiPath>allTitlesIndex</tsd:multiPath> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> <xs:element name = "Document"> <xs:complexType> <xs:sequence> <xs:element name = "Chapter1"> <xs:complexType> <xs:sequence> <xs:element ref = "Title"></xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name = "Chapter2"> <xs:complexType> <xs:sequence> <xs:element ref = "Title"></xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name = "Chapter3"> <xs:complexType> <xs:sequence> <xs:element ref = "Title"></xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
This multipath index is used in an optimal way by queries like the following (each example is shown first in XQuery syntax and then in X-Query syntax):
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d//Title, "some text") return $d
_XQL = /Document[.//Title ~= "some text"]
It finds all documents where an arbitrary title, regardless of its path, fulfils the search criterion. The result is found by performing one index lookup. Without the multipath index, there would have to be a separate index for each path, and the results of several index lookups would have to be combined by an OR operation.
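The statement that a multipath index is the union of the per-path indexes can be sketched in Python as follows. The index contents are invented for illustration. Each index is reduced to a hypothetical mapping from a search term to a set of document IDs, and the helper names `build_multipath` and `lookup_with_or` are ours, not Tamino's.

```python
from collections import defaultdict

# Hypothetical per-path indexes: search term -> IDs of documents containing it.
per_path = {
    "Document/Chapter1/Title": {"xml": {1}, "tuning": {2}},
    "Document/Chapter2/Title": {"xml": {3}},
    "Document/Chapter3/Title": {"index": {1, 2}},
}

def build_multipath(per_path_indexes):
    """Merge several per-path indexes into one multipath index (their union)."""
    merged = defaultdict(set)
    for index in per_path_indexes.values():
        for term, doc_ids in index.items():
            merged[term] |= doc_ids
    return dict(merged)

def lookup_with_or(per_path_indexes, term):
    """Without a multipath index: one lookup per path, results combined by OR."""
    result = set()
    for index in per_path_indexes.values():
        result |= index.get(term, set())
    return result

multipath = build_multipath(per_path)
# One lookup in the multipath index gives the same answer as three OR-ed lookups.
print(multipath["xml"], lookup_with_or(per_path, "xml"))
```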
The next example evaluates the criterion against one particular path:
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Chapter1/Title, "some text") return $d
_XQL = /Document[Chapter1/Title ~= "some text"]
This query also makes use of the multipath index. But as the index has no knowledge about the path in which a particular value occurs, the index can only deliver a superset of the real result. From the viewpoint of the index, the criterion could be fulfilled by the Title of Chapter1, Chapter2 or Chapter3. This superset has to be filtered by post-processing.
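The following Python sketch mimics this behavior under simplifying assumptions. Exact string equality stands in for `tf:containsText`, and the stored documents and helper names are hypothetical. Because the multipath index records only values and document IDs, the lookup returns a candidate superset. A post-processing step then has to read each candidate to check the concrete path.

```python
# Hypothetical stored documents: document ID -> {path of a Title -> its text}.
documents = {
    1: {"Chapter1/Title": "some text", "Chapter2/Title": "intro"},
    2: {"Chapter1/Title": "intro", "Chapter3/Title": "some text"},
    3: {"Chapter2/Title": "summary"},
}

def build_multipath_index(docs):
    """Title text -> document IDs; the path of each value is NOT recorded."""
    index = {}
    for doc_id, titles in docs.items():
        for text in titles.values():
            index.setdefault(text, set()).add(doc_id)
    return index

def query_one_path(index, docs, path, text):
    # One index lookup delivers a superset of the real result,
    candidates = index.get(text, set())
    # which post-processing filters by reading each candidate document.
    result = {d for d in candidates if docs[d].get(path) == text}
    return candidates, result

index = build_multipath_index(documents)
candidates, result = query_one_path(index, documents, "Chapter1/Title", "some text")
print(sorted(candidates), sorted(result))
```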
This example schema defines a chapter that has a title and that contains a nested chapter. The title has a text index. Without a multipath index, there is no way to index every possible nesting level: using tsd:which, only a finite number of nesting levels can be explicitly indexed.
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "recursive"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "Document"> <tsd:logical> <tsd:content>closed</tsd:content> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "Document"> <xs:complexType> <xs:sequence> <xs:element ref = "Chapter"></xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name = "Chapter"> <xs:complexType> <xs:sequence> <xs:element name = "Title" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:multiPath>nestedTitlesIndex</tsd:multiPath> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> <xs:element ref = "Chapter" minOccurs = "0"></xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
The queries supported by this multipath index are very similar to those of the highly-connected scenario.
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d//Title, "some text") return $d
_XQL = /Document[.//Title ~= "some text"]
This query finds all documents where an arbitrary title, regardless of its nesting level, fulfils the search criterion. The result is found by performing one index lookup. Without the multipath feature, this query can only be supported by indexes if every nesting level of Title that actually occurs is explicitly indexed by a tsd:which statement.
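The following Python sketch illustrates why a recursive structure needs the multipath feature. It uses the standard `xml.etree.ElementTree` module to collect every `Title` at any nesting depth. The function `title_entries` and the sample document are hypothetical. A fixed list of `tsd:which` paths corresponds to a fixed list of depths. A multipath index, conceptually, covers whatever depths actually occur.

```python
import xml.etree.ElementTree as ET

def title_entries(root):
    """Collect (path, text) for every Title element, at any nesting depth."""
    entries = []

    def walk(elem, path):
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if child.tag == "Title":
                entries.append((child_path, child.text))
            walk(child, child_path)   # recursion follows arbitrarily deep Chapters

    walk(root, root.tag)
    return entries

doc = ET.fromstring(
    "<Document><Chapter><Title>a</Title>"
    "<Chapter><Title>b</Title>"
    "<Chapter><Title>c</Title></Chapter>"
    "</Chapter></Chapter></Document>"
)
for path, text in title_entries(doc):
    print(path, text)
```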
The next example evaluates the criterion against one particular nesting level:
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Chapter/Chapter/Title, "some text") return $d
_XQL = /Document[Chapter/Chapter/Title ~= "some text"]
This query also makes use of the multipath index. But as the index has no knowledge about the nesting level at which a particular value occurs, the index can only deliver a superset of the real result: from the viewpoint of the index, the criterion could be fulfilled by Chapter/Title or Chapter/Chapter/Title, and so on. This superset has to be filtered by post-processing.
The previous examples are based on the use of global elements (which is of course mandatory for recursion). The multipath feature, however, is not restricted to global elements. The following example shows a document that has an introduction with a subtitle, and two chapters with a title (where each title is modeled locally under its parent). Each of these three title definitions has its own multipath definition. As these definitions specify the same multipath label, the schema actually defines one multipath index, with three participating paths.
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "path-set"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "Document"> <tsd:logical> <tsd:content>closed</tsd:content> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "Document"> <xs:complexType> <xs:sequence> <xs:element name = "Introduction"> <xs:complexType> <xs:sequence> <xs:element name = "Subtitle" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:multiPath>allTitles</tsd:multiPath> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name = "Chapter1"> <xs:complexType> <xs:sequence> <xs:element name = "Title" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:multiPath>allTitles</tsd:multiPath> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> </xs:sequence> </xs:complexType> </xs:element> <xs:element name = "Chapter2"> <xs:complexType> <xs:sequence> <xs:element name = "Title" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:multiPath>allTitles</tsd:multiPath> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Queries similar to the following examples make use of the multipath index:
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Introduction/Subtitle, "some text") or tf:containsText ($d//Title, "some other text") return $d
_XQL = /Document[Introduction/Subtitle ~= "some text" or .//Title ~= "some other text"]
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Introduction/Subtitle, "some text") return $d
_XQL = /Document[Introduction/Subtitle ~= "some text"]
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Chapter1/Title, "some text") return $d
_XQL = /Document[Chapter1/Title ~= "some text"]
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document where tf:containsText ($d/Chapter1/Title, "some text") and tf:containsText ($d/Chapter2/Title, "some other text") return $d
_XQL = /Document[Chapter1/Title ~= "some text" and Chapter2/Title ~= "some other text"]
In all these cases, post-processing is required to filter the result of the index scan. The reason is again that the index has no knowledge about the path in which a particular value occurs.
A computed index is even more powerful than a multipath index, with the current restriction that a computed index may be neither a text index nor a compound index. With a multipath index, the index definition is added to every node (or path) to be included. A computed index instead refers, via its QName, to an XQuery function that is defined in a module stored in Tamino. This XQuery function may compute one or more index entries based on arbitrary nodes and their values in the XML document being stored in a doctype.
A computed index consists of:
an XQuery module defining the indexing function(s)
the schema defining the computed indexes referring to the indexing functions
an XQuery query taking advantage of the computed index by using the indexing function, to which the root node of each document will be passed as an argument. The indexing function must be used either in a comparison or in an "order by" clause.
An indexing function must have the following signature:
Exactly one parameter of type "node()";
The return type is the QName of a known simple type; at the moment, it must be a type predefined by XML Schema. Hence, a QName such as "xs:integer" might be specified, optionally with an occurrence indicator such as "?" or "*". A return type such as "node()" or "item()", with or without an occurrence indicator, is not acceptable.
The type attribute of tsd:computed, which is typically the same as the declared return type of the indexing function, must specify a simple type that is predefined in XML Schema.
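As a rough analogy only, the following Python sketch shows the shape of this contract. An indexing function receives the root node of a document and returns a sequence of simple-typed values. Here those values are strings, comparable to a declared return type of `xs:string*`. Every returned value becomes an index entry for that document. This is not Tamino XQuery; the function `full_names` and the helper `build_computed_index` are hypothetical.

```python
import xml.etree.ElementTree as ET

def full_names(root):
    """Hypothetical indexing function: one node in, zero or more strings out
    (comparable to a declared return type of xs:string*)."""
    return [
        f"{name.findtext('Firstname')} {name.findtext('Lastname')}"
        for name in root.iter("Name")
    ]

def build_computed_index(docs, indexing_function):
    """Call the indexing function with each document's root node and record
    every returned value as an index entry for that document."""
    index = {}
    for doc_id, root in docs.items():
        for value in indexing_function(root):
            index.setdefault(value, set()).add(doc_id)
    return index

docs = {
    1: ET.fromstring("<Document><Name><Firstname>Paul</Firstname>"
                     "<Lastname>Bloggs</Lastname></Name></Document>"),
    2: ET.fromstring("<Document><Name><Firstname>Fred</Firstname>"
                     "<Lastname>Bloggs</Lastname></Name></Document>"),
}
index = build_computed_index(docs, full_names)
print(index)
```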
For examples and additional aspects, please refer to the following documentation sections:
XML Schema User Guide > Appendix 5: Example Schemas for Indexing
XQuery User Guide > Advanced Usage > Defining and Using Modules
X-Machine Programming > Maintaining Tamino Indexes
X-Machine Programming > Requests using X-Machine Commands > _admin
A compound index combines values from different component fields into one index value. The following schema has a Name element with Firstname, Initial, and Lastname children. There is a compound index located at the Name element, having Firstname, Initial, and Lastname as components (in that sequence).
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "compound"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "Document"> <tsd:logical> <tsd:content>closed</tsd:content> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "Document"> <xs:complexType> <xs:sequence> <xs:element name = "Name" maxOccurs = "unbounded"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:standard> <tsd:field xpath = "Firstname"></tsd:field> <tsd:field xpath = "Initial"></tsd:field> <tsd:field xpath = "Lastname"></tsd:field> </tsd:standard> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> <xs:complexType> <xs:sequence> <xs:element name = "Firstname" type = "xs:string"></xs:element> <xs:element name = "Initial" type = "xs:string"></xs:element> <xs:element name = "Lastname" type = "xs:string"></xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Here are some example documents for this schema:
<Document> <Name> <Firstname>Paul</Firstname> <Initial>J</Initial> <Lastname>Bloggs</Lastname> </Name> </Document> <Document> <Name> <Firstname>Fred</Firstname> <Initial>M</Initial> <Lastname>Bloggs</Lastname> </Name> <Name> <Firstname>Paul</Firstname> <Initial>J</Initial> <Lastname>Atkins</Lastname> </Name> </Document>
For the first document, the value (Paul,J,Bloggs) is added to the compound index; for the second document, the values (Fred,M,Bloggs) and (Paul,J,Atkins) are added (the tuple notation is used here only for readability; internally, Tamino uses a compact serialization format). The following query will make use of the compound index:
for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" and $n/Initial = "J" and $n/Lastname = "Bloggs" return $d
_XQL = /Document[Name[ Firstname = "Paul" and Initial = "J" and Lastname = "Bloggs"] ]
This query finds the first document. The second one does not match because the values Paul and J appear under one Name element, and the value Bloggs under another. The query optimizer detects the compound index and scans the index for the value (Paul,J,Bloggs), which is composed from the parts given in the query. Thus, the query can be answered by one index lookup, although it consists of several criteria. Without a compound index, each component would need its own standard index (in order to have an index-supported query), and several separate index lookups would be necessary.
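The following Python sketch models how the compound index is built and queried. It rests on our own assumptions. The example documents from above are parsed with `xml.etree.ElementTree`, and one tuple is built per `Name` element in the component sequence (Firstname, Initial, Lastname). A dictionary stands in for Tamino's index structure, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

COMPONENTS = ("Firstname", "Initial", "Lastname")   # definition sequence

def make_doc(*names):
    """Build a <Document> with one <Name> per (Firstname, Initial, Lastname)."""
    parts = "".join(
        f"<Name><Firstname>{first}</Firstname><Initial>{initial}</Initial>"
        f"<Lastname>{last}</Lastname></Name>"
        for first, initial, last in names
    )
    return ET.fromstring(f"<Document>{parts}</Document>")

def build_compound_index(docs):
    """Compound index hosted by Name: one tuple per Name element."""
    index = {}
    for doc_id, root in docs.items():
        for name in root.iter("Name"):
            key = tuple(name.findtext(c) for c in COMPONENTS)
            index.setdefault(key, set()).add(doc_id)
    return index

docs = {
    1: make_doc(("Paul", "J", "Bloggs")),
    2: make_doc(("Fred", "M", "Bloggs"), ("Paul", "J", "Atkins")),
}
index = build_compound_index(docs)
# The query values are composed into one key: a single lookup covers all three criteria.
print(index.get(("Paul", "J", "Bloggs"), set()))
```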
Moreover, this example shows a performance improvement that goes well beyond saving index lookups. This is because
the compound index is hosted by the Name element, which means that the compound values are built relative to Name,
and the Name element has a multiplicity greater than 1.
In other words, the values of the example compound index are grouped by Name elements. Without the compound index, when each component has its own standard index, there is no such grouping, and the index does not know to which occurrence of the Name element a particular value belongs. Thus, when the given query is executed against three separate indexes, the index lookup will also find the second document (because all requested values appear somewhere in that document), and a subsequent postprocessing step is needed to find the correct result. The compound index avoids this unnecessary reading of the second document.
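The contrast with three separate standard indexes can be sketched in the same way. The helper names are hypothetical. Each per-field index maps a value to document IDs without recording the `Name` element the value came from. The intersection therefore yields a superset, which the post-processing function `matches` has to reduce.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def make_doc(*names):
    """Build a <Document> with one <Name> per (Firstname, Initial, Lastname)."""
    parts = "".join(
        f"<Name><Firstname>{first}</Firstname><Initial>{initial}</Initial>"
        f"<Lastname>{last}</Lastname></Name>"
        for first, initial, last in names
    )
    return ET.fromstring(f"<Document>{parts}</Document>")

def build_standard_indexes(docs, fields):
    """One standard index per field: value -> document IDs, no grouping by Name."""
    indexes = {field: defaultdict(set) for field in fields}
    for doc_id, root in docs.items():
        for field in fields:
            for elem in root.iter(field):
                indexes[field][elem.text].add(doc_id)
    return indexes

def matches(root, criteria):
    """Post-processing: does a single Name element satisfy all criteria?"""
    return any(
        all(name.findtext(field) == value for field, value in criteria.items())
        for name in root.iter("Name")
    )

docs = {
    1: make_doc(("Paul", "J", "Bloggs")),
    2: make_doc(("Fred", "M", "Bloggs"), ("Paul", "J", "Atkins")),
}
criteria = {"Firstname": "Paul", "Initial": "J", "Lastname": "Bloggs"}
indexes = build_standard_indexes(docs, criteria)
# Three lookups, intersected on document level: a superset of the real result.
candidates = set.intersection(*(indexes[f][v] for f, v in criteria.items()))
# Every candidate is read and checked; document 2 is read only to be rejected.
result = {d for d in candidates if matches(docs[d], criteria)}
print(sorted(candidates), sorted(result))
```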
This first query example contains predicates for each component of the compound index. But the compound index can also be used if fewer predicates appear in the query. The rule is:
The set of predicates in the query has to refer to the components of the compound index from left to right (in definition sequence).
The predicates have to be connected by and.
The and operation must be in the scope of the location of the compound index (for example, with the compound index on the Name element, the and must combine paths relative to Name).
The predicates have to be "=" comparisons, with the exception of the last predicate in definition sequence, which may use an arbitrary relational comparison operator.
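One common way to picture this rule is a sorted index in which equality on the leading components selects one contiguous range of entries. The following Python sketch assumes that model; Tamino's actual index structure is not described here. `entries` is a hypothetical sorted list of (Firstname, Initial, Lastname, document ID) tuples. `prefix_scan` applies equality to the leading components and an optional comparison to the next one.

```python
from bisect import bisect_left

# Hypothetical sorted compound index: (Firstname, Initial, Lastname, document ID).
entries = sorted([
    ("Fred", "M", "Bloggs", 2),
    ("Paul", "J", "Atkins", 2),
    ("Paul", "J", "Bloggs", 1),
    ("Paul", "K", "Smith", 3),
])

def prefix_scan(entries, equal_values, next_predicate=None):
    """Equality on the leading components (left to right), plus an optional
    arbitrary comparison on the next component; returns document IDs."""
    prefix = tuple(equal_values)
    n = len(prefix)
    start = bisect_left(entries, prefix)   # first entry not smaller than the prefix
    hits = set()
    for entry in entries[start:]:
        if entry[:n] != prefix:            # left the contiguous range for this prefix
            break
        if next_predicate is None or next_predicate(entry[n]):
            hits.add(entry[-1])
    return hits

print(prefix_scan(entries, ["Paul", "J"], lambda last: last < "Bloggs"))
```

A query with predicates on Initial and Lastname only cannot seek into this list, because the entries are not ordered by Initial. This matches the case of a query without a predicate on the first component.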
The following query examples illustrate this rule. The first set of queries makes use of the compound index, and postprocessing is not necessary:
for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" return $d
_XQL = /Document[Name[Firstname = "Paul"] ]
for $d in input()/Document for $n in $d/Name where $n/Firstname > "Paul" return $d
_XQL = /Document[Name[Firstname > "Paul"] ]
for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" and $n/Initial = "J" return $d
_XQL = /Document[Name[ Firstname = "Paul" and Initial = "J"] ]
for $d in input()/Document for $n in $d/Name where $n/Initial = "J" and $n/Firstname = "Paul" and $n/Lastname < "Bloggs" return $d
_XQL = /Document[Name[ Firstname = "Paul" and Initial = "J" and Lastname < "Bloggs"] ]
The next set of queries makes use of the compound index, but an additional postprocessing step is needed because the predicates do not fulfill the rule described above. The query optimizer selects those predicates that fulfill the rule in order to find a minimal superset of the final result using the compound index:
for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" and $n/Lastname = "Bloggs" return $d
_XQL = /Document[Name[ Firstname = "Paul" and Lastname = "Bloggs"] ]
for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" and $n/Initial > "J" and $n/Lastname > "Bloggs" return $d
_XQL = /Document[Name[ Firstname = "Paul" and Initial > "J" and Lastname > "Bloggs"] ]
The following query cannot use the compound index because there is no predicate for the first component (Firstname):
for $d in input()/Document for $n in $d/Name where $n/Initial = "J" and $n/Lastname = "Bloggs" return $d
_XQL = /Document[Name[ Initial = "J" and Lastname = "Bloggs"] ]
The following query cannot use the compound index because the and operation is not in the scope of the element hosting the compound index (the Name element):
for $d in input()/Document where $d/Name/Firstname = "Paul" and $d/Name/Initial = "J" and $d/Name/Lastname = "Bloggs" return $d
_XQL = /Document[ Name/Firstname = "Paul" and Name/Initial = "J" and Name/Lastname = "Bloggs"]
Compound indexes should be used very carefully if one or even several of the components are multiple (relative to the element hosting the compound index). In the example above, this would mean that a Name could contain several Firstname elements. In this case, all possible value combinations (the cross-product) are built and added to the index, so that the index can become very large.
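The growth of the index can be illustrated with Python's `itertools.product`. The sketch assumes a hypothetical variant of the schema in which `Firstname` and `Lastname` may occur several times within one `Name`. The values are invented.

```python
from itertools import product

def compound_entries(firstnames, initials, lastnames):
    """All compound index values built for ONE Name element: the cross-product."""
    return list(product(firstnames, initials, lastnames))

# Hypothetical Name element with three Firstname and two Lastname occurrences.
entries = compound_entries(["Paul", "Peter", "Pete"], ["J"], ["Bloggs", "Bloggs-Smith"])
print(len(entries))
```

With m, n and k occurrences of the three components, one Name element contributes m × n × k index values, so the index grows multiplicatively rather than additively.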
A reference index consists of two parts:
The actual reference index (denoted by tsd:reference) is specified at a particular path in the schema. All document occurrences of that path are then assigned a node ID which is unique across the doctype.
Other indexes (standard, text, compound) located below the reference index can refer to that reference node by specifying tsd:refers.
Specifying a reference index makes sense only if
the reference node has a multiplicity greater than 1,
and there are at least two referencing indexes.
The schema used for compound indexes (simplified by leaving out the Initial element) is now reformulated using a reference index. Firstname has a text index, and Lastname has a standard index, both referring to the Name element:
<?xml version = "1.0" encoding = "UTF-8"?> <xs:schema xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition" xmlns:xs = "http://www.w3.org/2001/XMLSchema"> <xs:annotation> <xs:appinfo> <tsd:schemaInfo name = "reference"> <tsd:collection name = "MyCollection"></tsd:collection> <tsd:doctype name = "Document"> <tsd:logical> <tsd:content>closed</tsd:content> </tsd:logical> </tsd:doctype> </tsd:schemaInfo> </xs:appinfo> </xs:annotation> <xs:element name = "Document"> <xs:complexType> <xs:sequence> <xs:element name = "Name" maxOccurs = "unbounded"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:reference></tsd:reference> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> <xs:complexType> <xs:sequence> <xs:element name = "Firstname" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:text> <tsd:refers>/Document/Name</tsd:refers> </tsd:text> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> <xs:element name = "Lastname" type = "xs:string"> <xs:annotation> <xs:appinfo> <tsd:elementInfo> <tsd:physical> <tsd:native> <tsd:index> <tsd:standard> <tsd:refers>/Document/Name</tsd:refers> </tsd:standard> </tsd:index> </tsd:native> </tsd:physical> </tsd:elementInfo> </xs:appinfo> </xs:annotation> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Here are two example documents.
<Document> <Name> <Firstname>Paul</Firstname> <Lastname>Bloggs</Lastname> </Name> </Document> <Document> <Name> <Firstname>Fred</Firstname> <Lastname>Bloggs</Lastname> </Name> <Name> <Firstname>Paul</Firstname> <Lastname>Atkins</Lastname> </Name> </Document>
When these documents are stored, each Name element is assigned a unique ID, and the values for the other indexes are built as usual. The semantics of a referencing index, however, are different. A classic index contains the information "the value 'Bloggs' appears in the document with ino:id 17". A referencing index says "the value 'Bloggs' appears in the Name node with ID 5". Thus, a reference index achieves a grouping effect similar to the one described for compound indexes: the values Fred and Bloggs are grouped under the first Name node of the second document, and the values Paul and Atkins are grouped under the second Name node.
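The following Python sketch models how node IDs are assigned and what the referencing indexes contain. It is a hypothetical model. Node IDs are simply numbered from 1 in document order, exact values stand in for the text index on `Firstname`, and the helper names are ours.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def make_doc(*names):
    """Build a <Document> with one <Name> per (Firstname, Lastname) pair."""
    parts = "".join(
        f"<Name><Firstname>{first}</Firstname><Lastname>{last}</Lastname></Name>"
        for first, last in names
    )
    return ET.fromstring(f"<Document>{parts}</Document>")

def build_reference_indexes(docs):
    node_to_doc = {}              # reference index: Name node ID -> document ID
    firstname = defaultdict(set)  # referencing index: value -> Name node IDs
    lastname = defaultdict(set)   # referencing index: value -> Name node IDs
    next_id = 1                   # node IDs are unique across the doctype
    for doc_id, root in sorted(docs.items()):
        for name in root.iter("Name"):
            node_to_doc[next_id] = doc_id
            firstname[name.findtext("Firstname")].add(next_id)
            lastname[name.findtext("Lastname")].add(next_id)
            next_id += 1
    return node_to_doc, firstname, lastname

docs = {
    1: make_doc(("Paul", "Bloggs")),
    2: make_doc(("Fred", "Bloggs"), ("Paul", "Atkins")),
}
node_to_doc, firstname, lastname = build_reference_indexes(docs)
print(dict(lastname))
```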
Queries can make use of this scenario if
there are predicates on the referencing indexes that are combined by an and,
and the and operator is in the scope of the reference index.
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document for $n in $d/Name where tf:containsText ($n/Firstname, "Paul") and $n/Lastname = "Bloggs" return $d
_XQL = /Document[Name[ Firstname ~= "Paul" and Lastname = "Bloggs"] ]
The index lookups on Firstname and Lastname and the subsequent intersection find the only Name node that fulfills the criteria, so postprocessing is avoided. Without a reference index, the index lookup would find both documents (because the values Paul and Bloggs appear somewhere in both documents), and only postprocessing would find the correct result.
In such a scenario, the query performance is improved significantly because the and operation can be performed on the level of the Name element instead of the document level. On the Name element level, the intersection already delivers the final result: no document is read from disk only to be rejected by the postprocessor (which would happen without a reference index).
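The difference between intersecting on the `Name` level and on the document level can be sketched with the index contents from the example. The node IDs 1 to 3, numbered in document order, and the function names are assumptions for illustration. Mapping node IDs to document IDs before the intersection reproduces the behavior of classic indexes.

```python
# Index contents for the two example documents (assumed numbering):
# Name node 1 = Paul Bloggs (document 1), node 2 = Fred Bloggs (document 2),
# node 3 = Paul Atkins (document 2).
firstname_idx = {"Paul": {1, 3}, "Fred": {2}}      # referencing index on Firstname
lastname_idx = {"Bloggs": {1, 2}, "Atkins": {3}}   # referencing index on Lastname
node_to_doc = {1: 1, 2: 2, 3: 2}                   # reference index: Name node -> document

def with_reference_index(first, last):
    """AND on Name-node level, then map the surviving node IDs to documents."""
    nodes = firstname_idx.get(first, set()) & lastname_idx.get(last, set())
    return {node_to_doc[n] for n in nodes}

def with_classic_indexes(first, last):
    """AND on document level, as classic indexes would do it."""
    docs_first = {node_to_doc[n] for n in firstname_idx.get(first, set())}
    docs_last = {node_to_doc[n] for n in lastname_idx.get(last, set())}
    return docs_first & docs_last   # may contain documents that postprocessing rejects

print(with_reference_index("Paul", "Bloggs"), with_classic_indexes("Paul", "Bloggs"))
```

Both functions pay for the node-to-document mapping. In the reference-index variant, this cost buys a result that needs no postprocessing.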
The following example queries make use of the reference index, but there is no performance benefit compared to classic indexes.
declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $d in input()/Document for $n in $d/Name where tf:containsText ($n/Firstname, "Paul") return $d
_XQL = /Document[Name[Firstname ~= "Paul"]]
This query has only one predicate, so there is no improvement because there is no intersection on the Name element level. Similarly, there would be no improvement if the query had several predicates combined with or.
The next query uses an and which is not in the scope of the Name element. The intersection is on the document level, and the correct result could also be found by classic indexes without postprocessing.
for $d in input()/Document where $d/Name/Firstname = "Paul" and $d/Name/Lastname = "Bloggs" return $d
_XQL = /Document[ Name/Firstname = "Paul" and Name/Lastname = "Bloggs"]
In fact, queries like the latter examples should be avoided with a reference index. The index lookup of a referencing index (e.g. Firstname) delivers node IDs of the reference index (Name in this example). These node IDs have to be transformed to document IDs. This is unnecessary overhead if the same result can be achieved by classic indexes. In a "good" reference index scenario, this overhead also exists, but it is far outweighed by the savings from avoiding unnecessary document reads.
Both reference index and compound index achieve performance improvements in more or less the same scenario where index values can be grouped relative to a particular node that has a multiplicity greater than 1.
Hence the question arises which one should be preferred if both can be applied. The general recommendation is to use a compound index if it satisfies the query requirements, because a reference index incurs more overhead, as described above.
But a compound index is not always feasible. A reference index is more flexible: it can work with all index types (while a compound index is always a standard index), and it can be nested (there may be several levels with tsd:reference).
As pointed out in the previous sections, the performance improvement that can be achieved with compound and reference indexes heavily depends on the grouping of values relative to particular nodes (the tsd:reference node or the node at which the compound index is defined). The selectivity of a compound or reference index is much higher than that of classic standard indexes if these value groups identify a much smaller result set than would be found without grouping.
In order to determine the selectivity improvement, two different count queries can be issued. The first one counts the number of documents that make up the query result:
{-- query based on the compound index example --} count ( for $d in input()/Document for $n in $d/Name where $n/Firstname = "Paul" and $n/Initial = "J" and $n/Lastname = "Bloggs" return $d )
_XQL = count (/Document[Name [ Firstname = "Paul" and Initial = "J" and Lastname = "Bloggs"] ] )
{-- query based on the reference index example --} declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" count ( for $d in input()/Document for $n in $d/Name where tf:containsText ($n/Firstname, "Paul") and $n/Lastname = "Bloggs" return $d )
_XQL = count(/Document[Name[ Firstname ~= "Paul" and Lastname = "Bloggs"] ])
The second one counts the number of documents that would have to be read if there were no reference or compound index, and which would then have to be presented to the postprocessor:
{-- query based on the compound index example --} count ( for $d in input()/Document where $d/Name/Firstname = "Paul" and $d/Name/Initial = "J" and $d/Name/Lastname = "Bloggs" return $d )
_XQL = count (/Document[ Name/Firstname = "Paul" and Name/Initial = "J" and Name/Lastname = "Bloggs"] )
{-- query based on the reference index example --} declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" count ( for $d in input()/Document where tf:containsText ($d/Name/Firstname, "Paul") and $d/Name/Lastname = "Bloggs" return $d )
_XQL = count(/Document[ Name/Firstname ~= "Paul" and Name/Lastname = "Bloggs"])
If these numbers differ significantly for a representative set of values, this is a good indication to define a compound or a reference index (depending on which one is feasible).
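The same comparison can be illustrated outside Tamino with a short Python sketch on in-memory documents. It does not replace the count queries above. The helper names are hypothetical, and exact string equality is used for every criterion. `grouped_count` corresponds to the first kind of query and `ungrouped_count` to the second.

```python
import xml.etree.ElementTree as ET

def make_doc(*names):
    """Build a <Document> with one <Name> per (Firstname, Initial, Lastname)."""
    parts = "".join(
        f"<Name><Firstname>{first}</Firstname><Initial>{initial}</Initial>"
        f"<Lastname>{last}</Lastname></Name>"
        for first, initial, last in names
    )
    return ET.fromstring(f"<Document>{parts}</Document>")

def grouped_count(docs, criteria):
    """Like the first count query: ONE Name element must satisfy all criteria."""
    return sum(
        any(all(name.findtext(f) == v for f, v in criteria.items())
            for name in root.iter("Name"))
        for root in docs
    )

def ungrouped_count(docs, criteria):
    """Like the second count query: each criterion may be met by a DIFFERENT Name."""
    return sum(
        all(any(name.findtext(f) == v for name in root.iter("Name"))
            for f, v in criteria.items())
        for root in docs
    )

docs = [
    make_doc(("Paul", "J", "Bloggs")),
    make_doc(("Fred", "M", "Bloggs"), ("Paul", "J", "Atkins")),
]
criteria = {"Firstname": "Paul", "Initial": "J", "Lastname": "Bloggs"}
print(grouped_count(docs, criteria), ungrouped_count(docs, criteria))
```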