The following sections present guidelines for efficient querying with X-Query:
X-Query processing involves a pre-selection step and a post-processing step. In the pre-selection step, the indexes are used to select a candidate set that contains the final result set. In the post-processing step, this set is further restricted by applying the filter predicates that cannot be evaluated by an index access. This post-processing step involves the detailed analysis of each record contained in the intermediate result set.
If the pre-selection step is missing, the whole doctype has to be read. Even for queries with a small result, this causes a long response time. You can easily determine whether pre-selection is used by wrapping your query string in ino:explain (see the next section, The X-Query Function ino:explain). Tamino tells you whether your query involved pre-selection and/or post-processing.
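As a rough illustration, the following Python sketch wraps a query in ino:explain and builds a request URL for it. The host, database and collection names are hypothetical, and the use of `_xql` as the request parameter that carries the X-Query is an assumption about the HTTP interface; adjust both to your installation.

```python
from urllib.parse import quote


def explain_url(base_url: str, query: str) -> str:
    """Wrap an X-Query expression in ino:explain() and build a request URL."""
    wrapped = f"ino:explain({query})"
    # Assumption: the query is passed in a request parameter named _xql.
    # safe="" percent-encodes every reserved character, including "/" and "[".
    return f"{base_url}?_xql={quote(wrapped, safe='')}"


if __name__ == "__main__":
    # Hypothetical server, database and collection names.
    base = "http://localhost/tamino/mydb/mycollection"
    print(explain_url(base, '/MyDocument[key-field = "abc"]'))
```

Sending the resulting URL returns the explanation instead of the query result, so the check can be run without changing the query itself.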
Here are a few more guidelines for efficient querying:
There is one situation when an indexed node cannot be handled during pre-selection: The query for the non-existence of the node. When a node does not exist, its value is also not contained in the index, and consequently, the test for a value cannot rely on the index. This test will therefore be processed during the post-processing phase. Depending on the size of the pre-selected document set, this test can be slow.
A common problem is the use of the equality operator (=) when only a text index is defined, or the use of the contains operator (~=) when only a standard index is defined. In both cases, Tamino will correctly evaluate the query, but via post-processing. If you frequently apply both operators on the same node, consider defining it as both a standard and text index.
Make use of Tamino's X-Query extensions to XPath. These expressions perform better than the equivalent standard XPath expressions.
To always obtain correct results, make the key and the search expression type-compatible, i.e. use a string search value for an alphanumeric key and a numeric search value for a numeric key. Comparing, for example, an alphanumeric constant with a numeric element causes the numeric element to be converted into a string and a string comparison to be performed. This does not return the expected results, and performance suffers from the conversion as well.
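Why a string comparison of numeric values gives unexpected results can be seen in a small Python sketch. Python's own comparison rules stand in for Tamino's here, so this only illustrates the general effect; the function names are made up for the example.

```python
def greater_as_numbers(element_value: str, search_value: float) -> bool:
    """Compare numerically: the element content is converted to a number."""
    return float(element_value) > search_value


def greater_as_strings(element_value: str, search_value: str) -> bool:
    """Compare as strings: characters are compared from left to right."""
    return element_value > search_value


if __name__ == "__main__":
    # Numerically, 100 is greater than 20.
    print(greater_as_numbers("100", 20))
    # As strings, "100" sorts before "20" because "1" comes before "2".
    print(greater_as_strings("100", "20"))
```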
Generally, consider whether queries that use the contains operator (~=) can be reformulated using the starts-with() function. Here is an example:
/MyDocument [key-field ~= "abc*"]
...can be re-formulated as:
/MyDocument [starts-with(key-field, "abc")]
Starts-with makes use of a standard index, while the contains operator uses a text index. Usually, standard indexes consume less space and can therefore be loaded and updated faster. If the query needs post-processing, and if the key-field exists in many places in the document, the evaluation of the filter [key-field ~= "abc*"] may become costly.
A disadvantage, on the other hand, is the fact that many semantic differences exist: contains is more powerful, while starts-with requires the value of the key-field to start exactly with the given prefix: upper-/lower case must be observed, as well as the number of whitespace characters in the prefix value; there is no umlaut transformation in starts-with; and finally the contains operator looks for matching words within the value, while starts-with only checks the very beginning of the value.
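The semantic differences can be made concrete with a deliberately simplified Python model. It is not Tamino's implementation: the word splitting and the case folding below are assumptions chosen only to mirror the differences described above, and umlaut transformation is not modelled at all.

```python
import re


def starts_with(value: str, prefix: str) -> bool:
    """Exact, case-sensitive prefix test on the whole value."""
    return value.startswith(prefix)


def contains_word(value: str, pattern: str) -> bool:
    """Simplified word search: case-insensitive, optional trailing '*'."""
    words = re.findall(r"\w+", value.lower())
    needle = pattern.lower()
    if needle.endswith("*"):
        # Trailing wildcard: any word that begins with the given characters.
        return any(word.startswith(needle[:-1]) for word in words)
    return needle in words


if __name__ == "__main__":
    for value in ["abcdef", "ABC company", "the abc group", " abc"]:
        print(repr(value), starts_with(value, "abc"), contains_word(value, "abc*"))
```

In this model only the first value satisfies both tests. The other three match the word search but not the prefix test, because of case, position within the value, and leading whitespace respectively. A reformulation is therefore only safe when the stored values are known to be normalized.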
It is generally recommended to avoid wildcards (*) and descendant operators (//) if the path is known.
For further examples and more information, see the Advanced Concepts Guide on efficient querying.
The following hints apply only to queries that need to be executed as fast as possible, meaning that it matters whether they take 100 or 200 milliseconds (e.g. if they run in parallel against a Tamino Server).
If you run queries via HTTP, the HTTP GET method may perform better than the HTTP POST method in certain environments. When constructing a high-performance application, it may be useful to compare the runtimes of a query with GET and POST.
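One way to run such a comparison is a small timing harness, sketched here in Python. How each request is built (URL, parameter names, body encoding) depends on your environment, so the harness only takes callables that send one request each; the names `average_runtime` and `compare_methods` are made up for this example.

```python
import time
from typing import Callable, Dict


def average_runtime(send: Callable[[], object], runs: int = 20) -> float:
    """Return the mean wall-clock time in seconds of one call to send()."""
    send()  # untimed warm-up call, so one-time setup costs are not measured
    start = time.perf_counter()
    for _ in range(runs):
        send()
    return (time.perf_counter() - start) / runs


def compare_methods(
    senders: Dict[str, Callable[[], object]], runs: int = 20
) -> Dict[str, float]:
    """Measure every sender and return its average runtime by name."""
    return {name: average_runtime(send, runs) for name, send in senders.items()}


if __name__ == "__main__":
    # Placeholder senders; replace them with functions that issue the same
    # query once via GET and once via POST in your environment.
    results = compare_methods({"GET": lambda: None, "POST": lambda: None}, runs=5)
    for name, seconds in results.items():
        print(f"{name}: {seconds * 1000:.3f} ms per request")
```

Run the measurement several times and under realistic load, since differences in this range are easily hidden by network and caching effects.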
Here is a list of hints for X-Query syntax:
Queries must be supported by indexes.
Make sure that the result set is as small as possible.
The list of logical terms in a query expression should be short. For example, a query with 20 and/or operators is slower than a query with one and.
The logical terms with the highest hit rate (highest selectivity) should be placed at the beginning of the expression.
Use the betw operator instead of and operations.
Functions like count() in a query may be expensive.
Avoid using the adj operator (see tf:ContainsNearText).