Efficient Queries: X-Query

The following sections present guidelines for efficient querying with X-Query.


Efficient X-Queries

X-Query processing involves a pre-selection step and a post-processing step. In the pre-selection step, the indexes are used to select a candidate set of documents that contains the final result set. In the post-processing step, this candidate set is further restricted by applying the filter predicates that cannot be evaluated by an index access. Post-processing involves the detailed analysis of each record contained in the intermediate result set.
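The two steps can be pictured with a small in-memory model. This is only an illustrative sketch in Python, not Tamino's implementation: the document layout, the `build_index` and `run_query` helpers, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs, field):
    """Map each value of `field` to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].add(doc_id)
    return index

def run_query(docs, index, indexed_value, post_filter):
    # Pre-selection: a single index lookup yields the candidate ids.
    candidates = index.get(indexed_value, set())
    # Post-processing: every candidate document is read and checked in full.
    return sorted(doc_id for doc_id in candidates if post_filter(docs[doc_id]))

# Hypothetical sample documents.
docs = {
    1: {"city": "Berlin", "note": "urgent delivery"},
    2: {"city": "Berlin", "note": "standard"},
    3: {"city": "Paris", "note": "urgent pickup"},
}
city_index = build_index(docs, "city")
result = run_query(docs, city_index, "Berlin", lambda d: "urgent" in d["note"])
print(result)
```

In this model, the index lookup narrows three documents down to two candidates, and only those two are inspected by the filter. Without an index, all documents would have to be inspected.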

If the pre-selection step is missing, the whole doctype has to be read. This causes a long response time even for queries that return a small result. To determine whether pre-selection is used, wrap your query string in ino:explain (see the next section, The X-Query Function ino:explain). Tamino then tells you whether your query involved pre-selection and/or post-processing.
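As a rough sketch of how a client might wrap a query in ino:explain before sending it, consider the Python helper below. The exact call form `ino:explain(...)`, the `_xql` request parameter, and the database URL are assumptions for illustration; check them against the ino:explain section and your own installation.

```python
from urllib.parse import urlencode

def explain_query(xquery: str) -> str:
    """Wrap an X-Query expression in ino:explain(...) (assumed call form)."""
    return f"ino:explain({xquery})"

def build_url(base_url: str, xquery: str) -> str:
    # `_xql` as the name of the query parameter is an assumption.
    return f"{base_url}?{urlencode({'_xql': xquery})}"

# Hypothetical database/collection URL and query.
query = '/MyDocument[key-field = "abc"]'
explain_url = build_url(
    "http://localhost/tamino/mydb/mycollection", explain_query(query)
)
print(explain_url)
```

Keeping the wrapping in one helper makes it easy to switch between the plain query and its explained form while tuning indexes.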

Here are a few more guidelines for efficient querying:

  • There is one situation in which an indexed node cannot be handled during pre-selection: a query for the non-existence of the node. When a node does not exist, its value is not contained in the index either, so the test cannot rely on the index. This test is therefore processed during the post-processing phase. Depending on the size of the pre-selected document set, this test can be slow.

  • A common problem is the use of the equality operator (=) when only a text index is defined, or the use of the contains operator (~=) when only a standard index is defined. In both cases, Tamino will correctly evaluate the query, but via post-processing. If you frequently apply both operators to the same node, consider defining both a standard index and a text index on it.

  • Make use of Tamino's X-Query extensions to XPath. These expressions perform better than the equivalent standard XPath expressions.

  • To always obtain correct results, make the key and the search expression type-compatible, i.e. use a string search value for an alphanumeric key and a numeric search value for a numeric key. Comparing, for example, an alphanumeric constant with a numeric element causes the numeric element to be converted into a string and a string comparison to be performed. This does not return the expected results, and performance suffers from the conversion as well.

  • Generally, consider whether queries using the contains operator (~=) can be reformulated using the starts-with() function. Here is an example:

    /MyDocument [key-field ~= "abc*"]

    ...can be re-formulated as:

    /MyDocument [starts-with(key-field, "abc")]

    Starts-with makes use of a standard index, while the contains operator uses a text index. Usually, standard indexes consume less space and can therefore be loaded and updated faster. If the query needs post-processing, and if the key-field exists in many places in the document, the evaluation of the filter [key-field ~= "abc*"] may become costly.

    On the other hand, the two forms are not semantically equivalent. The contains operator is more powerful, while starts-with requires the value of key-field to start exactly with the given prefix. Upper and lower case must be observed, as well as the number of whitespace characters in the prefix value; there is no umlaut transformation in starts-with; and the contains operator looks for matching words anywhere within the value, while starts-with only checks the very beginning of the value.

  • It is generally recommended to avoid wildcards (*) and descendant operators (//) if the path is known.
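The type-compatibility guideline in the list above can be illustrated outside Tamino. The following Python sketch only shows why a string comparison of numeric values gives unexpected results; the sample values are made up, and the snippet does not model Tamino's own conversion rules.

```python
# Numeric values stored as text.
values = ["9", "10", "100"]

# String comparison works character by character, so "10" sorts before "9".
string_hits = [v for v in values if v > "9"]

# Numeric comparison after explicit conversion gives the expected answer.
numeric_hits = [v for v in values if int(v) > 9]

print(string_hits)   # []
print(numeric_hits)  # ['10', '100']
```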

For further examples and more information, see the Advanced Concepts Guide on efficient querying.
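The semantic differences between starts-with and the contains operator described in the guidelines can be approximated with a toy model. The Python sketch below is an assumption-laden simplification: `contains_word_prefix` is a hypothetical stand-in that treats `~= "abc*"` as a case-insensitive prefix match on any whitespace-separated word, and it does not model umlaut transformation or Tamino's actual text index.

```python
def starts_with(value: str, prefix: str) -> bool:
    # Exact prefix test: case and leading whitespace must match.
    return value.startswith(prefix)

def contains_word_prefix(value: str, pattern: str) -> bool:
    """Toy stand-in for `~= "abc*"`: case-insensitive match on any word."""
    stem = pattern.rstrip("*").lower()
    return any(word.startswith(stem) for word in value.lower().split())

# Hypothetical key-field values.
samples = ["abcdef", "ABC company", "see abcde here", " abc"]
prefix_hits = [s for s in samples if starts_with(s, "abc")]
word_hits = [s for s in samples if contains_word_prefix(s, "abc*")]

print(prefix_hits)  # ['abcdef']
print(word_hits)    # all four samples
```

In this model only one of the four values survives the prefix test, so a reformulated query may return fewer documents than the original.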

Very Fast Queries

The following hints apply only to queries that need to be executed as fast as possible, where it matters whether they run in 100 or 200 milliseconds (e.g. if they are to run in parallel against a Tamino Server).

If you run queries via HTTP, the HTTP GET method may perform better than the HTTP POST method in certain environments. When building a high-performance application, it may be useful to compare the runtimes of a query with GET and POST.
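One possible way to compare the two methods is a small timing harness such as the Python sketch below. The `_xql` parameter name, the form-encoded POST body, and the helper names are assumptions for illustration; the body encoding your server expects for POST may differ, so adjust `make_post` accordingly.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def make_get(base_url: str, xquery: str) -> Request:
    # Query travels in the URL; `_xql` is an assumed parameter name.
    return Request(f"{base_url}?{urlencode({'_xql': xquery})}", method="GET")

def make_post(base_url: str, xquery: str) -> Request:
    # Query travels in the body; the form encoding is an assumption.
    body = urlencode({"_xql": xquery}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(base_url, data=body, headers=headers, method="POST")

def send(request: Request) -> bytes:
    """Send the request and read the whole response."""
    with urlopen(request) as response:
        return response.read()

def average_seconds(action, repeats: int = 20) -> float:
    """Average wall-clock time of calling `action` `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        action()
    return (time.perf_counter() - start) / repeats
```

A measurement would then look like `average_seconds(lambda: send(make_get(url, query)))` and the same call with `make_post`, run against the same server under comparable load.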

Here is a list of hints for X-Query syntax:

  • The nodes used in the query must be indexed.

  • Make sure that the result set is as small as possible.

  • Keep the list of logical terms in a query expression short. For example, a query with 20 and/or terms is slower than a query with a single and.

  • The logical terms with the highest hit rate (highest selectivity) should be placed at the beginning of the expression.

  • Use the betw operator instead of and operations.

  • Functions like count() in a query may be expensive.

  • Avoid using the adj operator (see tf:ContainsNearText).