Tamino XML Server Version 9.7
 —  Advanced Concepts  —

Efficient Querying

Before going into the details of database tuning, note that the Tamino engine is under ongoing development and tuning, so the performance hints given here reflect only a snapshot of the current situation. Future versions of Tamino may perform differently.

Optimizing Tamino for efficient querying involves three steps: data modeling, index definition, and query definition.


Data Modeling for Efficiency

First we should get the data model right:

For our example, we created one document type for each of the business objects style, jazzMusician, collaboration and album.

With XML it would be easy to create a single document containing our whole jazz encyclopedia (which could eventually grow to several hundred MB), but do not expect good performance from such a design.

In our example, we have stored the album reviews in separate documents. These reviews are far less likely to be accessed than the album document itself.

For example, if we frequently need to know how many albums a jazz musician has published, retrieval performance could be improved by including this information in each jazzMusician document; we would no longer need to search all collaborations of a musician and then count the albums. However, we would need to update all referenced jazzMusician documents each time we insert, update or delete a collaboration document.
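The trade-off described above can be illustrated with a minimal Python sketch. It is a model of the idea only and does not use any Tamino API: the in-memory `musicians` and `collaborations` structures, the `album_count` field, the function names, and the sample IDs and album titles are all hypothetical.

```python
# Hypothetical in-memory stand-ins for jazzMusician and collaboration documents.
musicians = {
    "ParkerCharlie": {"album_count": 0},
    "GillespieDizzy": {"album_count": 0},
}
collaborations = []  # each entry: {"musicians": [ids], "albums": [titles]}


def count_albums_by_scan(musician_id):
    """Normalized design: visit every collaboration and count the albums."""
    return sum(
        len(c["albums"]) for c in collaborations if musician_id in c["musicians"]
    )


def insert_collaboration(musician_ids, albums):
    """Redundant design: each insert also updates every referenced musician."""
    collaborations.append({"musicians": list(musician_ids), "albums": list(albums)})
    for musician_id in musician_ids:
        musicians[musician_id]["album_count"] += len(albums)


def delete_collaboration(position):
    """A delete must undo the stored counts, or they drift out of sync."""
    removed = collaborations.pop(position)
    for musician_id in removed["musicians"]:
        musicians[musician_id]["album_count"] -= len(removed["albums"])


insert_collaboration(["ParkerCharlie", "GillespieDizzy"], ["Album A"])
insert_collaboration(["ParkerCharlie"], ["Album B", "Album C"])
```

Reading `musicians[...]["album_count"]` is a single lookup, whereas `count_albums_by_scan` touches every collaboration. The price is the extra bookkeeping in every insert and delete.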


Efficient Indexing

The next step is to define indexes correctly:


Efficient Queries

Finally, we look at the queries. In this section, the majority of the examples are based on X-Query, but equivalent processing is possible in XQuery. For examples of equivalent coding in X-Query and XQuery, see the Performance Guide in the Tamino documentation set.

Internal query processing can involve a pre-selection step and a post-processing step, depending on the nature of the query. If the query involves searching on one or more indexes, the pre-selection step finds the documents that satisfy the index search criteria; if the query involves search criteria that do not use indexes, the post-processing step is required.

[Figure: query pre-selection and post-processing (graphics/preselect.png)]

In the pre-selection step, the indexes are used to select an intermediate result set. In the post-processing step, this set is narrowed by applying the remaining search criteria. This post-processing step involves detailed analysis of each record contained in the intermediate result set.

For example, in the query:

jazzMusician[belongsTo/style/@name="Bebop" and name/first="Charlie"]

the expression belongsTo/style/@name="Bebop" is processed as a pre-selection because the foreign key belongsTo/style/@name is defined as a standard index.

The rest of the filter expression name/first="Charlie" is processed during the post-processing phase because name/first is not defined as an index.

The same is true for the equivalent XQuery 4 expression. In

for $j in input()
   where $j/belongsTo/style/@name="Bebop" and $j/name/first="Charlie"
   return $j

$j/belongsTo/style/@name="Bebop" is executed first to construct the pre-selection set, then $j/name/first="Charlie" is executed.
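The two phases can be modeled with a short Python sketch. It illustrates the principle under simplifying assumptions (equality criteria combined with "and") and is not how the Tamino engine is implemented; the `documents` and `indexes` dictionaries, the sample records, and the `run_query` function are hypothetical.

```python
# Hypothetical jazzMusician records keyed by document ID.
documents = {
    1: {"belongsTo/style/@name": "Bebop", "name/first": "Charlie"},
    2: {"belongsTo/style/@name": "Bebop", "name/first": "Dizzy"},
    3: {"belongsTo/style/@name": "Swing", "name/first": "Charlie"},
}

# Only belongsTo/style/@name is indexed: value -> set of document IDs.
indexes = {
    "belongsTo/style/@name": {"Bebop": {1, 2}, "Swing": {3}},
}


def run_query(criteria):
    """Evaluate a conjunction of (path, value) equality criteria in two phases."""
    indexed = [(p, v) for p, v in criteria if p in indexes]
    remaining = [(p, v) for p, v in criteria if p not in indexes]

    # Pre-selection: intersect index lookups; no document is read here.
    # Without any indexed criterion, every document stays a candidate.
    candidates = set(documents)
    for path, value in indexed:
        candidates &= indexes[path].get(value, set())

    # Post-processing: inspect each candidate for the remaining criteria.
    ids = sorted(
        doc_id
        for doc_id in candidates
        if all(documents[doc_id].get(p) == v for p, v in remaining)
    )
    return {
        "preselection": bool(indexed),
        "postprocessing": bool(remaining),
        "ids": ids,
    }


result = run_query(
    [("belongsTo/style/@name", "Bebop"), ("name/first", "Charlie")]
)
```

In this model the indexed criterion shrinks the candidate set from three documents to two before any document is inspected, and only those two are examined for `name/first`.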

Queries that do not have a pre-selection step (because none of the search criteria uses an index) respond slowly when they extract only a few records from a large collection, because every record has to be analyzed in post-processing. You can easily determine whether your query uses a pre-selection: wrap your X-Query expression in ino:explain(...) and Tamino will tell you whether the query involves a pre-selection and whether it involves post-processing.

In XQuery 4 we can obtain the same information by including the expression {?explain?} in the query prologue.

The query above, for example:

ino:explain(jazzMusician[belongsTo/style/@name="Bebop"
                         and name/first="Charlie"])

results in:

<xql:result>
  <ino:explanation ino:preselection="TRUE"
                   ino:postprocessing="TRUE" />
</xql:result>

As already explained above, this query involves both a pre-selection and a post-processing phase. Not surprisingly, both ino:preselection and ino:postprocessing have the value "TRUE".
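A client program could evaluate such a result automatically. The following Python sketch parses the fragment shown above with the standard library. The namespace URIs bound to `ino` and `xql` are assumptions (the fragment above omits the declarations), so use the URIs declared in your actual server response; the `read_explanation` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; replace with those declared in the real response.
INO = "http://namespaces.softwareag.com/tamino/response2"
XQL = "http://metalab.unc.edu/xql/"

response = f"""
<xql:result xmlns:xql="{XQL}" xmlns:ino="{INO}">
  <ino:explanation ino:preselection="TRUE"
                   ino:postprocessing="TRUE" />
</xql:result>
"""


def read_explanation(xml_text):
    """Return (uses_preselection, uses_postprocessing) from an explain result."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(f"{{{INO}}}explanation")
    if node is None:
        raise ValueError("no ino:explanation element found")
    return (
        node.get(f"{{{INO}}}preselection") == "TRUE",
        node.get(f"{{{INO}}}postprocessing") == "TRUE",
    )


pre, post = read_explanation(response)
if not pre:
    print("No pre-selection: consider indexing one of the search criteria.")
```

A check like this can be run over the queries of an application to flag those that rely on post-processing alone.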

Because Tamino automatically separates pre-selection and post-processing criteria and applies further query optimization, the sequence of search criteria in a filter expression does not matter. For example, the query

jazzMusician[belongsTo/style/@name="Bebop"
             and name/first="Charlie"]

is executed at the same speed as

jazzMusician[name/first="Charlie"
             and belongsTo/style/@name="Bebop"]

(Remember, belongsTo/style/@name is indexed, name/first is not.)

Here are a few more guidelines for efficient querying:

Queries that need no post-processing are especially efficient when no documents have to be accessed at all, for example, when using the count() function.
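Why counting benefits from the absence of post-processing can be seen in a small Python model. It is a sketch under simplifying assumptions, not Tamino's internal logic: the `style_index` and `documents` structures, the sample data, and all function names are hypothetical, and `documents_read` merely counts how often a document would be fetched.

```python
# Hypothetical index on belongsTo/style/@name: value -> set of document IDs.
style_index = {"Bebop": {1, 2, 7}, "Swing": {3, 4}}

# Hypothetical document store; the first name is NOT indexed.
documents = {
    1: {"first": "Charlie"},
    2: {"first": "Dizzy"},
    7: {"first": "Charlie"},
    3: {"first": "Benny"},
    4: {"first": "Lester"},
}
documents_read = 0  # number of document fetches so far


def load_document(doc_id):
    """Stand-in for fetching and parsing one stored document."""
    global documents_read
    documents_read += 1
    return documents[doc_id]


def count_by_style(style):
    """Indexed criterion only: the index entry alone yields the count."""
    return len(style_index.get(style, set()))


def count_by_style_and_first(style, first):
    """The non-indexed criterion forces every candidate to be loaded."""
    return sum(
        1
        for doc_id in style_index.get(style, set())
        if load_document(doc_id)["first"] == first
    )


index_only_count = count_by_style("Bebop")
reads_after_index_only = documents_read
filtered_count = count_by_style_and_first("Bebop", "Charlie")
reads_after_filter = documents_read
```

In this model the first count is answered without a single document fetch, while the second has to load every pre-selected document just to produce a number.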
