Tuning Schemas and Queries

You can influence the performance of Tamino by adhering to the following general principles:

  • Before defining indexing information for your Tamino schema, you should as far as possible anticipate what type of query is most likely to be run against which nodes. These nodes are the candidates for index information.

  • In deciding whether to index a node with a text or a standard index, remember that a text index provides optimal performance only when text search operations will be used in queries.

  • Use a word fragment index only where it is absolutely necessary. By default, this option is set to "no", because a word fragment index causes substantial overhead: all possible word fragments must be extracted from words and stored as index values. A word fragment index is only useful in combination with a text index. For details, see the document on Indexing in the Advanced Concepts documentation.

  • Unnecessarily broad or inefficient queries can be detrimental to performance. The more specific or direct the query, the faster the response.

  • Consider whether your intended usage scenario would benefit from multipath indexes, compound indexes and reference indexes.
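To give a feel for the word fragment overhead mentioned in the list above, the following Python sketch counts the entries a plain word index and a fragment index would need for the same words. It assumes, purely for illustration, that a fragment index stores every contiguous substring of each word. Tamino's actual extraction rules are not described here and may differ, and the function names are hypothetical.

```python
from __future__ import annotations


def word_fragments(word: str) -> set[str]:
    """Every contiguous substring of a word (simplified, assumed fragment model)."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}


def index_entry_counts(words: list[str]) -> tuple[int, int]:
    """Distinct entries needed by a plain word index vs. a fragment index."""
    plain = len(set(words))
    fragments: set[str] = set()
    for word in words:
        fragments |= word_fragments(word)
    return plain, len(fragments)


plain, fragmented = index_entry_counts(["atkinson", "atkins", "watkins"])
print(f"word index: {plain} entries, fragment index: {fragmented} entries")
```

Under this model, a word of n characters has up to n(n+1)/2 fragments. The fragment index therefore grows roughly quadratically with word length, while a word index grows only with the number of distinct words.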

The success and speed of a query run against the Tamino database depend on the combination of the index type defined for an XML object and the type of query run against it. Another factor is the mapping type, which determines where the object is stored. If the typical requests of the application are known, the index types can be set to support the most typical queries.

This section examines these factors and indicates the settings that are most likely to bring you optimal performance.

Using Structure Index

The default value "CONDENSED" of the structure-index attribute of the doctype specifies that instance nodes not explicitly mapped in the schema are registered in the structure index. This provides index support for wildcard and descendant operators in structural queries on nodes not mapped in the schema. It improves query performance especially in cases where no documents match. For example, if the path a/b does not occur in the doctype, a query such as "_XQL=a/b" against a doctype with a condensed structure index can determine from the index alone that the path does not exist, which makes the query faster. With a condensed index, Tamino's optimizer knows whether the index information is complete. Queries with wildcards (*) or the descendant operator (//) are therefore much faster.

A full description of general attributes is given in the Tamino XML Schema Reference Guide.
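The condensed structure index described above can be pictured as a set of the paths that occur anywhere in a doctype. The Python sketch below is a hypothetical model, not Tamino's implementation. It simplifies by registering every path, whereas Tamino registers only nodes not mapped in the schema. It shows how a missing path yields an empty result without reading any document, and how a descendant step can be resolved to concrete paths.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def element_paths(root: ET.Element) -> set[str]:
    """All element paths (e.g. 'a/d/e') occurring in one document."""
    paths: set[str] = set()
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        paths.add(path)
        for child in elem:
            stack.append((child, f"{path}/{child.tag}"))
    return paths


class CondensedStructureIndex:
    """Hypothetical model: records which paths occur in the doctype at all."""

    def __init__(self) -> None:
        self.paths: set[str] = set()

    def add_document(self, xml_text: str) -> None:
        self.paths |= element_paths(ET.fromstring(xml_text))

    def may_match(self, path: str) -> bool:
        # An unregistered path cannot match any document, so the query
        # can return an empty result without scanning documents.
        return path in self.paths

    def paths_for_descendant(self, name: str) -> list[str]:
        # Resolves a descendant step such as a//e to the registered paths.
        return sorted(p for p in self.paths if p.split("/")[-1] == name)


index = CondensedStructureIndex()
index.add_document("<a><c>1</c></a>")
index.add_document("<a><d><e/></d></a>")
print(index.may_match("a/b"))            # False: answered from the index alone
print(index.may_match("a/d/e"))          # True: documents must still be read
print(index.paths_for_descendant("e"))   # ['a/d/e']
```

Note that this model only says whether a path exists somewhere in the doctype, not in which document. That distinction is what the "FULL" setting adds.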

Query performance can be improved further by using the value "FULL" for the structure index. In this case, the structure index records which path occurs in which document. The value "FULL" causes all instance nodes not defined in the schema to be registered by their structure, and therefore significantly improves the performance of queries on such nodes. "FULL" is also very useful when queries use optional elements of the schema and only a few documents in the database contain such an optional element. The price for this setting is a slower load operation and a larger index. Although the default value is recommended for most applications, the value "FULL" is worthwhile in situations where heterogeneous instances of a doctype are expected, with element names that may not be known at schema definition time. As with other mapping aspects that affect performance, this is a trade-off between loading time and index size on the one hand and the expected nature of the documents and queries on the other.

The value "FULL" is also recommended for documents with known elements in a random structure (for example, XHTML).
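The "FULL" setting can be pictured as a map from each path to the documents in which it occurs. The following Python sketch is a hypothetical model with invented doctype and element names. It illustrates the optional-element case from the paragraph above: only the few documents that contain the element become candidates for the query.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


def element_paths(root: ET.Element) -> set[str]:
    """All element paths (e.g. 'patient/allergy') occurring in one document."""
    paths: set[str] = set()
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        paths.add(path)
        for child in elem:
            stack.append((child, f"{path}/{child.tag}"))
    return paths


class FullStructureIndex:
    """Hypothetical model: records which paths occur in which document."""

    def __init__(self) -> None:
        self.docs_by_path: defaultdict[str, set[int]] = defaultdict(set)

    def add_document(self, doc_id: int, xml_text: str) -> None:
        # Extra work per document at load time: the price of "FULL".
        for path in element_paths(ET.fromstring(xml_text)):
            self.docs_by_path[path].add(doc_id)

    def candidates(self, path: str) -> set[int]:
        # Only these documents need to be read to evaluate the query.
        return set(self.docs_by_path.get(path, set()))


index = FullStructureIndex()
for doc_id in range(1, 1001):
    # Hypothetical optional element present in only two documents.
    extra = "<allergy>penicillin</allergy>" if doc_id in (7, 42) else ""
    index.add_document(doc_id, f"<patient><name>x</name>{extra}</patient>")

print(sorted(index.candidates("patient/allergy")))  # [7, 42]
print(len(index.candidates("patient/name")))        # 1000
```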

Basic Indexing

Text Index

The text index creates an index for full text search capabilities. This index type is optimal for retrieving words, as it supports the contains operator (see tf:containsText), as well as the operators "adj" (see tf:containsAdjacentText) and "near" (see tf:containsNearText). An example of using the contains operator is:

_xquery=declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction"
for $p in input()/patient
where tf:containsText($p/name/surname, "atkin*")
return $p

One effect of the contains operator is that the search string is normalized, which basically means that the search is not case-sensitive. However, this behavior depends on the settings of the schema element ino:transliteration; for a detailed description, see the section Representation and Handling of Characters in Unicode and Text Retrieval. The wildcard character (*) causes all instances of "patient" to be found whose surnames contain a word starting with "atkin".
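The normalization and wildcard behavior of the "atkin*" example above can be sketched as a small inverted index in Python. This is a hypothetical model, not Tamino's implementation. Normalization is reduced to lower-casing and splitting on non-word characters, which ignores ino:transliteration settings, and only a trailing "*" is supported.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize(text: str) -> list[str]:
    """Simplified normalization: lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


class TextIndex:
    """Hypothetical inverted index: normalized word -> IDs of documents containing it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for word in normalize(text):
            self.postings[word].add(doc_id)

    def contains_text(self, pattern: str) -> set[int]:
        """Match one word; a trailing '*' matches any word with that prefix."""
        pattern = pattern.lower()
        if not pattern.endswith("*"):
            return set(self.postings.get(pattern, set()))
        prefix = pattern[:-1]
        hits: set[int] = set()
        for word, doc_ids in self.postings.items():
            if word.startswith(prefix):
                hits |= doc_ids
        return hits


index = TextIndex()
for doc_id, surname in {1: "Atkinson", 2: "Atkins", 3: "Watkins", 4: "Atkin-Smith"}.items():
    index.add(doc_id, surname)

print(sorted(index.contains_text("atkin*")))  # [1, 2, 4]
print(sorted(index.contains_text("ATKINS")))  # [2]
```

"Watkins" is not found because the wildcard matches words that start with "atkin", not words that merely contain it. The linear scan over all words keeps the sketch short. A real index would more likely keep words sorted so that a prefix can be located by a range lookup.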

Note that a text index covers the whole content of the node. Node content in this sense means the concatenation of all descendant nodes (but not attribute nodes) that contain text. Text indexes on intermediate nodes should therefore be defined with caution.
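To illustrate what "node content" means for a text index, the Python sketch below uses the standard library's ElementTree to concatenate the text of a node and its descendants. The patient document is invented for the example. Joining the pieces with a space is an assumption made to keep words from adjacent elements apart; this section does not state how Tamino treats element boundaries.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<patient id="p1">'
    "<name><firstname>Anna</firstname><surname>Atkinson</surname></name>"
    "<address><city>Leeds</city></address>"
    "</patient>"
)


def node_content(elem: ET.Element) -> str:
    """Concatenate the text of an element and all its descendants.

    itertext() yields element text only, so attribute values such as
    id="p1" are not included. Pieces are joined with a space (an assumption).
    """
    return " ".join(elem.itertext())


print(node_content(doc.find("name/surname")))  # Atkinson
print(node_content(doc.find("name")))          # Anna Atkinson
print(node_content(doc))                       # Anna Atkinson Leeds
```

A text index on the intermediate node patient would therefore also index the city, which is why such indexes grow quickly and should be defined with care.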

Creating index data takes time, so a text index affects performance mainly when data is loaded: the more data is loaded, the greater the impact on load time. Define text indexes selectively rather than liberally.

Text indexing is not possible for data stored in doctypes with the noConversion flag set.

Standard Index

Define all primary and foreign keys used in the conceptual model as indexes of type standard.

Nodes that are used as sort criteria should also be defined as indexes of type standard.

With a standard index, if an element has the data type xs:string, a string index is created. For this element, only string comparisons can be handled via the index, which makes the internal lookup much faster; numeric comparisons are done without an index lookup. In the case of the "=" operator, the index is used for a preselection. Similarly, if the element has a numeric data type (xs:integer or xs:float), only numeric comparisons can be handled by the index, and string comparisons must be done without an index lookup. For example, if an element born has a numeric index but no string index, then born > 1950 will generally be answered much faster than born > "1950".
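One plausible way to see why an index serves only one kind of comparison is that numeric order and string order disagree. The Python sketch below models a numeric index as a numerically sorted list (a hypothetical simplification). With this list, born > 1950 becomes one contiguous range lookup, while the string comparison cannot use that order and must test every value.

```python
import bisect

born_values = [2003, 999, 1951, 1950]  # hypothetical values of the element born

# Hypothetical numeric index: the values sorted as numbers.
numeric_index = sorted(born_values)  # [999, 1950, 1951, 2003]


def numeric_greater_than(threshold: int) -> list[int]:
    """born > 1950: a single range lookup in the numeric index."""
    return numeric_index[bisect.bisect_right(numeric_index, threshold):]


def string_greater_than(threshold: str) -> list[int]:
    """born > "1950": string order differs, so every value is tested."""
    return [v for v in born_values if str(v) > threshold]


print(numeric_greater_than(1950))   # [1951, 2003]
print(string_greater_than("1950"))  # [2003, 999, 1951]
```

In string order "999" sorts after "1950", so the values matching the string comparison are not a contiguous range of the numeric index.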

Standard and Text Index

For performance reasons, use this type of indexing only if it is absolutely necessary: it means that both a text index and a standard index are created, which may slow down indexing operations.

However, this index type has its uses. For example, you may want to search for patients whose surnames start with "At" (which requires a text index) and also accelerate sorting the output alphabetically by surname (which requires a standard index).
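The surname example above can be sketched in Python to show how the two index types divide the work. Both indexes here are hypothetical in-memory stand-ins. The text index finds the documents whose surname starts with "At", and walking the standard index, which is kept in sort order, returns those hits already sorted, so no separate sort step is needed.

```python
from __future__ import annotations

surnames = {1: "Atkinson", 2: "Baker", 3: "Atwood", 4: "Atherton", 5: "Zhang"}

# Hypothetical text index: normalized word -> document IDs.
text_index: dict[str, set[int]] = {}
for doc_id, name in surnames.items():
    text_index.setdefault(name.lower(), set()).add(doc_id)

# Hypothetical standard index: (value, doc_id) pairs kept in sort order.
standard_index = sorted((name, doc_id) for doc_id, name in surnames.items())


def surnames_starting_with_sorted(prefix: str) -> list[str]:
    prefix = prefix.lower()
    # Step 1: the text index answers the case-insensitive prefix search.
    hits = {d for word, ids in text_index.items() if word.startswith(prefix) for d in ids}
    # Step 2: walking the standard index yields the hits in sorted order.
    return [name for name, doc_id in standard_index if doc_id in hits]


print(surnames_starting_with_sorted("At"))  # ['Atherton', 'Atkinson', 'Atwood']
```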

Result Size

Keep the size of the query result set small, because the following rule applies: the larger the result data, the longer the query response time. Especially on large databases, it is easy to construct queries that deliver large amounts of data, which is very time-consuming. There are two reasons for this:

  • Tamino has to construct the result set in memory first.

  • Then all the data must be transferred from the server to the client.

If you are not sure whether the size of the result data could block the system, compute the expected size of the query result set first. Use the function count() or a cursor, or formulate queries that request the ino:id of the XML instances instead of the whole data (for example: /my_doctype[ac~="willi"]/@ino:id). In a subsequent query, use the ino:id to retrieve the data.

Alternatively, you can use cursoring to restrict the result set to a subset of the documents that match the query. For more information, see for example the section The cursor command in the X-Machine Programming documentation.
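The strategy described above can be outlined from the client side: request only the IDs, check how many there are, and then retrieve the full documents in limited batches. The Python sketch below is a hypothetical outline. The in-memory store stands in for a doctype, and fetch_in_pages is an invented helper, not a Tamino API. It only illustrates the order of the steps, not Tamino's query or cursor interface.

```python
from __future__ import annotations

from typing import Callable, Iterator


def fetch_in_pages(
    ids: list[int], fetch: Callable[[list[int]], list[str]], page_size: int
) -> Iterator[list[str]]:
    """Retrieve full documents one page at a time instead of all at once."""
    for start in range(0, len(ids), page_size):
        yield fetch(ids[start:start + page_size])


# Hypothetical in-memory stand-in for a doctype: ino:id -> document.
store = {
    i: f"<my_doctype><ac>{'willi' if i % 3 == 0 else 'anna'}</ac></my_doctype>"
    for i in range(1, 31)
}

# Step 1: request only the IDs, analogous to .../@ino:id; len() plays the role of count().
ids = [i for i, doc in store.items() if "willi" in doc]
print(len(ids))  # 10

# Step 2: retrieve the full documents in pages of 4.
pages = list(fetch_in_pages(ids, lambda batch: [store[i] for i in batch], page_size=4))
print([len(p) for p in pages])  # [4, 4, 2]
```

Because the IDs are small, step 1 stays cheap even for large matches. The page size then bounds how much data has to be built and transferred at once.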