You can influence the performance of Tamino by adhering to the following general principles:
Before defining indexing information for your Tamino schema, anticipate as far as possible which types of query are most likely to be run against which nodes. These nodes are the candidates for indexing.
In deciding whether to index a node with a text or a standard index, remember that a text index provides optimal performance only when text search operations will be used in queries.
Use a word fragment index only where it is absolutely necessary. By default, this option is set to "no" because a word fragment index causes substantial overhead: all possible word fragments must be extracted from words and stored as index values. A word fragment index is only useful in combination with a text index. For details, see the document on Indexing in the Advanced Concepts documentation.
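To get a feel for this overhead, the following Python sketch counts distinct index entries under a simplifying assumption: every contiguous substring of a word counts as a fragment. Tamino's actual fragment scheme is not described here and may differ, and the function names are hypothetical. Under this assumption a word of n characters yields up to n(n+1)/2 fragments, compared with one entry per word in a plain text index.

```python
def word_fragments(word: str) -> set[str]:
    """All contiguous substrings of `word` (assumed fragment definition)."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}


def index_entries(words: list[str]) -> tuple[int, int]:
    """Distinct entries for a plain word index vs. a fragment index."""
    distinct_words = set(words)
    fragments: set[str] = set()
    for w in distinct_words:
        fragments |= word_fragments(w)
    return len(distinct_words), len(fragments)


print(index_entries(["atkin", "atkins", "atkinson"]))  # (3, 35)
```

Here three related words produce 3 word entries but 35 fragment entries, even though their fragments overlap heavily.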
Unnecessarily broad or poorly formulated queries can be detrimental to performance. The more specific or direct the query, the faster the response.
Consider whether your intended usage scenario would benefit from multipath indexes, compound indexes and reference indexes.
The success and speed of a query run against the Tamino database depend on the combination of the index type defined for an XML object and the type of query run against it. Another factor is the mapping type, which determines where the object is stored. If the typical requests of the application are known, the index types can be set to support the most typical queries.
This section examines these factors and indicates the settings that are most likely to bring you optimal performance.
The default value "CONDENSED" of the structure-index attribute of the doctype specifies that instance nodes not explicitly mapped in the schema are registered in the structure index. This provides index support for wildcard and descendant operators in structural queries on nodes not mapped in the schema, and thus improves query performance in cases where a query returns no documents. For example, if you run the query "_XQL=a/b" against a doctype with a condensed index and the path a/b does not occur in that doctype, Tamino can determine this from the index alone instead of examining the documents. With a condensed index, Tamino's optimizer knows whether the index information is complete, so queries with wildcards (*) or the descendant operator (//) are much faster.
A full description of general attributes is given in the Tamino XML Schema Reference Guide.
Query performance can be enhanced further by setting the structure index to "FULL". In this case, the structure index records which path occurs in which document. The value "FULL" causes all instance nodes not defined in the schema to be registered by their structure, which significantly improves the performance of queries on such nodes. "FULL" is also very useful where queries use optional schema elements that occur in only a few documents in the database. The price for this setting is slower loading and a larger index. Although the default value is recommended for most applications, "FULL" is worthwhile where a doctype is expected to contain heterogeneous instances with element names that may not be known at schema definition time. As with other mapping aspects that affect performance, this is a trade-off between loading time and index size on the one hand, and the expected nature of the documents and queries on the other.
The value "No" is also recommended for documents with known elements in a random structure (for example, XHTML).
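The difference between the two settings can be illustrated with a toy model in Python. It is purely conceptual and assumes nothing about Tamino's internal index format: the condensed setting is modeled as the set of paths that occur in a doctype, the full setting additionally maps each path to the documents containing it, and the class and document data are hypothetical.

```python
from collections import defaultdict


class ToyStructureIndex:
    """Conceptual model, not Tamino's format.

    CONDENSED: remembers which paths occur anywhere in the doctype.
    FULL: additionally remembers which documents contain each path.
    """

    def __init__(self, full: bool) -> None:
        self.full = full
        self.paths: set[str] = set()
        self.docs_by_path: dict[str, set[int]] = defaultdict(set)

    def register(self, doc_id: int, paths: list[str]) -> None:
        for path in paths:
            self.paths.add(path)
            if self.full:
                self.docs_by_path[path].add(doc_id)

    def candidates(self, path: str, all_ids: set[int]) -> set[int]:
        """Documents that still have to be examined for a query on `path`."""
        if path not in self.paths:
            return set()  # path occurs nowhere: empty result straight from the index
        if self.full:
            return set(self.docs_by_path[path])  # only documents containing the path
        return set(all_ids)  # condensed: path exists somewhere, so check every document


# Hypothetical documents: only document 2 has the optional <allergy> element.
docs = {
    1: ["patient", "patient/name"],
    2: ["patient", "patient/name", "patient/allergy"],
    3: ["patient", "patient/name"],
}
for full in (False, True):
    index = ToyStructureIndex(full)
    for doc_id, paths in docs.items():
        index.register(doc_id, paths)
    print("FULL" if full else "CONDENSED",
          index.candidates("a/b", set(docs)),
          index.candidates("patient/allergy", set(docs)))
# CONDENSED set() {1, 2, 3}
# FULL set() {2}
```

In both modes a query on the nonexistent path a/b is answered from the index alone. For the rare optional element, the condensed model still has to examine every document, while the full model narrows the search to the one document that contains it, at the cost of storing a document set per path.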
A text index provides full-text search capabilities.
This index type is optimal for retrieving words, as it supports the contains operator (see tf:containsText), as well as the operators "adj" (see tf:containsAdjacentText) and "near" (see tf:containsNearText).
Examples of using the contains operator are:
_xql=/patient[name/surname~='atkin*']
_xquery=declare namespace tf="http://namespaces.softwareag.com/tamino/TaminoFunction" for $p in input()/patient where tf:containsText($p/name/surname, "atkin*") return $p
One of the effects of the contains operator is that the request is normalized, basically meaning that the request is not case-sensitive (however, this behavior depends on the settings of the schema element ino:transliteration; for a detailed description see the section Representation and Handling of Characters in Unicode and Text Retrieval). The effect of the wildcard character (*) is that all instances of "patient" are found whose surnames contain a word starting with "atkin".
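The following Python sketch models this behavior under simplifying assumptions: words are extracted from the node content, normalized (here by plain case folding, standing in for the transliteration-dependent normalization), and mapped to document IDs, and a trailing * matches any indexed word with that prefix. The function names are hypothetical and are not part of any Tamino API.

```python
import re
from collections import defaultdict


def normalize(word: str) -> str:
    # Assumption: plain case folding stands in for Tamino's normalization,
    # which really depends on the ino:transliteration settings.
    return word.casefold()


def build_text_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: normalized word -> IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text):
            index[normalize(word)].add(doc_id)
    return index


def contains_text(index: dict[str, set[int]], pattern: str) -> set[int]:
    """Match a single search word; a trailing '*' matches any word with that prefix."""
    pattern = normalize(pattern)
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        # Linear scan for brevity; a real index would keep words sorted.
        return {d for word, ids in index.items() if word.startswith(prefix) for d in ids}
    return set(index.get(pattern, set()))


# Hypothetical surname contents of four patient documents.
surnames = {1: "Atkinson", 2: "ATKINS", 3: "Watkins", 4: "Mary-Jane Atkin"}
index = build_text_index(surnames)
print(contains_text(index, "atkin*"))  # {1, 2, 4}
print(contains_text(index, "ATKIN"))   # {4}
```

Watkins is not a hit because the wildcard only extends the end of a word, not its beginning.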
Note that a text index covers the whole content of the node. Node content in this sense means the concatenation of all descendant nodes (but not attribute nodes) that contain text. Setting a text index on intermediate nodes should therefore be done with caution.
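The notion of node content can be illustrated with Python's standard xml.etree.ElementTree module, whose itertext() method yields the text of an element and all its descendants in document order, without attribute values. The patient document is hypothetical, and whitespace is collapsed only for readability; how Tamino tokenizes the concatenated content is not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient document; the id attribute is not part of node content.
patient = ET.fromstring(
    '<patient id="p1"> <name> <firstname>Mary</firstname> <surname>Atkins</surname> </name>'
    " <remarks>allergic to penicillin</remarks> </patient>"
)


def node_content(elem: ET.Element) -> str:
    """Text of `elem` and all its descendants in document order, no attributes.

    Whitespace is collapsed only to keep the printed output readable.
    """
    return " ".join("".join(elem.itertext()).split())


print(node_content(patient.find("name/surname")))  # Atkins
print(node_content(patient.find("name")))          # Mary Atkins
print(node_content(patient))                       # Mary Atkins allergic to penicillin
```

A text index on patient would therefore cover the remarks as well as the name, producing far more index data than a text index on surname alone.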
Creating text index data takes time, which is an important consideration when loading data: the more data is loaded, the greater the impact on load time. Use text indexes selectively rather than liberally.
Text indexing is not possible for data stored in doctypes with the noConversion flag set.
Define all primary and foreign keys used in the conceptual model with an index of type standard.
Nodes that are used as sort criteria should also be defined with an index of type standard.
With a standard index, if an element has the data type xs:string, a string index is created. For this element, only string comparisons can be handled via the index, which makes the internal lookup much faster; numeric comparisons are done without an index lookup, although in the case of the "=" operator the index is used for a preselection. Similarly, if the element has a numeric data type such as xs:integer or xs:float, only numeric comparisons can be handled by the index, and string comparisons must be done without an index lookup. Thus, if an element born has a numeric index, born > 1950 will generally be answered much faster than born > "1950", because the element has a numeric index but no string index.
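Why the index type matters can be sketched in Python: string and numeric comparisons order values differently, so an index sorted for one kind of comparison cannot answer range queries of the other kind. The sorted list is only a stand-in for a numeric index and assumes nothing about Tamino's index structures; the born values are made up.

```python
import bisect

# Hypothetical values of the element "born".
born_values = [1962, 875, 1950, 2001, 1948]

# String order compares character by character, so it differs from numeric order.
print(sorted(str(v) for v in born_values))  # ['1948', '1950', '1962', '2001', '875']
print("875" > "1950")  # True, although 875 < 1950 numerically

# Stand-in for a numeric index: the values kept in numeric order.
numeric_index = sorted(born_values)  # [875, 1948, 1950, 1962, 2001]


def greater_than(index: list[int], value: int) -> list[int]:
    """Answer `born > value` with a binary search instead of checking every value."""
    return index[bisect.bisect_right(index, value):]


print(greater_than(numeric_index, 1950))  # [1962, 2001]
```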
For performance reasons, define both a text index and a standard index on the same node only if it is absolutely necessary, because both indexes must then be created, which may slow down indexing operations.
However, this combination has its uses. For example, you may want to search for patients whose surnames start with "At" (which requires a text index) and also accelerate sorting the output alphabetically by surname (which requires a standard index).
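A minimal Python sketch of the surname example, using a dictionary as a stand-in for the text index and a sorted list as a stand-in for the standard index (both hypothetical, as is the case-insensitive sort order): the text part finds the matching documents, and the standard part returns them in alphabetical order without a separate sort step.

```python
# Hypothetical surnames of five patient documents, keyed by document ID.
surnames = {1: "Atkins", 2: "Morgan", 3: "Attwood", 4: "Baker", 5: "atherton"}

# Text-index stand-in: normalized word -> document IDs (case-insensitive).
text_index: dict[str, set[int]] = {}
for doc_id, name in surnames.items():
    text_index.setdefault(name.casefold(), set()).add(doc_id)

# Standard-index stand-in: (sort key, document ID) pairs in sorted order.
standard_index = sorted((name.casefold(), doc_id) for doc_id, name in surnames.items())


def starts_with_sorted(prefix: str) -> list[str]:
    """Find surnames starting with `prefix`, returned in alphabetical order."""
    prefix = prefix.casefold()
    hits = {d for word, ids in text_index.items() if word.startswith(prefix) for d in ids}
    # Walking the pre-sorted standard index avoids sorting the hits afterwards.
    return [surnames[d] for _, d in standard_index if d in hits]


print(starts_with_sorted("At"))  # ['atherton', 'Atkins', 'Attwood']
```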
Keep the size of the query result set small, because the larger the result data, the longer the query response time. Especially on large databases, it is easy to construct queries that deliver large amounts of data, which is very time-consuming for two reasons:
Tamino has to construct the result set in memory first.
Then all the data must be transferred from the server to the client.
If you are not sure whether the size of the result data will block the system, it is recommended to compute the expected size of the query result set first. Use the function count(), or a cursor, or formulate queries that request the ino:id of the XML instances instead of the whole data (e.g. /my_doctype[ac~="willi"]/@ino:id). In a follow-up query, use the ino:id to retrieve the data.
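The count, then ino:id, then document approach can be sketched in Python by building the request URLs. This assumes Tamino's HTTP interface with the query passed in the _xql parameter, as in the _xql examples earlier in this section; the collection URL is a placeholder, and the exact XQL forms for count() and for selecting a document by @ino:id are assumptions to check against the XQL reference before use. The sketch only builds URLs and sends no requests.

```python
from urllib.parse import urlencode

# Hypothetical collection URL: replace server, database and collection names.
COLLECTION_URL = "http://localhost/tamino/mydb/mycollection"


def xql_url(query: str) -> str:
    """GET URL passing an XQL query in the _xql parameter (percent-encoded)."""
    return f"{COLLECTION_URL}?{urlencode({'_xql': query})}"


condition = 'ac~="willi"'
# Step 1: estimate the size of the result set.
count_url = xql_url(f"count(/my_doctype[{condition}])")
# Step 2: retrieve only the ino:id values of the matching documents.
ids_url = xql_url(f"/my_doctype[{condition}]/@ino:id")


def document_url(ino_id: int) -> str:
    """Step 3: fetch a single document by its ino:id (assumed XQL form)."""
    return xql_url(f"/my_doctype[@ino:id={ino_id}]")


for url in (count_url, ids_url, document_url(42)):
    print(url)
```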
Alternatively, you can use cursoring to restrict the result set to a subset of the documents that match the query. For more information, see for example the section The cursor command in the X-Machine Programming documentation.