Performance Tuning - A Case Study

A case study conducted with a Tamino installation produced the following prioritized list of factors for performance tuning:

  1. Optimizing indexes for the most common queries

  2. Optimizing query formulation

  3. Optimizing efficiency of Java code calling Tamino APIs

  4. Tuning operating system performance

  5. Updating components of the system written by third parties or open source projects

The case study illustrates several general principles for creating high-performance applications that Tamino users have observed repeatedly in the field:

  • There is only one overriding "design time" rule: many small documents are more efficient than a small number of large documents. This is due to fundamental design decisions by the developers of Tamino, based on analyses of how XML is used in the real world. It is sometimes necessary to write intermediate code that sits between a data-producing application and Tamino and decomposes huge documents into more manageable chunks for efficient storage. For example, imagine a book that consists of dozens of chapters: storing each chapter as a separate document is more efficient both for Tamino itself and for most XML tools, such as XSLT engines, that you will use to work with the data after it is retrieved.

  • Make it work, then make it fast. Trust Tamino to be fast, once properly tuned. If the application uses XML in a way that fits Tamino's design philosophy, do not worry about performance too much during the prototype phase.

  • Tamino, like almost all database management systems, is fastest when most of the work of satisfying a query can be done by processing the indexes. Thus, the key to performance tuning is to ensure that indexes have been defined for the most frequently queried elements and attributes. You should also place indexes on the elements and attributes that are used as join criteria. For more information, especially on using the "explain" facility, see the Efficient Querying section in the Advanced Concepts - From Schema to Tamino chapter of the Tamino documentation.

  • Remember that much of the time a program spends retrieving data from Tamino into an application data structure may be spent not in the DBMS itself, but in the API. Choose libraries and calls according to whether human time or machine time is the more precious resource in a given situation, since programmer convenience often comes at a performance price and vice versa. For example, most programmers who are not XML experts will find DOM/JDOM APIs easier to use than lower-level event-driven interfaces such as SAX. However, the overhead of building a DOM tree, allocating memory to hold the values of the XML elements and attributes, and copying the data from Tamino to the API and then to the application can be significant in some cases. Consider rewriting performance-critical sections of an application so that they use the most efficient techniques to build application objects from the XML text retrieved from Tamino.

  • Use system profiling tools to make sure that processing, disk, and memory resources are being used effectively by the overall system, of which Tamino is usually only one component. For example, a multiprocessor system will not be faster than a single-processor system unless the various parts of the system can work in parallel, and this may require some profiling and load balancing. Similarly, make sure that the software is configured to use all the available memory if it is abundant, and to share it efficiently if it is not. Once bottlenecks have been identified, eliminate them with hardware upgrades (additional memory, faster disk drives, faster networking) if the hardware is cheaper than the human time or business cost of extensive tuning.

  • Update to the latest version of available software. XML technologies are maturing rapidly, and as XML powers ever larger and more performance-critical applications, developers keep learning how to make it work faster and more reliably. Using the latest version of software components allows you to benefit from the testing and tuning experience of other users.
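
The "many small documents" rule in the list above can be illustrated with a minimal Java sketch of the intermediate code it describes. It uses only the standard JAXP classes and no Tamino API. The class name ChapterSplitter, the method splitIntoChapters, and the element names book and chapter are assumptions made for illustration; storing each resulting string in Tamino is left to whichever client API the application already uses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: decomposes one large <book> document into one
// small XML document per <chapter> element (element names are assumed).
public class ChapterSplitter {

    static List<String> splitIntoChapters(String bookXml) throws Exception {
        // Parse the complete book once.
        Document book = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(bookXml)));

        // An identity transformer serializes a DOM subtree back to text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        NodeList chapters = book.getElementsByTagName("chapter");
        List<String> documents = new ArrayList<>();
        for (int i = 0; i < chapters.getLength(); i++) {
            StringWriter out = new StringWriter();
            // Each chapter subtree becomes a standalone document.
            serializer.transform(new DOMSource(chapters.item(i)), new StreamResult(out));
            documents.add(out.toString());
        }
        return documents;
    }

    public static void main(String[] args) throws Exception {
        String book = "<book>"
                + "<chapter><title>Indexes</title></chapter>"
                + "<chapter><title>Queries</title></chapter>"
                + "</book>";
        // Each printed line is a candidate for storage as its own document.
        for (String chapter : splitIntoChapters(book)) {
            System.out.println(chapter);
        }
    }
}
```

This sketch loads the whole book into memory before splitting it, which is acceptable for moderately sized input. For truly huge documents, an event-driven splitter would avoid building the full tree.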
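
The trade-off between DOM and SAX mentioned in the list above can be made concrete with a small comparison. The following sketch is illustrative only: it uses the standard JAXP parsers, and the XML string stands in for text already retrieved from Tamino. The class name TitleExtractor, both method names, and the title element are assumptions. Both methods return the same values; the DOM version builds a complete tree first, while the SAX version keeps only the text it needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: extract all <title> values from retrieved XML text.
public class TitleExtractor {

    // Convenient: build the whole tree in memory, then navigate it.
    static List<String> titlesWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    // Leaner: react to parse events and keep only the title text.
    static List<String> titlesWithSax(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><chapter><title>Indexes</title></chapter>"
                + "<chapter><title>Queries</title></chapter></book>";
        System.out.println(titlesWithDom(xml));
        System.out.println(titlesWithSax(xml));
    }
}
```

The SAX version is longer and harder to read, which is the convenience cost described above. Whether the saving in memory and copying justifies that cost depends on document size and call frequency, so measure before rewriting.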