Document Composition

The composition of complex data objects from simpler database objects has a long tradition in relational technology. In particular, the join operation is heavily used there because relational technology decomposes complex information structures into "flat" two-dimensional tables consisting of atomic values. To reconstruct the complex information structures from those tables, it is necessary to "join" several tables during a query. In addition, by providing a join operation when querying data, relational databases allow users to re-arrange and combine data freely in ways that were not foreseen when the data model was designed.

With a native XML database, composition is used much more sparingly, because the database can store complex information items in their native form, so it is not necessary to "re-compose" these information items from flat tables. However, there are still cases in which we may want to combine several documents (or several document parts) into a single document, or in which we want to rely on other documents to retrieve a certain document.

Take, for example, our jazz encyclopedia. Maybe we want to find all collaborations in which a given jazz musician participates. Because we do not know the jazz musician's ID, we want to use his or her first and last name as search criteria instead. This requires us to locate a matching jazzMusician document first, extract the ID from that document, and then find the collaboration documents whose attribute jazzMusician/@ID matches that ID. This is a typical situation for a join.
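The two-step lookup described above can be sketched outside the database. The following Python fragment is only an illustration under stated assumptions: the sample documents are made up, the element names first and last inside name are assumed rather than taken from the encyclopedia schema, and the function collaborations_of is a hypothetical helper, not part of Tamino.

```python
import xml.etree.ElementTree as ET

# Made-up sample documents. The child elements <first> and <last> are
# assumptions about the structure of jazzMusician/name.
MUSICIANS = [
    ET.fromstring(
        '<jazzMusician ID="m1"><name><first>Ada</first>'
        '<last>Example</last></name></jazzMusician>'
    ),
    ET.fromstring(
        '<jazzMusician ID="m2"><name><first>Ben</first>'
        '<last>Sample</last></name></jazzMusician>'
    ),
]
COLLABORATIONS = [
    ET.fromstring(
        '<collaboration ID="c1"><jazzMusician ID="m1"/>'
        '<jazzMusician ID="m2"/></collaboration>'
    ),
    ET.fromstring('<collaboration ID="c2"><jazzMusician ID="m2"/></collaboration>'),
]


def collaborations_of(first, last):
    """Return the IDs of all collaborations the named musician takes part in."""
    # Step 1: locate matching jazzMusician documents and extract their IDs.
    ids = {
        m.get("ID")
        for m in MUSICIANS
        if m.findtext("name/first") == first and m.findtext("name/last") == last
    }
    # Step 2: keep every collaboration whose jazzMusician/@ID is one of those IDs.
    return [
        c.get("ID")
        for c in COLLABORATIONS
        if any(j.get("ID") in ids for j in c.findall("jazzMusician"))
    ]
```

Calling collaborations_of("Ben", "Sample") on this sample data returns both collaboration IDs, because the musician with ID m2 is referenced from both documents.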

Mathematically, a join in its most general form is the Cartesian product (cross product) of two document types, followed by a constraint that selects only part of the result set. In practice this definition remains theory, because evaluating it literally is very inefficient: the Cartesian product of 1,000 jazz musicians with 3,000 collaborations would result in at least 3*10^9 combinations. (Remember that each collaboration points to at least two jazz musicians, so we get 3000*1000*1000 combinations!) Therefore, database implementations differ vastly from this approach.
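To make the cost argument concrete, the following Python sketch first reproduces the arithmetic from the text and then compares a literal "cross product, then filter" join with a join that probes a lookup table. The lookup-table variant is only one common alternative and is an assumption made for illustration. It does not describe how Tamino evaluates joins, and the tiny data set and function names are made up.

```python
from itertools import product

# Sizes taken from the text.
musicians = 1_000
collaborations = 3_000

# Each collaboration references at least two musicians, so the naive
# product ranges over collaborations x musicians x musicians.
naive_combinations = collaborations * musicians * musicians


def naive_join(collabs, players):
    """Cross product first, then filter: examines every pair."""
    examined = 0
    matches = []
    for (cid, member_ids), (mid, name) in product(collabs, players):
        examined += 1
        if mid in member_ids:
            matches.append((cid, name))
    return matches, examined


def indexed_join(collabs, players):
    """Build a lookup table by ID once, then probe it for each reference."""
    by_id = dict(players)
    examined = 0
    matches = []
    for cid, member_ids in collabs:
        for mid in member_ids:
            examined += 1
            if mid in by_id:
                matches.append((cid, by_id[mid]))
    return matches, examined


# Made-up data: (collaboration ID, member IDs) and (musician ID, name).
COLLABS = [("c1", ("m1", "m2")), ("c2", ("m2", "m3"))]
PLAYERS = [("m1", "A"), ("m2", "B"), ("m3", "C"), ("m4", "D")]
```

Both functions return the same matches for this data, but the naive version examines all 8 pairs while the indexed version examines only the 4 references that actually occur. The gap grows with the product of the collection sizes.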

Tamino supports document composition using the full dynamic join functionality provided with the XQuery 4 query language.


Dynamic Joins with Tamino XQuery 4

XQuery 4 is a very powerful query language subsuming the functionality of both XSLT and XPath, although with a different, SQL-like syntax. XQuery 4 is based on the W3C XQuery recommendation. Language features such as FLWR-expressions (for, let, where, return) and variables allow for the simplest and the most complex join operations. In addition, XQuery supports namespaces and the full XML Schema type system.

The following example demonstrates how we can compose a single combined document from collaboration instances, jazzMusician instances, and album instances:

default element namespace = "http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"

for $c in input()/collaboration
   return
     <collaboration type={$c/@type} ID={$c/@ID}>
       { $c/name }
       { $c/performedAt }
       { $c/period }
       { for $id in $c/jazzMusician/@ID
           let $j := input()/jazzMusician[@ID = $id]
           return
             <jazzMusician ID={$j/@ID}>
               {$j/name}
               {$j/birthDate}
               {$j/plays}
             </jazzMusician>
       }
       { let $a := input()/album[@albumNo = $c/result/@albumNo]
           return
             <album albumNo={$a/@albumNo}>
               {$a/title}
               {$a/track}
             </album>
       }
     </collaboration>

Here, we use a for clause to run through all occurrences of the node collaboration/jazzMusician/@ID. We then use the value of this node to select jazzMusician document instances from the collection encyclopedia. The actual join condition is contained in the filter expression [@ID = $id]. XQuery 4 allows XPath-style expressions within XQuery expressions, so you can reuse your existing XPath skills. An alternative way to express such a join would be to replace the expression let $j := input()/jazzMusician[@ID = $id] with for $j in input()/jazzMusician where $j/@ID = $id. Since XQuery 4 allows nested queries, nested loops, and any number of variables, join expressions can be very complex.

In the second part of the query we perform a join with album documents. Since the node collaboration/result can occur only once in a collaboration, we can use let instead of for.