The composition of complex data objects from simpler database objects has a long tradition in relational technology. In particular, the join operation is heavily used there because relational technology decomposes complex information structures into "flat" two-dimensional tables consisting of atomic values. To reconstruct the complex information structures from those tables, it is necessary to "join" several tables during a query. In addition, by providing a join operation when querying data, relational databases allow users to re-arrange and combine data freely in ways that were not foreseen when the data model was designed.
With a native XML database, composition is used much more sparingly, because the database can store complex information items in their native form, so it is not necessary to "re-compose" these information items from flat tables. However, there are still cases in which we may want to combine several documents (or several document parts) into a single document, or in which we want to rely on other documents to retrieve a certain document.
Take, for example, our jazz encyclopedia. Maybe we want to find all collaborations in which a given jazz musician participates. Because we do not know the jazz musician's ID, we want to use his or her first and last name as search criteria instead. This requires us to locate a matching jazzMusician document first, extract the ID from that document, and then find a collaboration document that matches the ID in the attribute jazzMusician/@ID: a typical situation for a join.
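The lookup described above can be sketched outside the database in a few lines of Python. This is only an illustration of the join logic, not of how Tamino evaluates it: the sample documents, the child elements name/first and name/last, and the helper collaborations_of are assumptions made for this sketch, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the structure below the root elements
# (name/first, name/last) is assumed for illustration only.
MUSICIANS = [ET.fromstring(doc) for doc in (
    '<jazzMusician ID="ParkerCharlie"><name><first>Charlie</first>'
    '<last>Parker</last></name></jazzMusician>',
    '<jazzMusician ID="GillespieDizzy"><name><first>Dizzy</first>'
    '<last>Gillespie</last></name></jazzMusician>',
    '<jazzMusician ID="DavisMiles"><name><first>Miles</first>'
    '<last>Davis</last></name></jazzMusician>',
)]

COLLABORATIONS = [ET.fromstring(doc) for doc in (
    '<collaboration ID="c1"><jazzMusician ID="ParkerCharlie"/>'
    '<jazzMusician ID="GillespieDizzy"/></collaboration>',
    '<collaboration ID="c2"><jazzMusician ID="DavisMiles"/>'
    '<jazzMusician ID="ParkerCharlie"/></collaboration>',
    '<collaboration ID="c3"><jazzMusician ID="DavisMiles"/>'
    '<jazzMusician ID="GillespieDizzy"/></collaboration>',
)]


def collaborations_of(first, last):
    """Return the IDs of all collaborations a musician with this name takes part in."""
    # Step 1: locate matching jazzMusician documents and extract their IDs.
    ids = {
        m.get("ID")
        for m in MUSICIANS
        if m.findtext("name/first") == first and m.findtext("name/last") == last
    }
    # Step 2: keep collaborations whose jazzMusician/@ID matches one of those IDs.
    return [
        c.get("ID")
        for c in COLLABORATIONS
        if any(ref.get("ID") in ids for ref in c.findall("jazzMusician"))
    ]


print(collaborations_of("Charlie", "Parker"))
```

The two steps correspond to the two halves of the join: the name filter selects jazzMusician documents, and the ID comparison connects them to collaboration documents.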
Mathematically, a join in its most general form is the Cartesian product (cross product) of two document types, followed by some constraint that selects only a part of the result set. However, this is only the mathematical definition; evaluated literally, it is very inefficient: the Cartesian product of 1,000 jazz musicians with 3,000 collaborations would result in at least 3*10^9 combinations. (Remember that each collaboration points to at least two jazz musicians, so we get 3000*1000*1000 combinations!) Therefore, database implementations differ vastly from this approach.
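To make the cost difference tangible, the following toy comparison counts how many combinations a literal "cross product, then filter" evaluation inspects, and how many lookups an evaluation needs that first builds a lookup table on the ID. It is a sketch on made-up data with hypothetical function names; it does not claim to show what Tamino does internally, only why implementations avoid the naive approach.

```python
from itertools import product


def nested_loop_join(musicians, collaborations):
    """Cross product followed by a filter: every pair is inspected."""
    inspected = 0
    matches = []
    for m, c in product(musicians, collaborations):
        inspected += 1
        if m["ID"] in c["musician_ids"]:
            matches.append((m["ID"], c["ID"]))
    return matches, inspected


def indexed_join(musicians, collaborations):
    """Build a lookup table on ID once, then probe it once per reference."""
    by_id = {m["ID"]: m for m in musicians}
    probes = 0
    matches = []
    for c in collaborations:
        for musician_id in c["musician_ids"]:
            probes += 1
            if musician_id in by_id:
                matches.append((musician_id, c["ID"]))
    return matches, probes


# Made-up data: 100 musicians, 300 collaborations with two musicians each.
musicians = [{"ID": f"m{i}"} for i in range(100)]
collaborations = [
    {"ID": f"c{i}", "musician_ids": [f"m{i % 100}", f"m{(i + 1) % 100}"]}
    for i in range(300)
]

pairs_a, inspected = nested_loop_join(musicians, collaborations)
pairs_b, probes = indexed_join(musicians, collaborations)
print(inspected, probes, sorted(pairs_a) == sorted(pairs_b))
```

Both functions return the same matches, but the first inspects one combination per musician-collaboration pair, while the second performs one lookup per reference.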
Tamino supports document composition using the full dynamic join functionality provided with the XQuery 4 query language.
XQuery 4 is a very powerful query language subsuming the functionality of both XSLT and XPath, although with a different, SQL-like syntax. XQuery 4 is based on the W3C XQuery recommendation. Language features such as FLWR-expressions (for, let, where, return) and variables allow for the simplest and the most complex join operations. In addition, XQuery supports namespaces and the full XML Schema type system.
The following example demonstrates how we can compose a joint document from collaboration instances, jazzMusician instances, and album instances:
default element namespace = "http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
for $c in input()/collaboration
return
  <collaboration type={$c/@type} ID={$c/@ID}>
    { $c/name }
    { $c/performedAt }
    { $c/period }
    { for $id in $c/jazzMusician/@ID
      let $j := input()/jazzMusician[@ID = $id]
      return
        <jazzMusician ID={$j/@ID}>
          {$j/name}
          {$j/birthDate}
          {$j/plays}
        </jazzMusician> }
    { let $a := input()/album[@albumNo = $c/result/@albumNo]
      return
        <album albumNo={$a/@albumNo}>
          {$a/title}
          {$a/track}
        </album> }
  </collaboration>
Here, we use a for instruction to run through all occurrences of the node collaboration/jazzMusician/@ID. We then use the value of this node to select jazzMusician document instances from collection encyclopedia. The actual join expression is contained in the filter expression [@ID = $id]. XQuery 4 allows XPath-style expressions within XQuery expressions, so you can leverage some of your skills writing XPath expressions. An alternative way to express such a join would be to replace the expression let $j := input()/jazzMusician[@ID = $id] with for $j in input()/jazzMusician where $j/@ID = $id. Since XQuery 4 allows nested queries, nested loops, and any number of variables, join expressions can be very complex.
In the second part of the query we perform a join with album documents. Since the node collaboration/result can occur only once, we can use let instead of for.
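For readers who want to trace the logic of the query step by step, here is a rough Python counterpart built on the standard library. It is a sketch under assumptions, not a description of Tamino's evaluation: the helper names compose and copy_children and the sample data are invented for illustration, namespaces are omitted, and the ID attribute is taken from the reference rather than from the joined document (the two are equal whenever a match exists).

```python
import copy
import xml.etree.ElementTree as ET


def copy_children(source, tags, target):
    """Append deep copies of the named child elements of source to target."""
    for tag in tags:
        for child in source.findall(tag):
            target.append(copy.deepcopy(child))


def compose(collab, musicians, albums):
    """Build one joint collaboration document from three document types."""
    out = ET.Element(
        "collaboration",
        {"type": collab.get("type", ""), "ID": collab.get("ID", "")},
    )
    copy_children(collab, ("name", "performedAt", "period"), out)

    # Counterpart of the "for": one output element per jazzMusician reference.
    for ref in collab.findall("jazzMusician"):
        ref_id = ref.get("ID", "")
        joined = ET.SubElement(out, "jazzMusician", {"ID": ref_id})
        for j in musicians:
            if j.get("ID") == ref_id:  # the join condition [@ID = $id]
                copy_children(j, ("name", "birthDate", "plays"), joined)

    # Counterpart of the "let": bind all matching albums once, emit one element.
    album_nos = {r.get("albumNo") for r in collab.findall("result")}
    matching = [a for a in albums if a.get("albumNo") in album_nos]
    album_no = matching[0].get("albumNo", "") if matching else ""
    album = ET.SubElement(out, "album", {"albumNo": album_no})
    for a in matching:
        copy_children(a, ("title", "track"), album)
    return out


# Hypothetical sample data for the sketch.
collab = ET.fromstring(
    '<collaboration type="jamSession" ID="c1"><name>Bird and Diz</name>'
    '<jazzMusician ID="ParkerCharlie"/><jazzMusician ID="GillespieDizzy"/>'
    '<result albumNo="A1"/></collaboration>'
)
musicians = [
    ET.fromstring('<jazzMusician ID="ParkerCharlie"><plays>saxophone</plays></jazzMusician>'),
    ET.fromstring('<jazzMusician ID="GillespieDizzy"><plays>trumpet</plays></jazzMusician>'),
]
albums = [
    ET.fromstring('<album albumNo="A1"><title>Bird and Diz</title></album>'),
    ET.fromstring('<album albumNo="A2"><title>Other</title></album>'),
]

doc = compose(collab, musicians, albums)
print(ET.tostring(doc, encoding="unicode"))
```

The loop over references mirrors the for clause, which produces one jazzMusician element per ID, while the single album element mirrors the let clause, which binds the whole result of the filter expression at once.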