In classical relational databases, integrity rules and triggers are used to maintain the integrity of the information stored in the database. Integrity means that the constraints defined in the conceptual model are not violated and that the data structures defined in the conceptual model are kept intact. This is possible by applying integrity rules and triggers within the same transactional context as the operations that modify the stored information.
The last of these conditions, the transactional context, becomes impossible to satisfy once we extend our data model beyond the boundaries of traditional enterprise databases. When a model includes data from sources somewhere on the World Wide Web, a database system can no longer guarantee the integrity of data structures that reach beyond its transactional environment. For example, a database cannot "lock" foreign web resources during a transaction, and thus cannot stop other users from interfering with that transaction.
In addition, web resources may be temporarily unavailable, and hardware is increasingly mobile, whether as traveling PDAs or as devices on wireless LANs. In these cases it is not always possible to satisfy integrity constraints immediately. Instead of transactional integrity techniques, we need synchronization techniques that keep the data model consistent in the long term.
In general, the resource manager (i.e. the database) is the wrong place to enforce data integrity. In many cases this task is better left to the application logic, or to appropriate middleware.
In the following sections we indicate how constraints can be defined for XML documents. The method of choice in Tamino for implementing constraints is triggers. See the description of trigger functions in the chapter Tamino Server Extension Functions in the documentation for server extensions for details.
Constraints are used to add more meaning to a model. During the definition of the XML schemas we have already added a considerable set of constraints to our model: datatypes. Each datatype, such as string, float or integer, constrains the value domain of an element or attribute. Additional constraints are enumerations or type parameters (facets) such as totalDigits, maxLength, minExclusive, etc.
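As an illustration, the following sketch shows how such facets and an enumeration are written in plain XML Schema. The type names nameType, musicianKind and feeType and all limit values are hypothetical and chosen only for this example; the enumeration values are the musician types mentioned in this section. Tamino-specific schema annotations are left out.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: a string of at most 80 characters (maxLength facet) -->
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="80"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type: an enumeration restricting the allowed values -->
  <xs:simpleType name="musicianKind">
    <xs:restriction base="xs:string">
      <xs:enumeration value="instrumentalist"/>
      <xs:enumeration value="jazzComposer"/>
      <xs:enumeration value="jazzSinger"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type: a positive decimal with at most 7 digits in total -->
  <xs:simpleType name="feeType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="7"/>
      <xs:minExclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```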
Another type of constraint is the cardinality constraint, which can be defined in schemas using minOccurs and maxOccurs. For example, by decorating the element
<xs:element name = "jazzMusician" type = "xs:string" minOccurs = "2" maxOccurs = "unbounded" >
in collaboration, we set up a constraint that a collaboration must consist of at least two jazz musicians. Actually, an element with no minOccurs/maxOccurs decoration at all has the strictest constraints: it requires a cardinality of 1..1. The weakest cardinality constraint is minOccurs = "0" maxOccurs = "unbounded", which leaves all possibilities open.
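The three cardinalities can be seen side by side in a declaration of collaboration such as the one sketched below. Only the jazzMusician declaration is taken from the text. The cardinality chosen for performedAt, the datatypes of its children, and the datatype of the type attribute are assumptions made for the example, and Tamino-specific annotations are omitted.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="collaboration">
    <xs:complexType>
      <xs:sequence>
        <!-- No minOccurs/maxOccurs: exactly one name (cardinality 1..1) -->
        <xs:element name="name" type="xs:string"/>
        <!-- From the text: at least two jazz musicians, no upper limit -->
        <xs:element name="jazzMusician" type="xs:string"
                    minOccurs="2" maxOccurs="unbounded"/>
        <!-- Assumption: any number of performances, including none -->
        <xs:element name="performedAt" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="location" type="xs:string"/>
              <xs:element name="time" type="xs:dateTime"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <!-- Assumption: the collaboration type is an unrestricted string -->
      <xs:attribute name="type" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```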
All these constraints can be checked by a validating parser. This happens, for example, when a document is inserted into or updated in Tamino.
What interests us in this context are constraints that affect more than one element or attribute. For example, we want to make sure that a jazz musician of type instrumentalist plays at least one instrument, whereas other types of jazz musicians (jazzComposer, jazzSinger) are not required to play an instrument. Here, the standard trigger functions of Tamino can be used to perform the constraint checking.
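The condition itself can be stated compactly in the rule notation that this section uses for cross-document constraints. The sketch below assumes that instruments are recorded as instrument child elements of jazzMusician; that element name is hypothetical. It shows only the check, not how the check would be wired into a Tamino trigger function.

```xml
<!-- Assumption: each instrument played is stored as an <instrument> child element -->
<rule context="jazzMusician[@type='instrumentalist']">
  <!-- The test is true if at least one instrument child exists -->
  <assert test="instrument">
    Instrumentalist <value-of select="name/last"/> must play at least one instrument.
  </assert>
</rule>
```

Because the rule context is restricted to type instrumentalist, musicians of type jazzComposer or jazzSinger are never tested and may omit instruments.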
The document() function, which XSLT adds to the XPath core function library, can be used to access multiple documents in a single query. This allows us to formulate constraints that span multiple documents. Let us assume that we have the following collaboration and jazzMusician documents stored in a Tamino database http://localhost/tamino/jazz/ in collection encyclopedia:
<?xml version="1.0"?>
<collaboration type="jamSession">
  <name>post-election jam</name>
  <jazzMusician>http://localhost/tamino/jazz/encyclopedia/dizzy.xml</jazzMusician>
  <jazzMusician>http://localhost/tamino/jazz/encyclopedia/parker.xml</jazzMusician>
  <performedAt>
    <location>Blues House</location>
    <time>1965-10-21T20:00:00</time>
  </performedAt>
</collaboration>

<?xml version="1.0"?>
<jazzMusician ID="ParkerCharlie" type="instrumentalist">
  <name>
    <first>Charlie</first>
    <last>Parker</last>
  </name>
  <birthDate>1920-08-19</birthDate>
</jazzMusician>

<?xml version="1.0"?>
<jazzMusician ID="GillespieDizzy" type="instrumentalist">
  <name>
    <first>Dizzy</first>
    <last>Gillespie</last>
  </name>
  <birthDate>1917-10-21</birthDate>
</jazzMusician>
We want to check that the performance date of the jam session is not earlier than the birth dates of its participants. We can achieve this with the following rule:
<rule context = "collaboration[@type='jamSession']/jazzMusician">
  <assert test = "number(translate(document(.)/*/birthDate,'1234567890-','1234567890')) &lt; number(translate(substring(../performedAt/time,1,10),'1234567890-','1234567890'))">
    No jam for unborn child <value-of select="document(.)/*/name/last"/>!
  </assert>
</rule>
As we can see, the rule is executed in the context collaboration[@type='jamSession']/jazzMusician. The filter expression restricts the application of the rule to collaborations of type jamSession. The content of the element jazzMusician is used as a URL to locate the appropriate document (document(.)). From this document we fetch the element birthDate.
The translate() function removes the dashes from the ISO date string before the string is converted into a number; for example, 1920-08-19 becomes 19200819. The same process is performed with the date part of element performedAt/time of the current document. Then both dates are compared using the operator &lt; (<). This rather clumsy process of translation and conversion into a number is necessary because XPath 1.0 does not support order relations between strings (strings can only be compared for equality) and, of course, XPath 1.0 does not support XML Schema datatypes. XPath 2.0 should improve this situation substantially.
To make the resulting report more informative, we also include the name of the offending musician in the error message. This is done with the value-of clause.
Let us now assume that the collaboration document does not contain pointers (URLs) to the jazzMusician documents but instead identifies jazz musicians by their ID. This is what we actually want because usually URLs do not make good keys: they specify a location but do not identify a document.
<jazzMusician>GillespieDizzy</jazzMusician> <jazzMusician>ParkerCharlie</jazzMusician>
We assume, too, that the documents are stored in Tamino. In this case we must replace all document(.)/* expressions with
document(concat('http://localhost/tamino/jazz/encyclopedia?_XQL=jazzMusician[@ID="', .,'"]'))//jazzMusician
i.e., we construct an HTTP query to Tamino, such as:
http://localhost/tamino/jazz/encyclopedia?_XQL=jazzMusician[@ID="ParkerCharlie"]
and then extract the root node (jazzMusician) of the result document returned.
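Applying this substitution to the jam session rule gives a rule along the following lines. This is an untested sketch: it assumes that the jazzMusician element contains the bare ID without surrounding whitespace. Writing the double quotes of the query as &quot; is an addition made here, needed because the expression sits inside a double-quoted XML attribute.

```xml
<rule context="collaboration[@type='jamSession']/jazzMusician">
  <!-- Birth date of the musician found by ID must precede the performance date -->
  <assert test="number(translate(document(concat('http://localhost/tamino/jazz/encyclopedia?_XQL=jazzMusician[@ID=&quot;', ., '&quot;]'))//jazzMusician/birthDate, '1234567890-', '1234567890')) &lt; number(translate(substring(../performedAt/time, 1, 10), '1234567890-', '1234567890'))">
    No jam for unborn child
    <value-of select="document(concat('http://localhost/tamino/jazz/encyclopedia?_XQL=jazzMusician[@ID=&quot;', ., '&quot;]'))//jazzMusician/name/last"/>!
  </assert>
</rule>
```

For the element content ParkerCharlie, the concat() call yields the query URL http://localhost/tamino/jazz/encyclopedia?_XQL=jazzMusician[@ID="ParkerCharlie"], as described above.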
Documents should only be written into the database after we have made sure that they do not violate the constraints imposed on them, i.e. that they comply with the application's business rules.
When a document is stored, Tamino checks the structural constraints and the datatype constraints defined in the document schema. This can be influenced by the content model definition for the document type. If the content model is set to "closed", Tamino only allows nodes that are defined in the document schema. Otherwise, Tamino allows additional nodes within a document instance.
Apart from that, as outlined above, other constraints may exist that cannot be appropriately described with XML Schema. Examples are cross-field constraints and cross-document constraints.
It is the application's responsibility to check for such constraints. In particular, the validation of cross-document constraints requires extra consideration for the transaction logic. To make the validation bulletproof, the validation and the following update must be performed in a single transaction with the isolation level set to "_shared" or "_protected". When doing so, we must apply the same guidelines for accessing multiple documents in one transaction as we outlined above in order to avoid deadlocks.
Tamino's unique document key mechanism prevents users from storing (in a specific doctype) multiple documents with the same key. A key may be composed of one or more values of elements or attributes contained in the document. The unique document key mechanism monitors incoming documents according to specified constraints and prohibits the storage of these documents in a single document container (doctype) if a duplicate document key is identified. This is especially useful for the administration of user IDs and other IDs that have to be unique. Uniqueness can be set in the XML Schema for the document type.