Tamino XML Server Version 9.7
 —  Advanced Concepts  —

From Model to Schema

As we have seen, with XML and XML Schema we have many options for designing XML documents. Let us return to our conceptual model.


Adding Type Information

graphics/jazz3t.png

We are now in a position to add some type information to our model:

In the diagram, we have defined the XML Schema type system as the default type system of our model (Asset Oriented Modeling can handle multiple type systems within one model). Most of the properties and sub-properties in this model are now prefixed with a type name (separated by a blank). All properties used as primary keys are defined with datatype NMTOKEN. This will save us a lot of trouble later, when we want to transport a key value in the query part of a URL. (White space character handling in URLs is awkward.)

We see, too, that the type properties in the assets jazzMusician and collaboration are defined with an enumeration as type. This would translate into the XML Schema type xs:string with appropriate enumeration facets. The property grade in asset saxophone has a type that is constrained with the facets totalDigits and fractionDigits.
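
As a minimal sketch of how such type annotations might translate into XML Schema, the following fragment defines one enumeration type and one constrained numeric type. The type names tKind and tGrade, the enumeration values, the base type xs:decimal, and the facet values are assumptions made for illustration; the actual values are those shown in the model diagram.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical enumeration type; the values are placeholders -->
  <xs:simpleType name="tKind">
    <xs:restriction base="xs:string">
      <xs:enumeration value="instrumentalist"/>
      <xs:enumeration value="jazzSinger"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical grade type: at most two digits in total,
       at most one of them after the decimal point -->
  <xs:simpleType name="tGrade">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="2"/>
      <xs:fractionDigits value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>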

In addition, we have factored out some complex properties (name and period) as explicit types. This is done by defining the abstract assets (indicated by the grayed-out label area) tName and tPeriod. We use the names of these assets as type names in various other assets such as jazzMusician, critic, style, belongsTo and collaboration. Note that we have improved the definition of tPeriod somewhat by making the property "to" optional. This allows for open-ended periods.

Document-Centric Layout

Now we are ready to translate our conceptual model into XML Schema source code. However, the question arises of how best to divide our model into individual schemas.

One extreme would be to create one XML document type for each asset. However, this has a disadvantage: because the existence of some asset instances can depend on the presence of other asset instances, we would require extra operations when deleting and updating assets. For example, if we wanted to delete a certain instance of jazzMusician, we would also have to delete the instruments he or she plays.

The other extreme would be to create a single document containing the whole model. This is even worse because such an implementation would not scale well. Such a document can become very big, and consequently various operations (loading, saving, parsing, etc.) would be very slow. Although Tamino can insert, delete, and update document subtrees, each update operation would lock the whole model and would not allow concurrent updates, even if the concurrent operation wants to update another asset.

We therefore choose the best compromise between these extremes and implement each business object as a single document. (In a more business-oriented scenario we would treat business documents such as Purchase Orders or Invoices in the same way.) This has the following advantages:

Creating a Type Library

Our model contains global type definitions (the assets tPeriod and tName) that are not specific to a particular business object and consequently, in our design, not specific to a particular schema either. It makes sense to create a global type library that contains the XML Schema definition of these assets. Such a type library is created as an independent XML Schema file with the same target namespace:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
              elementFormDefault="qualified"
              attributeFormDefault="unqualified">
  <xs:complexType name="tPeriod">
    <xs:sequence>
      <xs:element name="from" type="xs:date"/>
      <xs:element name="to" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="tName">
    <xs:sequence>
      <xs:element name="first" type="xs:token"/>
      <xs:element name="middle" type="xs:token" minOccurs="0"/>
      <xs:element name="last" type="xs:token"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

This file can then be imported into the schema files that implement business objects. The XML Schema syntax to import a foreign schema file into the current schema is:

<xs:import namespace="..." schemaLocation = "typelib.xsd"/>

The xs:import clause is specified as a direct child of the xs:schema clause and must appear at the very beginning of this clause, before any element, attribute, or type definitions. The attribute schemaLocation specifies the location of the imported file as a relative or absolute URL.
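
To illustrate the two library types, the following instance fragment is a sketch of what elements of type tName and tPeriod might contain. The element names sample, name, and period and all values are assumptions made for illustration; only the child elements first, middle, last, from, and to come from the type library.

<!-- Hypothetical wrapper element, used only to keep the fragment well-formed -->
<sample>
  <!-- Hypothetical element of type tName; the optional middle element is omitted -->
  <name>
    <first>Jane</first>
    <last>Doe</last>
  </name>
  <!-- Hypothetical element of type tPeriod; omitting "to" gives an open-ended period -->
  <period>
    <from>2000-01-01</from>
  </period>
</sample>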

Implementing Business Objects

Apart from the global type library, our model now results in the following schemas:

album, collaboration, critic, jazzMusician, review, style.

graphics/jazzmusician.png graphics/collaboration.png
graphics/album.png graphics/style.png
graphics/review.png graphics/critic.png

The following paragraphs discuss some implementation decisions:

Segmentation and Optimization

Although this document-centric approach is the preferred way to implement a conceptual model, it is sometimes necessary to make compromises, especially when documents become too large, or when operations become inefficient.

Large documents have several drawbacks:

It therefore seems sensible to split large documents into smaller ones. In particular, this is the case when a document is subject to unrestricted growth. Take for example the document type album from the example above. If we opted to include the text of all reviews in the respective album document, we could get a nasty surprise. If a lot of people review an album, our album document could become very large. That is one reason why we decided to model review as an explicit business object.

However, segmentation can also create problems. During retrieval we need more join operations, and some aggregating functions become slow. For example, if we want to find out the number of albums in which a jazz musician has participated, we would first have to retrieve all collaborations of that musician, and then count the albums referenced as a result of the collaboration.

This can be improved by adding redundancy to our document base. For example, we could include an album count in each jazzMusician document. The downside of this is that update operations become more complicated. When we add new albums, or when we delete albums, we have to update the respective counters in the jazzMusician instances as well. So, tuning of schemas is always a compromise. The best choice almost always depends on the frequency of updates and retrievals, and on whether it is more important to offer fast response times for retrieval or for update. Database tuning is not an exact science, but depends very much on heuristics, experience, and skill.
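
Such a redundant counter could be declared inside the content model of jazzMusician roughly as follows. This is only a sketch: the element name albumCount and its type are assumptions, not part of the model.

<!-- Hypothetical redundant counter inside the jazzMusician content model.
     Applications must adjust it whenever albums are added or deleted. -->
<xs:element name="albumCount" type="xs:nonNegativeInteger" minOccurs="0"/>

Declaring the element as optional keeps existing jazzMusician documents valid until their counters have been filled in.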

Multi-Namespace Schema Composition

Let's return to the multi-namespace model defined in section Models and Namespaces. This model featured four namespaces:

How does this affect our XML schemas? The asset CD is defined as a separate business object, and thus results in a separate schema file with its own target namespace (http://www.softwareag.com/tamino/doc/examples/models/jazz/shop). We now have to implement the inherited arcs that lead to asset CD (from e:collaboration, e:review, and o:item). These arcs are implemented in the usual way within the respective schema files, in addition (and similar) to the arcs leading to e:album and o:product. Since these arcs are implemented via primary and foreign key constructs and not via reference or inclusion, all schemas stay single-namespace schemas.

Note, however, that the instruments are implemented differently. Instruments such as i:saxophone and i:trombone are part of the jazzMusician business object, and are consequently referred to (via an xs:element ref= clause) within the jazzMusician schema file. But because these instruments belong to a different model (and thus to a different namespace), they must be implemented in a schema file with target namespace http://www.softwareag.com/tamino/doc/examples/models/instruments. Let us assume that all instruments are defined as global elements in a schema file called instrument.xsd.
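
A minimal sketch of what such an instrument.xsd might look like is given below. The target namespace and the element names saxophone and trombone are taken from the text; everything else is an assumption. In particular, the declarations carry no type, so their content defaults to xs:anyType, whereas the real declarations would define the properties of each instrument.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.softwareag.com/tamino/doc/examples/models/instruments"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Global element declarations, so that other schemas can refer to them via ref= -->
  <!-- Placeholder declarations without a type (content defaults to xs:anyType) -->
  <xs:element name="saxophone"/>
  <xs:element name="trombone"/>
</xs:schema>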

What we then have to do is import the file instrument.xsd into the file jazzMusician.xsd. This is done as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
           xmlns:e="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:i="http://www.softwareag.com/tamino/doc/examples/models/instruments"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:import schemaLocation="typelib.xsd"/>
  <!-- The namespace must match the target namespace of instrument.xsd -->
  <xs:import namespace="http://www.softwareag.com/tamino/doc/examples/models/instruments"
             schemaLocation="instrument.xsd"/>
  <!-- The name attribute takes an unprefixed name (NCName);
       the declared elements belong to the target namespace -->
  <xs:element name="jazzMusician">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="e:tName"/>
        ...
        <xs:element name="plays"
                    minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:choice>
              <!-- Reference to a global element declared in instrument.xsd -->
              <xs:element ref="i:saxophone"/>
            </xs:choice>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      ...
    </xs:complexType>
  </xs:element>
  ...
</xs:schema>

The two xs:import clauses are specified at the very beginning of the xs:schema clause. The namespace attribute specifies the namespace to be imported (this must match the target namespace definition in the imported schema file), and the schemaLocation attribute specifies the location of the file to be imported. In addition, we must specify a namespace prefix for the imported namespace. This is done in the xmlns:i attribute of the xs:schema clause. This prefix is used when we refer to a musical instrument, for example xs:element ref="i:saxophone". Note that there can be several import clauses in one schema, and even several import clauses for a given namespace.

As you can see, we have opted to use the prefix "e:" for the schema's target namespace http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia. This is just to preserve the namespace prefix usage in the conceptual model; continuing to use this namespace as the default namespace for the schema would also be valid.

Schema Evolution

Once a schema has been defined, it is very unlikely that it will always stay in the same state. Business requirements change and bugs are detected, so the schema must be modified in order to adapt to changing circumstances. In this section we discuss how a schema can be modified safely. "Safe" in this context means that the modified schema must still cover all existing valid document instances of the original schema. The following guidelines ensure that the new schema is at least as "wide" as the original schema:

These are general guidelines. You can also modify a schema in a way that is inconsistent with existing documents, provided you subsequently validate all affected documents, but this of course could be very time-consuming.
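
As a minimal sketch of a change that keeps the new schema at least as wide as the original, consider appending an optional element to tName. The element name suffix is a hypothetical addition and not part of the model. Because the new element is optional, every instance that was valid against the original definition of tName remains valid.

<xs:complexType name="tName">
  <xs:sequence>
    <xs:element name="first" type="xs:token"/>
    <xs:element name="middle" type="xs:token" minOccurs="0"/>
    <xs:element name="last" type="xs:token"/>
    <!-- Hypothetical new element; optional, so existing instances stay valid -->
    <xs:element name="suffix" type="xs:token" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

By contrast, adding the same element without minOccurs="0" would make it mandatory and would invalidate all existing instances.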

In Tamino XQuery 4, you can modify documents by using the xquery update statement to insert, delete, replace or rename nodes, but the resulting documents must comply with the existing schema; the schema itself cannot be modified by xquery update.

Open Content Model

Schema developers cannot always predict the requirements that may arise in the field. XML Schema therefore provides extension mechanisms that allow document authors to include elements and attributes into document instances that are not declared in the schema. These extension mechanisms are implemented in XML Schema as wildcards (xs:any and xs:anyAttribute).

Let us assume, for example, that we want to make the definition of tName more generic, allowing document authors to include a title child element. We can allow document authors to insert any number of extra child elements before, between, and after the existing child elements with the following definition:

<xs:complexType name="tName">
  <xs:sequence>
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="first" type="xs:token"/>
    <xs:element name="middle" type="xs:token" minOccurs="0"/>
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="last" type="xs:token"/>
    <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:anyAttribute processContents="lax"/>
</xs:complexType>

We have also added an xs:anyAttribute clause to allow for additional attributes.

Note the specification of namespace="##other" for the first two wildcards. This is to avoid non-determinism. Without such a specification, the wildcard could also match elements from the same namespace. When encountering a first or a last element in a document instance, the parser would then not be able to decide, without looking ahead in the input stream, whether such an element should be accepted by the wildcard or by the following element specification. For the same reason we did not introduce a wildcard in front of the element definition middle. Because middle is optional, a parser would not know where to place an extra instance element: into the wildcard before or into the wildcard after the element middle.

Note that Tamino allows for an alternative (non-standard) open content model that does not suffer from this problem (see From Schema to Tamino::Schema level Definitions).
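
An instance that makes use of the wildcards in the tName definition above might look like the following sketch. The parent element name, the extension namespace http://example.org/extensions, its prefix x, and all values are assumptions made for illustration.

<name xmlns:x="http://example.org/extensions">
  <!-- Accepted by the first wildcard, because x:title is in a foreign namespace -->
  <x:title>Dr.</x:title>
  <first>Jane</first>
  <last>Doe</last>
</name>

A title element without a namespace prefix would not be accepted in this position, because namespace="##other" restricts the first wildcard to elements from other namespaces.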

Versioning

There are two questions that arise when we create a new version of an existing schema:

The first question is: Should we change the target namespace of the new schema? The answer is simple: change it if you want the new schema to be incompatible with existing document instances and with existing schemas that might include or import this schema. In this case, you should retain the old schema version in order to support existing applications. Usually, this option is taken when the changes in the schema are severe. In all other cases, leave the target namespace unchanged and indicate the new schema version by other means.

This leads us to the second question: How do we indicate a version number within a schema? The good news is that XML Schema features a version attribute in the xs:schema clause. The bad news is that parsers do not evaluate this attribute, so you won't see the version number when you access a document instance through a DOM API; extra application logic is required to read out the version number. This version number is meant for human consumption: it indicates the version of the schema to the schema author. To convey version information to applications, the best method is to declare a version attribute for the root element in the schema and to give this attribute a fixed value reflecting the current version. This attribute does not need to show up in document instances, but applications can still see it through the DOM API when the document is parsed with schema validation. Of course, nothing stops us from defining such version attributes for elements other than the root element, too, so you could add different version information to different subsections of the same schema.
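
Both ways of recording the version can be sketched as follows. The version value 1.1, the attribute type xs:token, and the choice of jazzMusician as the root element are assumptions made for illustration; namespace declarations and the content model are omitted.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="1.1">
  <!-- version on xs:schema: informational only, not passed on to instances -->
  <xs:element name="jazzMusician">
    <xs:complexType>
      <xs:sequence>
        <!-- child elements omitted -->
      </xs:sequence>
      <!-- Fixed value: instances may omit the attribute, but may not set another value -->
      <xs:attribute name="version" type="xs:token" fixed="1.1"/>
    </xs:complexType>
  </xs:element>
</xs:schema>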
