From UML to XML

The UML (Unified Modeling Language) is a popular object-oriented modeling language. Because it has been submitted as an ISO standard, we also discuss it here in the context of modeling for XML.


XML Support in UML

Most commercial CASE tools that support UML, such as Rational Rose or TogetherSoft, also support the importing and exporting of XML DTDs and/or XML Schema. In the simplest case, an existing DTD or XML Schema is imported into the CASE tool, resulting in a number of UML classes that represent the different nodes of the XML document. As a side effect, this functionality makes it possible to convert from DTD to XML Schema and vice versa, and to generate a Java-based access layer for a given document type.

However, you should not misinterpret this technique as "conceptual" modeling: it results in a model of an implementation object. Generating XML schemas from a conceptual model is somewhat more demanding. In this chapter, we discuss how this can be achieved with relatively simple means.

What we should not expect in this context, however, is a complete solution that supports round-trip engineering. UML was developed with object-oriented implementation and design methods in mind. We should therefore expect (and tolerate) a slight impedance mismatch between UML and XML.

One way to generate code with a CASE tool is to write production rules for the tool's code generator. However, this is a proprietary approach, and we would have to demonstrate different solutions for each CASE tool on the market.

We therefore choose a method that can be applied to most CASE tools. Practically all CASE tools on the market support the exporting of metadata to XMI (XML Metadata Interchange). XMI is an XML-based standard for the exchange of modeling data between different design and development tools. It can capture virtually all information within a UML model.

In the context of this tutorial we use Poseidon for UML (the Community Edition is freeware, available from http://www.gentleware.com/), a commercialized version of ArgoUML, as our CASE tool. We define our jazz example in UML, then export it to XMI, and finally convert the resulting XMI into XML Schema with the help of an XSLT stylesheet.

From Conceptual Model to UML

Here are the mapping rules to cast an asset-oriented model onto UML:

  1. We decorate all identifying assets of business objects with the stereotype entity. This allows us to generate arcs leading to these assets differently (as these arcs lead to separate documents).

  2. We use qualified names for all assets (i.e., names with namespace prefixes). Because the colon is not a valid name character in most programming languages, we replace it with an underscore.

  3. Since UML is an object-oriented technology, it does not have a native concept of primary keys. It is conventional to decorate primary keys with the stereotype primaryKey.

  4. We represent the arcs of our conceptual model as unnamed UML associations. If required, we can decorate the source end of an association with a role name and the target end with a cardinality constraint.

    The exceptions to this rule are arcs decorated with an is_a label. These are represented as UML generalization/specialization relationships. UML allows multiple inheritance, but XML Schema does not, so the conversion process must resolve such inheritance relations.

  5. UML attribute specifications can include a type and an initial value. Other XML Schema-specific constraints, such as minOccurs, maxOccurs, form, maxLength, length, totalDigits, fractionDigits and enumeration, have no direct equivalent in UML but can be specified as tagged values (which we name accordingly: xs_minOccurs, xs_maxOccurs, etc.). Similarly, a tagged value xs_fixed=true can be used to indicate that the initial value shall be regarded as a fixed value rather than as a default value.

  6. We can use Java-based datatypes for attributes. These are already built into the modeler and can be mapped automatically onto XML Schema datatypes during the conversion process. We can also explicitly use the built-in datatypes of XML Schema, but we have to declare them explicitly in UML. We do this by defining classes such as xs_NMTOKEN or xs_ID and decorating them with stereotype type. We also introduce a pseudo datatype xs_any to indicate wildcards.

  7. UML does not support complex attribute definitions. Instead, we have to resolve complex properties. We have two options: (1) represent a complex property as an explicit aggregation, or (2) define a separate datatype for a complex property. In this example, we opt for the latter. For example, we introduce the datatypes tPerformedAt for performedAt(location&time), tPeriod for period(from,to) and tName for name(first,middle?,last).

  8. Alternatives (choice groups) require extra care. In UML we model them as a datatype generalization. For example, the property (performedAt(location&time)|period(from,to)) in asset collaboration is modeled as an element (which we call collaborationContext) with a type that is a generalization of the datatypes tPerformedAt and tPeriod.

  9. Clusters are also represented as generalizations. To represent, for example, the cluster containing all the instruments, we introduce a generalized class instrument. Because we do not want this class to appear in the final schema, we define it as an abstract class. Similarly, we introduce a generalized class representing all classes that are subject to reviews, such as jazz musicians and albums.

  10. By default, we assume an ordered sequence for the attributes of a UML class and would therefore generate an xs:sequence connector. If we want an unordered sequence (resulting in an xs:all connector), we indicate this by attaching the tagged value xs_ordered=false to the respective UML class.

  11. Similarly, we attach the tagged value xs_mixed=true if a class shall contain mixed content.
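
To make some of these rules more concrete, the following Python sketch shows how rules 5, 6, 10 and 11 might translate a single UML class into an XML Schema complex type. It is only an illustration under stated assumptions: the dictionary layout used to describe the class, the function names xs_type and class_to_complex_type, and the small JAVA_TO_XS table are all hypothetical and not part of any CASE tool or of the stylesheet discussed in this chapter. Facets such as maxLength or enumeration, and the xs_any wildcard, are deliberately left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Assumed mapping of Java-based datatypes onto XML Schema built-ins (rule 6).
JAVA_TO_XS = {
    "String": "xs:string",
    "int": "xs:int",
    "boolean": "xs:boolean",
    "double": "xs:double",
}


def xs_type(uml_type):
    """Map a UML attribute type onto an XML Schema type name."""
    if uml_type in JAVA_TO_XS:
        return JAVA_TO_XS[uml_type]
    if uml_type.startswith("xs_"):
        # Explicitly declared built-in such as xs_NMTOKEN: restore the colon.
        return "xs:" + uml_type[3:]
    return uml_type  # user-defined datatype such as tName


def class_to_complex_type(cls):
    """Build an xs:complexType element from a (hypothetical) class description."""
    ct = ET.Element(f"{{{XS}}}complexType", {"name": cls["name"]})
    tags = cls.get("tags", {})
    if tags.get("xs_mixed") == "true":                      # rule 11
        ct.set("mixed", "true")
    # Rule 10: ordered by default, xs:all only when xs_ordered=false.
    connector = "all" if tags.get("xs_ordered") == "false" else "sequence"
    group = ET.SubElement(ct, f"{{{XS}}}{connector}")
    for attr in cls["attributes"]:
        el = ET.SubElement(
            group,
            f"{{{XS}}}element",
            {"name": attr["name"], "type": xs_type(attr["type"])},
        )
        attr_tags = attr.get("tags", {})
        # Rule 5: tagged values that map directly onto element attributes.
        for key in ("xs_minOccurs", "xs_maxOccurs", "xs_form"):
            if key in attr_tags:
                el.set(key[3:], attr_tags[key])
        if "initial" in attr:
            fixed = attr_tags.get("xs_fixed") == "true"
            el.set("fixed" if fixed else "default", attr["initial"])
    return ct


# The datatype tName for name(first,middle?,last) from rule 7.
t_name = {
    "name": "tName",
    "attributes": [
        {"name": "first", "type": "String"},
        {"name": "middle", "type": "String", "tags": {"xs_minOccurs": "0"}},
        {"name": "last", "type": "String"},
    ],
}

print(ET.tostring(class_to_complex_type(t_name), encoding="unicode"))
```

Running the sketch prints a single xs:complexType for tName whose xs:sequence contains the three elements, with minOccurs="0" on middle. Attaching the tagged value xs_ordered=false to the class description would switch the connector to xs:all.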

Applying these rules, we finally arrive at the following model:

graphics/uml.png

graphics/uml1.png

As noted above, most UML tools provide a function to serialize a model into XMI format. Because XMI is XML based, it can be converted with the help of XSLT stylesheets into other formats such as XML Schema. An example of such a stylesheet can be found at http://www.aomodeling.org/.
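
The first step of any such conversion is to locate the classes and their attributes in the XMI export. The sketch below does this in Python rather than XSLT, purely for illustration; it is not the stylesheet mentioned above. The XMI fragment is hand-written and heavily simplified, loosely following the XMI 1.x layout for UML; an actual export from a CASE tool is far more verbose (identifiers, cross-references, tagged values, diagram data) and its exact structure depends on the tool and XMI version. The names XMI_SAMPLE, list_classes and concrete_class_names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for UML elements in the simplified fragment below.
UML = "org.omg.xmi.namespace.UML"

# Hand-written, simplified stand-in for an XMI export (an assumption,
# not the output of any particular tool).
XMI_SAMPLE = """<XMI xmi.version="1.2" xmlns:UML="org.omg.xmi.namespace.UML">
  <XMI.content>
    <UML:Model name="jazz">
      <UML:Namespace.ownedElement>
        <UML:Class name="tName" isAbstract="false">
          <UML:Classifier.feature>
            <UML:Attribute name="first"/>
            <UML:Attribute name="last"/>
          </UML:Classifier.feature>
        </UML:Class>
        <UML:Class name="instrument" isAbstract="true"/>
      </UML:Namespace.ownedElement>
    </UML:Model>
  </XMI.content>
</XMI>"""


def list_classes(xmi_text):
    """Return (name, is_abstract, attribute_names) for every UML class found."""
    root = ET.fromstring(xmi_text)
    result = []
    for cls in root.iter(f"{{{UML}}}Class"):
        attrs = [a.get("name") for a in cls.iter(f"{{{UML}}}Attribute")]
        result.append((cls.get("name"), cls.get("isAbstract") == "true", attrs))
    return result


def concrete_class_names(xmi_text):
    """Abstract classes (rule 9) are skipped: they must not appear in the schema."""
    return [name for name, is_abstract, _ in list_classes(xmi_text) if not is_abstract]


for name, is_abstract, attrs in list_classes(XMI_SAMPLE):
    print(name, "(abstract)" if is_abstract else "", attrs)
```

An XSLT stylesheet would express the same selection with match patterns on the UML class and attribute elements; the Python version merely makes the traversal explicit.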