Introduction to XSLT

The "official" method for transforming XML into other formats (often presentation formats) is XSLT (eXtensible Stylesheet Language: Transformations). XSLT was originally part of the XSL (eXtensible Stylesheet Language) specification, which was later split into three parts: XPath, XSLT, and XSL Formatting Objects (XSL-FO). XSL-FO was designed as the presentation format for XML. However, it currently plays only a minor role, since most of its functionality is covered by HTML+CSS. XSL-FO is typically used as an intermediate step when generating PDF from an XML document.

XSLT is now a W3C Recommendation in its own right. It enables stylesheet-controlled transformations from one XML document format into another document format, which can be either XML or non-XML. XSLT can, for example, be used to transform presentation-neutral XML data into presentation formats such as HTML, XHTML, XForms, WML, SMIL, SVG, etc. In the chapter From Conceptual Model to Schema::Integrity we already discussed other applications for XSLT, such as constraint checking and generating XML Schema from XMI.

Although XSLT is quite powerful, it has some deficiencies that have led to the development of various extensions. Also, programmers who are familiar with imperative languages such as Java or C++ sometimes find it hard to think in XSLT's rule-based structures. For the transformation into HTML, however, most of the XSLT coding can be avoided by the use of XSLT generators, which allow visual construction of the resulting web page (or visual mapping of XML elements to HTML elements of an existing web page) and generate most of the required XSLT code. Examples of such generators are Altova's XML Spy, eXcelon's Stylus and Whitehill's XSL Composer.

Such tools are useful for developing stylesheets that map XML documents onto individual HTML pages. However, when we want to create generic transformations (for example, where the final layout depends on the document type and/or on the content), or when we need stylesheets to produce output other than HTML, we have to dig into XSLT programming. The following sections give a short introduction.


Procedural Transformation

The basic construct in XSLT is the template. A stylesheet normally consists of one or more templates. A template can be explicitly invoked by name, or it can be implicitly applied via pattern matching according to the match expression defined in the head of the template. This allows two programming styles in XSLT, which can be mixed freely: rule-based programming and procedural programming.

Rule-based programming

This is a more declarative approach. Rules (i.e. templates with a match expression) specify which elements of the input document they apply to, and how they transform these elements. Rules are applied recursively. The programmer describes the transformation in terms of logic and is not concerned with the sequence of execution.

Procedural programming

This programming style is easier to understand for programmers with experience in imperative languages such as Java or C. The programmer describes to the XSLT processor exactly what to do and in which sequence. The XSLT style sheet looks very much like the target document, with interspersed XSLT instructions to fill in the blanks.

To support procedural programming, XSLT provides the following operations:

Control structures.

XSLT instructions such as xsl:for-each, xsl:if, and xsl:choose provide procedural control structures for loops, conditional execution and case structures. The result of an xsl:for-each instruction can be sorted with an xsl:sort instruction and numbered with the xsl:number instruction.
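As a rough illustration of these control structures, the following sketch lists the tracks of the album instance used later in this section, sorted by title, and labels each track as long or short. The prefix j (bound to the namespace of the album instance), the 20-minute threshold, and the output wording are assumptions made for this example only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="j">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
    <ul>
      <!-- Loop over all tracks; xsl:sort must come first in the loop body -->
      <xsl:for-each select="j:album/j:track">
        <xsl:sort select="j:title" order="descending"/>
        <li>
          <!-- position() reflects the sorted order -->
          <xsl:number value="position()" format="1. "/>
          <xsl:value-of select="j:title"/>
          <!-- Case structure on the minutes part of a PTnnMnnS duration -->
          <xsl:choose>
            <xsl:when test="substring-before(substring-after(j:duration,'T'),'M') &gt;= 20">
              <xsl:text> (long)</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text> (short)</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

The paths use a prefix because XPath 1.0 matches unprefixed names only against elements in no namespace.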

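The instruction xsl:call-template is used to invoke a template by name (recursive calls are possible). Parameters can be passed to the invoked template. There is no return statement, however: the only result of a template is the output it generates, which the caller can capture by placing the call inside an xsl:variable element.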
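The following sketch shows one way to call a named template with a parameter. Although a template has no return statement, the caller can wrap the call in xsl:variable to capture whatever the template writes. The template name format-duration, the parameter name duration, and the prefix j are hypothetical choices for this example, and the duration is assumed to have the form PTnnMnnS without an hours part.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia">
  <xsl:output method="text"/>

  <!-- Named template (hypothetical): formats PTnnMnnS as nn:nn -->
  <xsl:template name="format-duration">
    <xsl:param name="duration"/>
    <xsl:value-of select="concat(
        substring-before(substring-after($duration,'T'),'M'), ':',
        substring-before(substring-after($duration,'M'),'S'))"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:for-each select="j:album/j:track">
      <!-- Capture the output of the called template in a variable -->
      <xsl:variable name="time">
        <xsl:call-template name="format-duration">
          <xsl:with-param name="duration" select="j:duration"/>
        </xsl:call-template>
      </xsl:variable>
      <!-- Write one text line per track -->
      <xsl:value-of select="concat(j:title, ' [', $time, ']&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the album instance used later in this section, this should print one line per track, such as Post Election Jam I [19:35].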

The instruction xsl:apply-templates can be used to start rule-based processing (see Rule-Based Transformation).

Accessing content.

The xsl:value-of instruction writes the string value of a node to the output stream as text; if the expression selects several nodes, XSLT 1.0 uses only the first one in document order. The xsl:copy-of instruction copies all selected nodes to the output stream in their original form, including markup and descendants.
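The difference between the two instructions can be seen by applying both to the same expression. In this minimal sketch the wrapper elements result, as-text and as-copy and the prefix j are assumptions for illustration. In XSLT 1.0, xsl:value-of applied to several nodes writes only the string value of the first one, whereas xsl:copy-of copies every selected node.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="j">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <result>
      <!-- value-of: string value of the first track only, markup removed -->
      <as-text><xsl:value-of select="j:album/j:track"/></as-text>
      <!-- copy-of: every selected track element, copied with its markup -->
      <as-copy><xsl:copy-of select="j:album/j:track"/></as-copy>
    </result>
  </xsl:template>
</xsl:stylesheet>
```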

Here is an XSLT example that transforms album instances into an HTML page. We have extended the album schema from the chapter From Conceptual Model to Schema::From Model to Schema to include some more information:

Schema:
graphics/album3.png

Instance:
<?xml version="1.0" encoding = "UTF-8"?>
<?xml-stylesheet type="text/xsl" href="album.xsl"?>
<album xmlns="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
       albumNo="BGJ-47">
  <title>Blues House Jam</title>
  <track>
    <title>Post Election Jam I</title>
    <duration>PT19M35S</duration>
  </track>
  <track>
    <title>Post Election Jam II</title>
    <duration>PT20M35S</duration>
  </track>
  <coverImage>
    post-election-jam.jpg
  </coverImage>
</album>

The stylesheet programming is strictly procedural and deterministic. It is the stylesheet that defines the layout of the resulting HTML file.

<?xml version="1.0" encoding="UTF-8"?>
<!-- XPath 1.0 ignores the default namespace, so the album
     namespace is bound to the prefix j and used in all paths -->
<xsl:stylesheet version="1.0"
     xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     exclude-result-prefixes="j">
<!-- Make sure we generate HTML output -->
<xsl:output method="html" indent="yes"/>
<!-- Just a single rule for the root node -->
<xsl:template match="/">
<!-- Generate HTML document root -->
<html><head/><body>
  <!-- Select album node -->
  <xsl:for-each select="j:album">
    <!-- The usual nested tables -->
    <table><tr><td>
      <table width="100%">
        <tr bgcolor="silver">
          <td>
            <!-- Title element as headline -->
            <h2><xsl:value-of select="j:title"/></h2><br/>
            <!-- Test if we have a publisher element -->
            <xsl:if test="j:publisher">
              <!-- if yes generate publisher entry -->
              Publisher:
              <xsl:value-of select="j:publisher"/><br/>
            </xsl:if>
            <!-- Generate album number entry -->
            ProductNo:
            <xsl:value-of select="@albumNo"/>
          </td>
          <!-- Test if we have a cover image -->
          <xsl:if test="j:coverImage">
            <!-- if yes generate image reference,
                 trimming whitespace around the file name -->
            <td>
              <img src="{normalize-space(j:coverImage)}" alt="{j:title}"/>
            </td>
          </xsl:if>
        </tr>
      </table>
    </td></tr>
    <tr><td>
      <!-- now do the tracks -->
      <br/><h4>Tracks</h4>
      <table width="100%" >
        <!-- We may have multiple tracks, therefore loop -->
        <xsl:for-each select="j:track">
          <tr bgcolor="silver">
            <td>
              <!-- Print track number -->
              <xsl:number value="position()" format="1-"/>
              <!-- Print title of track element -->
              <xsl:value-of select="j:title"/>
            </td>
            <!-- Print duration -->
            <td align="Right">
              <!-- Convert duration to mm:ss format -->
              <xsl:value-of select=
                   "substring-before(substring-after(j:duration,'T'),'M')"/>
              <xsl:text>:</xsl:text>
              <xsl:value-of select=
                   "substring-before(substring-after(j:duration,'M'),'S')"/>
            </td>
          </tr>
        </xsl:for-each>
      </table>
    </td></tr></table>
  </xsl:for-each><br/>
</body></html>
</xsl:template>
</xsl:stylesheet>

To implement the stylesheet logic we have used the XSLT instructions discussed above. Optional elements are included in an <xsl:if> block to suppress the decoration (such as "Publisher:") if there is no publisher element.

Applying this stylesheet to the above XML document instance results in the following HTML file:

<html>
   <head>
      <meta http-equiv="Content-Type"
            content="text/html; charset=utf-8">
   </head>
   <body>
      <table>
         <tr>
            <td>
               <table width="100%">
                  <tr bgcolor="silver">
                     <td>
                        <h2>Blues House Jam</h2><br>
                        ProductNo:
                        BGJ-47
                     </td>
                     <td><img src="post-election-jam.jpg"
                              alt="Blues House Jam">
                     </td>
                  </tr>
               </table>
            </td>
         </tr>
         <tr>
            <td><br><h4>Tracks</h4>
               <table width="100%">
                  <tr bgcolor="silver">
                     <td>1-Post Election Jam I</td>
                     <td align="Right">19:35</td>
                  </tr>
                  <tr bgcolor="silver">
                     <td>2-Post Election Jam II</td>
                     <td align="Right">20:35</td>
                  </tr>
               </table>
            </td>
         </tr>
      </table><br></body>
</html>

The final representation in a web browser looks like this:

graphics/blueshouse-16.png

Rule-Based Transformation

With rule-based transformation, the main XSLT control elements are templates (<xsl:template>). A template consists of a head and a body. The head of each template specifies the context in which the template should be activated. This is done by specifying an attribute match with an XPath expression to select the relevant context nodes.

The template body describes what to do. It can contain procedural XSLT instructions (see above). In addition, we may apply recursion with the instruction xsl:apply-templates, which selects a set of nodes and, for each of them, applies the best-matching template defined in the stylesheet.

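The select attribute of xsl:apply-templates defines the nodes to which templates are applied. If select is omitted, the processor tries to match templates against all child nodes of the current node. select="." selects the current node itself, which is useful mainly in combination with a different mode, since otherwise the same template would match again.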

In addition, xsl:apply-templates has an optional mode attribute. This introduces an additional selection mechanism for templates: only those templates that have a matching mode attribute in their head are applied.
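The interplay of select and mode can be sketched as follows. The mode name toc and the prefix j (bound to the namespace of the album instance) are assumptions for this example. The same album node is processed twice, once in the default mode and once in mode toc, and only templates declared with a matching mode take part in the second pass.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="j">
  <xsl:output method="html" indent="yes"/>

  <!-- Default mode: headline, then hand the same node to mode "toc" -->
  <xsl:template match="j:album">
    <h2><xsl:value-of select="j:title"/></h2>
    <!-- select="." is the current album node itself -->
    <xsl:apply-templates select="." mode="toc"/>
  </xsl:template>

  <!-- Only considered when templates are applied in mode "toc" -->
  <xsl:template match="j:album" mode="toc">
    <ol>
      <xsl:apply-templates select="j:track" mode="toc"/>
    </ol>
  </xsl:template>

  <xsl:template match="j:track" mode="toc">
    <li><xsl:value-of select="j:title"/></li>
  </xsl:template>
</xsl:stylesheet>
```

Without the mode attribute on the inner xsl:apply-templates call, the first template would match the album node again and the recursion would never terminate.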

The result of an xsl:apply-templates instruction can be sorted with an xsl:sort instruction. In addition, the results can be numbered with the xsl:number instruction.

If the heads of more than one template match a certain context, the template with the best match is selected for execution:

  • Templates in the current style sheet are selected over templates from imported style sheets.

  • The more specific the matching expression in the template head, the better the match.

  • In addition, it is possible to specify an explicit priority for a template.
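These selection rules can be illustrated with three competing templates for track elements. In XSLT 1.0 a plain element name has a default priority of 0 and a pattern with a predicate has a default priority of 0.5; an explicit priority attribute overrides these defaults. The prefix j and the output wording are assumptions made for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="j">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="j:album">
    <ul><xsl:apply-templates select="j:track"/></ul>
  </xsl:template>

  <!-- Plain name test: default priority 0 -->
  <xsl:template match="j:track">
    <li><xsl:value-of select="j:title"/></li>
  </xsl:template>

  <!-- Pattern with a predicate: default priority 0.5, beats the rule above -->
  <xsl:template match="j:track[1]">
    <li><b><xsl:value-of select="j:title"/></b> (opening track)</li>
  </xsl:template>

  <!-- Explicit priority 1: beats both rules above whenever it matches -->
  <xsl:template match="j:track[last()]" priority="1">
    <li><xsl:value-of select="j:title"/> (closing track)</li>
  </xsl:template>
</xsl:stylesheet>
```

For an album with a single track, both predicates match the same element, and the explicit priority decides in favor of the last template.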

Here is an example rule-based stylesheet that produces the same output as the previous procedural style sheet:

<?xml version="1.0" encoding="UTF-8"?>
<!-- XPath 1.0 ignores the default namespace, so the album
     namespace is bound to the prefix j and used in all patterns -->
<xsl:stylesheet version="1.0"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="j">
<!-- Make sure we generate HTML output -->
<xsl:output method="html" indent="yes"/>
<!-- The root node does the basic setup -->
<xsl:template match="/">
<!-- Generate HTML document root -->
<html><head/><body>
  <!-- First pass: attributes and child elements of album
       (attributes must be selected explicitly) -->
  <xsl:apply-templates select="j:album/@* | j:album/*"/>
  <!-- Second pass for tracks -->
  <h4>Tracks</h4>
  <!-- The mode attribute selects the templates to apply -->
  <xsl:apply-templates select="j:album/j:track" mode="tracks"/>
</body></html>
</xsl:template>
<!-- Template for title -->
<xsl:template match="j:title">
  <h2><xsl:value-of select="."/></h2><br/>
</xsl:template>
<!-- Template for publisher -->
<xsl:template match="j:publisher">
  Publisher:
  <xsl:value-of select="."/><br/>
</xsl:template>
<!-- Template for albumNo -->
<xsl:template match="@albumNo">
  ProductNo:
  <xsl:value-of select="."/><br/>
</xsl:template>
<!-- Template for coverImage; whitespace around the file name is trimmed -->
<xsl:template match="j:coverImage">
    <img src="{normalize-space(.)}" alt="{../j:title}"/>
</xsl:template>
<!-- Template for special tracks processing -->
<xsl:template match="j:track" mode="tracks">
  <!-- Print track number and title -->
  <xsl:number format="1-"/>
  <xsl:value-of select="j:title"/>
  <!-- Convert duration to mm:ss format -->
  <xsl:value-of select="concat(' (',
        substring-before(substring-after(j:duration,'T'),'M'), ':',
        substring-before(substring-after(j:duration,'M'),'S'), ')')"/><br/>
</xsl:template>
<!-- Dummy template to exclude tracks from first pass -->
<xsl:template match="j:track">
</xsl:template>
</xsl:stylesheet>

This stylesheet contains a separate rule for each element in the source document. The consequence is that the layout of the resulting HTML page is not determined by the stylesheet but by the XML source. The sequence of elements in the XML source triggers the execution of rules in the stylesheet. Rule-based stylesheets are therefore best used when the output document must closely match the structure of the source document.

In our example, there is one exception: To create an extra paragraph with tracks (and title it with "Tracks") we used a two-pass approach. In the first pass we convert everything except track elements; in the second pass we convert only track elements. The appropriate templates are selected via mode attributes.

Here is the resulting HTML:

<html>
   <head>
      <meta http-equiv="Content-Type"
            content="text/html; charset=utf-8">
   </head>
   <body>
      ProductNo:
      BGJ-47<br>
      <h2>Blues House Jam</h2><br>
      <img src="post-election-jam.jpg" alt="Blues House Jam">
      <h4>Tracks</h4>
      1-Post Election Jam I (19:35)<br>
      2-Post Election Jam II (20:35)<br>
   </body>
</html>

And the result as it appears in the browser:

graphics/blueshouse2-16.png

Limitations of XSLT

XSLT supports variables and parameters. However, XSLT variables are "read-only" variables: the value is assigned when the variable is defined and cannot be overwritten afterwards. Templates can specify formal parameters, too, so that it is possible to pass parameter values to templates. However, a template has no return value in the usual sense: its only result is the output it generates, which the caller can at most capture in a variable. Basically, a template is stateless: XSLT is a functional language.

For programmers with a background in procedural programming this can make certain tasks difficult. Of course it is possible to mimic stateful behavior by making extensive use of recursive calls, but the stylesheets become hard to understand and execution requires a lot of memory.
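As a rough sketch of how recursion can stand in for a running total, the following stylesheet adds up the track durations of the album instance shown earlier. Because a variable cannot be updated, the intermediate sum is passed along as a parameter of each recursive call. The template name total-seconds, the parameter names tracks and sum, and the prefix j are hypothetical, and the durations are assumed to have the form PTnnMnnS without an hours part.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:j="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia">
  <xsl:output method="text"/>

  <!-- Recursive template (hypothetical): sums durations in seconds -->
  <xsl:template name="total-seconds">
    <xsl:param name="tracks"/>
    <xsl:param name="sum" select="0"/>
    <xsl:choose>
      <!-- Tracks left: add the first one, recurse on the rest -->
      <xsl:when test="$tracks">
        <xsl:variable name="d" select="$tracks[1]/j:duration"/>
        <xsl:variable name="secs" select="
            60 * substring-before(substring-after($d,'T'),'M')
            + substring-before(substring-after($d,'M'),'S')"/>
        <xsl:call-template name="total-seconds">
          <xsl:with-param name="tracks" select="$tracks[position() &gt; 1]"/>
          <xsl:with-param name="sum" select="$sum + $secs"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No tracks left: write the accumulated sum -->
      <xsl:otherwise>
        <xsl:value-of select="$sum"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- Capture the template output, then format it as mm:ss -->
    <xsl:variable name="total">
      <xsl:call-template name="total-seconds">
        <xsl:with-param name="tracks" select="j:album/j:track"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="concat(floor($total div 60), ':',
                                 format-number($total mod 60, '00'))"/>
  </xsl:template>
</xsl:stylesheet>
```

For the two tracks of the sample album (19:35 and 20:35) this should yield 40:10. In an imperative language the same task would be a loop with a counter variable, which illustrates why such stylesheets are harder to read.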

In addition, XSLT does not have a complete set of built-in mathematical operators. For example, there are no trigonometric or logarithmic functions. This can be a disadvantage if, for example, we want to generate business graphics in SVG format. It is not impossible (one programmer succeeded in solving differential equations with XSLT!), but it is difficult.

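Last but not least, the result of an XSLT 1.0 stylesheet transformation is always written to a single output stream. We cannot split output into several files. This issue was addressed by the xsl:document element of the XSLT 1.1 working draft, which was never finalized, and later by xsl:result-document in XSLT 2.0.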

These limitations necessitate an extension mechanism, which XSLT fortunately provides. Several XSLT processors provide extensions, most notably Michael Kay's Saxon and the Apache Group's Xalan.

However, although the extension mechanism is standardized, the extensions themselves are not, so you have to choose a specific processor and stay with it. The good news is that there are community efforts to create a standard set of extensions: have a look at http://exslt.org/.
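A stylesheet can guard against missing extensions with the standard function-available() test. The following sketch assumes a processor that may or may not implement the EXSLT math module (namespace http://exslt.org/math); the angle of 0.5 radians and the fallback message are arbitrary choices for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    exclude-result-prefixes="math">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:choose>
      <!-- Use the EXSLT functions only if this processor provides them -->
      <xsl:when test="function-available('math:sin')
                      and function-available('math:cos')">
        <xsl:text>x=</xsl:text>
        <xsl:value-of select="math:cos(0.5)"/>
        <xsl:text> y=</xsl:text>
        <xsl:value-of select="math:sin(0.5)"/>
      </xsl:when>
      <!-- Fallback keeps the stylesheet usable on other processors -->
      <xsl:otherwise>
        <xsl:text>EXSLT math functions are not available</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Coordinates computed this way could, for example, position the segments of a pie chart in SVG output.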

Using Style Sheets with Tamino

There are several ways to apply stylesheets to an XML document. The common way is to supply a pointer to a stylesheet within a processing instruction of an XML document, for example:

<?xml-stylesheet type="text/xsl" href="album.xsl"?>

This processing instruction causes the XML processor to apply the stylesheet album.xsl to the content of the XML document.

In many cases, the XML client is a web browser. This is fine as long as we have control over which web browsers are used (for example, in an intranet) and can guarantee that all clients understand XSLT 1.0. But on the Internet we can be quite sure that not all clients (for example PDAs) can handle XSLT, so the conversion from XML to HTML must be done on the server.

Tamino's serialization method, in combination with the XSLT server extension, offers exactly this functionality. Using serialization, a server extension call can be included in a query. The XSLT server extension, as described in the chapter Example: XSLT Server Extension of the server extension documentation, performs XSLT transformations of XML documents retrieved from Tamino, using stylesheets that are stored in Tamino.

For storing stylesheets, we first define a small schema for the stylesheet document type:

<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns:xs  = "http://www.w3.org/2001/XMLSchema"
  xmlns:tsd = "http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition"
  xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"
  targetNamespace = "http://www.w3.org/1999/XSL/Transform" >
  <xs:annotation>
    <xs:appinfo>
      <tsd:schemaInfo name = "stylesheet">
        <tsd:collection name = "encyclopedia"/>
        <tsd:doctype name = "xsl:stylesheet">
          <tsd:logical>
            <tsd:content>closed</tsd:content>
          </tsd:logical>
        </tsd:doctype>
      </tsd:schemaInfo>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name = "stylesheet"/>
</xs:schema>

Note that we have defined a single untyped element with the name stylesheet. Accordingly, we have used the same name for the document type.

After we have defined this schema to Tamino, we can add stylesheets to our encyclopedia collection. To be able to identify these stylesheets later, we use the option to store a document instance under a particular document name (@ino:docname). This allows us to retrieve that document by its name via URL (see From Schema to Tamino::Object Identity).

The documentation for the SerializationSpec expression in the XQuery Reference Guide provides further information about the use of serialization.