The "official" method for transforming XML into other formats (often presentation formats) is XSLT (eXtensible Stylesheet Language: Transformations). XSLT was originally part of the XSL (eXtensible Stylesheet Language) specification, but XSL was split into three parts: XPath, XSLT, and XSL Formatting Objects (XSL-FO). XSL-FO was designed as the presentation format for XML. However, it currently plays only a minor role, since most of its functionality is covered by HTML+CSS. XSL-FO is usually used as an intermediate step when generating PDF from an XML document.
XSLT is now a W3C Recommendation in its own right. It enables stylesheet-controlled transformations from one XML document format into another document format, which can be either XML or non-XML. XSLT can, for example, be used to transform presentation-neutral XML data into presentation formats such as HTML, XHTML, XForms, WML, SMIL, SVG, etc. In the chapter From Conceptual Model to Schema::Integrity we already discussed other applications for XSLT, such as constraint checking and generating XML Schema from XMI.
Although XSLT is quite powerful, it has some deficiencies that have led to the development of various extensions. Also, programmers who are familiar with imperative languages such as Java or C++ sometimes find it hard to think in XSLT's rule-based structures. For the transformation into HTML, however, most of the XSLT coding can be avoided by the use of XSLT generators, which allow visual construction of the resulting web page (or visual mapping of XML elements to HTML elements of an existing web page) and generate most of the required XSLT code. Examples of such generators are Altova's XML Spy, eXcelon's Stylus and Whitehill's XSL Composer.
Such tools are useful to develop stylesheets that map XML documents onto individual HTML pages. However, when we want to create generic transformations (for example, where the final layout depends on the document type and/or on the content), or when we need stylesheets to produce output other than HTML, we have to dig into XSLT programming. In the following sections we give a short introduction.
The basic constructs in XSLT are templates, and a stylesheet normally consists of one or more of them. A template can be invoked explicitly by name, or it can be applied implicitly via pattern matching against the match expression defined in the head of the template. This allows two programming styles in XSLT that can be mixed freely: rule-based programming and procedural programming.
Rule-based programming
This is a more declarative approach. Rules (i.e. templates with a match expression) specify which elements of the input document they apply to, and how they transform these elements. Rules are applied recursively. The programmer describes the transformation in terms of logic and is not concerned with the sequence of execution.
Procedural programming
This programming style is easier to understand for programmers with experience in imperative languages such as Java or C. The programmer describes to the XSLT processor exactly what to do and in which sequence. The XSLT style sheet looks very much like the target document, with interspersed XSLT instructions to fill in the blanks.
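Both invocation styles can be seen in the minimal sketch below, which runs against the album instance used later in this chapter. It binds the arbitrary prefix enc to the album namespace. A prefix is required because XPath 1.0 ignores default namespaces, so namespaced elements can only be addressed through a prefix. The template name footer is purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="html" indent="yes"/>

  <!-- Applied implicitly: the match expression selects the document root -->
  <xsl:template match="/">
    <html><body>
      <!-- Rule-based: let the processor find the template for enc:album -->
      <xsl:apply-templates select="enc:album"/>
      <!-- Procedural: invoke a template explicitly by its name -->
      <xsl:call-template name="footer"/>
    </body></html>
  </xsl:template>

  <!-- Applied implicitly to every enc:album node that is selected -->
  <xsl:template match="enc:album">
    <h2><xsl:value-of select="enc:title"/></h2>
  </xsl:template>

  <!-- No match expression: runs only when called by name (hypothetical name) -->
  <xsl:template name="footer">
    <p>Generated by XSLT</p>
  </xsl:template>
</xsl:stylesheet>
```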
To support procedural programming, XSLT provides the following operations:
Control structures.
XSLT instructions such as xsl:for-each, xsl:if, and xsl:choose provide procedural control structures for loops, conditional execution, and case structures. The result of an xsl:for-each instruction can be sorted with an xsl:sort instruction and numbered with the xsl:number instruction.
The instruction xsl:call-template is used to invoke a template by name (recursive calls are possible). Parameters can be passed to the invoked template, but it cannot return a value to the caller the way a function does: its output is simply inserted into the result at the point of the call.
The instruction xsl:apply-templates can be used to start rule-based processing (see Rule-Based Transformation).
Accessing content.
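The xsl:value-of instruction writes the string value of a node to the output stream as text. If the expression selects several nodes, XSLT 1.0 uses only the first one in document order. The xsl:copy-of instruction copies a node or node set to the output stream in its original form, including element tags and attributes.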
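The following sketch combines the procedural instructions listed above in one stylesheet for the album instance used below: a sorted loop, a case structure, a named template with a parameter, and the difference between xsl:value-of and xsl:copy-of. It assumes the prefix enc for the album namespace, and it assumes durations of the form PTmmMssS (minutes and seconds only). The output element names and the template name mmss are hypothetical, as is the 20-minute threshold.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/enc:album">
    <tracks>
      <!-- Loop over the tracks; xsl:sort must be the first child of xsl:for-each -->
      <xsl:for-each select="enc:track">
        <xsl:sort select="enc:title" order="descending"/>
        <track>
          <!-- xsl:value-of: only the string value, markup is dropped -->
          <name><xsl:value-of select="enc:title"/></name>
          <!-- Pass a parameter to a named template -->
          <time>
            <xsl:call-template name="mmss">
              <xsl:with-param name="duration" select="enc:duration"/>
            </xsl:call-template>
          </time>
          <!-- Case structure: the first xsl:when whose test is true wins -->
          <xsl:choose>
            <xsl:when test="substring-before(substring-after(enc:duration, 'T'), 'M') &gt;= 20">
              <length>long</length>
            </xsl:when>
            <xsl:otherwise>
              <length>short</length>
            </xsl:otherwise>
          </xsl:choose>
          <!-- xsl:copy-of: copies the duration element itself, tag and namespace included -->
          <xsl:copy-of select="enc:duration"/>
        </track>
      </xsl:for-each>
    </tracks>
  </xsl:template>

  <!-- Named template: turns an xs:duration such as PT19M35S into 19:35 -->
  <xsl:template name="mmss">
    <xsl:param name="duration"/>
    <xsl:value-of select="substring-before(substring-after($duration, 'T'), 'M')"/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="substring-before(substring-after($duration, 'M'), 'S')"/>
  </xsl:template>
</xsl:stylesheet>
```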
Here is an XSLT example that transforms album instances into an HTML page. We have extended the album schema from the chapter From Conceptual Model to Schema::From Model to Schema to include some more information:
Instance:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="album.xsl"?>
<album xmlns="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
       albumNo="BGJ-47">
  <title>Blues House Jam</title>
  <track>
    <title>Post Election Jam I</title>
    <duration>PT19M35S</duration>
  </track>
  <track>
    <title>Post Election Jam II</title>
    <duration>PT20M35S</duration>
  </track>
  <coverImage>post-election-jam.jpg</coverImage>
</album>
The following stylesheet is written in a strictly procedural style. It is the stylesheet, not the source document, that defines the layout of the resulting HTML file.
<?xml version="1.0" encoding="UTF-8"?>
<!-- XPath 1.0 ignores default namespaces, so the album namespace
     is bound to the prefix enc and used in all select/test expressions -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <!-- Make sure we generate HTML output -->
  <xsl:output method="html" indent="yes"/>
  <!-- Just a single rule for the root node -->
  <xsl:template match="/">
    <!-- Generate HTML document root -->
    <html><head/><body>
      <!-- Select album node -->
      <xsl:for-each select="enc:album">
        <!-- The usual nested tables -->
        <table><tr><td>
          <table width="100%">
            <tr bgcolor="silver">
              <td>
                <!-- Title element as headline -->
                <h2><xsl:value-of select="enc:title"/></h2><br/>
                <!-- Test if we have a publisher element -->
                <xsl:if test="enc:publisher">
                  <!-- if yes generate publisher entry -->
                  Publisher: <xsl:value-of select="enc:publisher"/><br/>
                </xsl:if>
                <!-- Generate album number entry (attributes carry no prefix) -->
                ProductNo: <xsl:value-of select="@albumNo"/>
              </td>
              <!-- Test if we have a cover image -->
              <xsl:if test="enc:coverImage">
                <!-- if yes generate image reference -->
                <td><img src="{enc:coverImage}" alt="{enc:title}"/></td>
              </xsl:if>
            </tr>
          </table>
        </td></tr>
        <tr><td>
          <!-- now do the tracks -->
          <br/><h4>Tracks</h4>
          <table width="100%">
            <!-- We may have multiple tracks, therefore loop -->
            <xsl:for-each select="enc:track">
              <tr bgcolor="silver">
                <!-- Print track number and title of track element -->
                <td><xsl:number value="position()" format="1-"/><xsl:value-of select="enc:title"/></td>
                <!-- Convert duration (PTmmMssS) to mm:ss format -->
                <td align="Right">
                  <xsl:value-of select="substring-before(substring-after(enc:duration,'T'),'M')"/>
                  <xsl:text>:</xsl:text>
                  <xsl:value-of select="substring-before(substring-after(enc:duration,'M'),'S')"/>
                </td>
              </tr>
            </xsl:for-each>
          </table>
        </td></tr></table>
      </xsl:for-each><br/>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
To implement the stylesheet logic we have used the XSLT instructions discussed above. Optional elements are wrapped in an <xsl:if> block so that their decoration (such as "Publisher:") is suppressed if, for example, there is no publisher element.
Applying this stylesheet to the above XML document instance results in the following HTML file:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<table>
  <tr>
    <td>
      <table width="100%">
        <tr bgcolor="silver">
          <td>
            <h2>Blues House Jam</h2><br>
            ProductNo: BGJ-47
          </td>
          <td><img src="post-election-jam.jpg" alt="Blues House Jam"></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td><br><h4>Tracks</h4>
      <table width="100%">
        <tr bgcolor="silver">
          <td>1-Post Election Jam I</td>
          <td align="Right">19:35</td>
        </tr>
        <tr bgcolor="silver">
          <td>2-Post Election Jam II</td>
          <td align="Right">20:35</td>
        </tr>
      </table>
    </td>
  </tr>
</table><br>
</body>
</html>
The final representation in a web browser looks like this:
With rule-based transformation, the main XSLT control elements are templates (<xsl:template>). A template consists of a head and a body. The head of each template specifies the context in which the template should be activated. This is done by specifying a match attribute with an XPath pattern that selects the relevant context nodes.
The template body describes what to do. It can contain procedural XSLT instructions (see above). In addition, we may apply recursion with the instruction xsl:apply-templates, which selects a set of nodes and, for each of them, executes the template whose match pattern fits best.
The select attribute of xsl:apply-templates defines the nodes to which the templates are applied. If select is omitted, the processor tries to match templates with all child nodes of the current node (this is equivalent to select="node()"). By contrast, select="." stands for the current node itself, not for its children.
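The sketch below contrasts an explicit select with the default one. It assumes the prefix enc for the album namespace and the album instance shown earlier. Whitespace text nodes between the track children are copied by the built-in text rule, which is harmless in HTML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/enc:album">
    <!-- Explicit select: process only the track children of the album -->
    <ol><xsl:apply-templates select="enc:track"/></ol>
  </xsl:template>

  <xsl:template match="enc:track">
    <!-- No select: equivalent to select="node()", i.e. all children of the track.
         select="." would select this track again, match this same template
         and recurse without end. -->
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <xsl:template match="enc:track/enc:title">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

  <xsl:template match="enc:duration">
    <xsl:text> </xsl:text>
    <i><xsl:value-of select="."/></i>
  </xsl:template>
</xsl:stylesheet>
```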
In addition, xsl:apply-templates has an optional mode attribute. This introduces an additional selection mechanism for templates: only those templates that have a matching mode attribute in their head are applied.
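Modes let the same nodes be processed several times with different rules. They are also the legitimate use of select=".": re-processing the current node under a different mode. This is a sketch assuming the prefix enc for the album namespace. The mode names toc, details and duration are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/enc:album">
    <!-- The same track nodes are processed twice, in two different modes -->
    <ul><xsl:apply-templates select="enc:track" mode="toc"/></ul>
    <xsl:apply-templates select="enc:track" mode="details"/>
  </xsl:template>

  <!-- Used only when mode="toc" is requested -->
  <xsl:template match="enc:track" mode="toc">
    <li><xsl:value-of select="enc:title"/></li>
  </xsl:template>

  <!-- Used only when mode="details" is requested -->
  <xsl:template match="enc:track" mode="details">
    <h4><xsl:value-of select="enc:title"/></h4>
    <!-- select="." re-processes the current node; switching the mode
         picks a different template and avoids endless recursion -->
    <xsl:apply-templates select="." mode="duration"/>
  </xsl:template>

  <xsl:template match="enc:track" mode="duration">
    <p>Duration: <xsl:value-of select="enc:duration"/></p>
  </xsl:template>
</xsl:stylesheet>
```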
The result of an xsl:apply-templates instruction can be sorted with an xsl:sort instruction. In addition, the results can be numbered with the xsl:number instruction.
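One subtlety is that sorting and numbering refer to different orders. Inside the applied template, position() reflects the sorted order, while xsl:number (without a value attribute) counts in document order. The sketch assumes the prefix enc for the album namespace and durations of the form PTmmMssS. The sort key converts each duration to seconds and compares the results numerically.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/enc:album">
    <xsl:apply-templates select="enc:track">
      <!-- Longest track first: minutes * 60 + seconds, compared as numbers -->
      <xsl:sort data-type="number" order="descending"
          select="substring-before(substring-after(enc:duration, 'T'), 'M') * 60
                + substring-before(substring-after(enc:duration, 'M'), 'S')"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="enc:track">
    <p>
      <!-- position(): position in the sorted list being processed -->
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="enc:title"/>
      <!-- xsl:number: counts preceding sibling tracks in the source document -->
      <xsl:text> (track </xsl:text>
      <xsl:number/>
      <xsl:text>)</xsl:text>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

For the album instance above, this yields "1. Post Election Jam II (track 2)" followed by "2. Post Election Jam I (track 1)".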
If the heads of more than one template match a certain context, the template with the best match is selected for execution:
Templates in the current stylesheet are selected over templates from imported stylesheets.
The more specific the match pattern in the template head is, the better the match.
In addition, it is possible to specify an explicit priority for a template with the priority attribute.
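The sketch below illustrates the priority rules. XSLT 1.0 assigns default priorities: 0 to a pattern that is just an element name, and 0.5 to patterns with a path step or a predicate. If two rules tie, the processor may report an error or fall back to the rule that occurs last, so an explicit priority removes the ambiguity. The sketch assumes the prefix enc for the album namespace, and singling out the second track is only an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/enc:album">
    <xsl:apply-templates select="enc:title | enc:track/enc:title"/>
  </xsl:template>

  <!-- Default priority 0: the pattern is just a name -->
  <xsl:template match="enc:title">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

  <!-- Default priority 0.5: the path pattern is more specific and wins for track titles -->
  <xsl:template match="enc:track/enc:title">
    <h4><xsl:value-of select="."/></h4>
  </xsl:template>

  <!-- This pattern would also default to 0.5 and tie with the rule above
       for the second track; the explicit priority resolves the conflict -->
  <xsl:template match="enc:track[2]/enc:title" priority="1">
    <h4><i><xsl:value-of select="."/></i></h4>
  </xsl:template>
</xsl:stylesheet>
```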
Here is an example of a rule-based stylesheet that presents the same information as the previous procedural stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<!-- XPath 1.0 ignores default namespaces, so the album namespace
     is bound to the prefix enc and used in all patterns and selects -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <!-- Make sure we generate HTML output -->
  <xsl:output method="html" indent="yes"/>

  <!-- The root node does the basic setup -->
  <xsl:template match="/">
    <!-- Generate HTML document root -->
    <html><head/><body>
      <!-- First pass: process the album element -->
      <xsl:apply-templates select="enc:album"/>
      <!-- Second pass for tracks -->
      <h4>Tracks</h4>
      <!-- The mode attribute selects the "tracks" templates only -->
      <xsl:apply-templates select="enc:album/enc:track" mode="tracks"/>
    </body></html>
  </xsl:template>

  <!-- Template for album: process its attributes and child elements
       (the built-in rule would visit child nodes only, skipping albumNo) -->
  <xsl:template match="enc:album">
    <xsl:apply-templates select="@* | *"/>
  </xsl:template>

  <!-- Template for title -->
  <xsl:template match="enc:title">
    <h2><xsl:value-of select="."/></h2><br/>
  </xsl:template>

  <!-- Template for publisher -->
  <xsl:template match="enc:publisher">
    Publisher: <xsl:value-of select="."/><br/>
  </xsl:template>

  <!-- Template for albumNo (attributes carry no namespace prefix) -->
  <xsl:template match="@albumNo">
    ProductNo: <xsl:value-of select="."/><br/>
  </xsl:template>

  <!-- Template for coverImage -->
  <xsl:template match="enc:coverImage">
    <img src="{.}" alt="{../enc:title}"/>
  </xsl:template>

  <!-- Template for special tracks processing -->
  <xsl:template match="enc:track" mode="tracks">
    <!-- Print track number and title of the track element -->
    <xsl:number format="1-"/>
    <xsl:value-of select="enc:title"/>
    <!-- Convert duration to mm:ss format -->
    <xsl:text> (</xsl:text>
    <xsl:value-of select="substring-before(substring-after(enc:duration,'T'),'M')"/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="substring-before(substring-after(enc:duration,'M'),'S')"/>
    <xsl:text>)</xsl:text><br/>
  </xsl:template>

  <!-- Dummy template to exclude tracks from first pass -->
  <xsl:template match="enc:track"/>
</xsl:stylesheet>
This stylesheet contains a separate rule for each element in the source document. The consequence is that the layout of the resulting HTML page is not determined by the stylesheet but by the XML source. The sequence of elements in the XML source triggers the execution of rules in the stylesheet. Rule-based stylesheets are therefore best used when the output document must closely match the structure of the source document.
In our example, there is one exception: to create an extra paragraph with the tracks (titled "Tracks"), we used a two-pass approach. In the first pass we convert everything except track elements; in the second pass we convert only track elements. The appropriate templates are selected via mode attributes.
Here is the resulting HTML:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
ProductNo: BGJ-47<br>
<h2>Blues House Jam</h2><br>
<img src="post-election-jam.jpg" alt="Blues House Jam">
<h4>Tracks</h4>
1-Post Election Jam I (19:35)<br>
2-Post Election Jam II (20:35)<br>
</body>
</html>
And the result as it appears in the browser:
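Rule-based processing pays off most when the output closely mirrors the source. The classic case is the identity transformation: one generic rule copies every node, and only the deviations need rules of their own. This is a sketch assuming the prefix enc for the album namespace. Dropping coverImage is just an illustrative change.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity rule: copy the current node, then recurse into attributes and children -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: an empty rule drops coverImage; everything else is copied unchanged -->
  <xsl:template match="enc:coverImage"/>
</xsl:stylesheet>
```

The override wins because a plain name pattern has the default priority 0, whereas the alternatives @* and node() each have the default priority -0.5.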
XSLT supports variables and parameters. However, XSLT variables are "read-only": the value is assigned when the variable is defined and cannot be overwritten afterwards. Templates can declare formal parameters, too, so that it is possible to pass parameter values to templates. However, there is no return statement: a called template can only write to the result tree (or, if the call is placed inside an xsl:variable, into a result tree fragment held by that variable). Basically, a template is stateless; XSLT is a functional language.
For programmers with a background in procedural programming this can make certain tasks difficult. Of course it is possible to mimic stateful behavior by making extensive use of recursive calls, but the stylesheets become hard to understand and execution requires a lot of memory.
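As an example of this recursive style, the sketch below computes the total playing time of an album. A procedural programmer would add up the track durations in a mutable counter. XPath's sum() cannot do it directly either, because each duration string must be parsed first. Instead, the running total is passed along as a parameter, and each recursive call consumes one track. The sketch also shows a global parameter and a read-only variable. It assumes the prefix enc for the album namespace and durations of the form PTmmMssS. The names label, count and total-seconds are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:enc="http://www.softwareag.com/tamino/doc/examples/models/jazz/encyclopedia"
    exclude-result-prefixes="enc">
  <xsl:output method="text"/>

  <!-- Global parameter: a default that the calling application may override -->
  <xsl:param name="label" select="'Total playing time'"/>

  <xsl:template match="/enc:album">
    <!-- A variable is bound once; it cannot be reassigned later -->
    <xsl:variable name="count" select="count(enc:track)"/>
    <xsl:value-of select="concat($label, ' (', $count, ' tracks): ')"/>
    <!-- Start the recursion with all tracks; total defaults to 0 -->
    <xsl:call-template name="total-seconds">
      <xsl:with-param name="tracks" select="enc:track"/>
    </xsl:call-template>
    <xsl:text> seconds</xsl:text>
  </xsl:template>

  <!-- Recursion instead of a mutable counter: the running total travels
       as a parameter, and each call consumes the first remaining track -->
  <xsl:template name="total-seconds">
    <xsl:param name="tracks"/>
    <xsl:param name="total" select="0"/>
    <xsl:choose>
      <xsl:when test="$tracks">
        <xsl:variable name="d" select="string($tracks[1]/enc:duration)"/>
        <xsl:call-template name="total-seconds">
          <xsl:with-param name="tracks" select="$tracks[position() &gt; 1]"/>
          <xsl:with-param name="total"
              select="$total
                      + substring-before(substring-after($d, 'T'), 'M') * 60
                      + substring-before(substring-after($d, 'M'), 'S')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No tracks left: write the accumulated value -->
        <xsl:value-of select="$total"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

For the album instance above (19:35 plus 20:35), this writes "Total playing time (2 tracks): 2410 seconds".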
In addition, XSLT does not have a complete set of built-in mathematical operators. For example, there are no trigonometric or logarithmic functions. This can be a disadvantage if, for example, we want to generate business graphics in SVG format. It is not impossible (one programmer succeeded in solving differential equations with XSLT!), but it is difficult.
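Last but not least, the result of an XSLT 1.0 transformation is always written to a single output stream; we cannot split the output into several files. (The XSLT 1.1 working draft addressed this with an xsl:document instruction. That draft was abandoned, and XSLT 2.0 provides xsl:result-document instead.)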
These limitations necessitate an extension mechanism, which XSLT fortunately provides. Several XSLT processors provide extensions, most notably Michael Kay's Saxon and the Apache Group's Xalan.
However, although the extension mechanism is standardized, the extensions themselves are not, so you have to choose a specific processor and stay with it.
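One way to soften this lock-in is the community library EXSLT, whose extension functions are implemented by several XSLT 1.0 processors. Combined with the standard function-available() test, a stylesheet can use an extension where it exists and degrade gracefully elsewhere. The sketch below assumes a processor that implements the EXSLT math module (namespace http://exslt.org/math); check your processor's documentation. The radius and angle are made-up values, for instance for placing a point of an SVG pie chart.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    exclude-result-prefixes="math">
  <xsl:output method="text"/>

  <!-- Hypothetical radius and angle (in radians) of a pie-chart slice -->
  <xsl:variable name="r" select="100"/>
  <xsl:variable name="angle" select="0.5"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- function-available() is standard XSLT 1.0 and is evaluated at run time -->
      <xsl:when test="function-available('math:cos') and function-available('math:sin')">
        <xsl:value-of select="concat('x=', $r * math:cos($angle), ' y=', $r * math:sin($angle))"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Fallback for processors without the EXSLT math module -->
        <xsl:text>EXSLT math functions are not available</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

XSLT 1.0 explicitly forbids a processor to reject a stylesheet merely because it contains a call to an unavailable extension function. The unused branch is therefore safe.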
There are several ways to apply stylesheets to an XML document. The common way is to supply a pointer to a stylesheet within a processing instruction of an XML document, for example:
<?xml-stylesheet type="text/xsl" href="album.xsl"?>
This processing instruction tells an XSLT-aware client, such as a web browser, to apply the stylesheet album.xsl to the content of the XML document.
In many cases, the XML client is a web browser. This is fine as long as we have control over which web browsers are used (for example, in an intranet) and can guarantee that all clients understand XSLT 1.0. But on the Internet we can be quite sure that not all clients (for example PDAs) can handle XSLT, so the conversion from XML to HTML must be done on the server.
Tamino's serialization method, in combination with the XSLT server extension, offers exactly this functionality. Using serialization, a server extension call can be included in a query. The XSLT server extension, as described in the chapter Example: XSLT Server Extension of the server extension documentation, performs XSLT transformations of XML documents retrieved from Tamino, using stylesheets that are stored in Tamino.
For storing stylesheets, we first define a small schema for the stylesheet document type:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tsd="http://namespaces.softwareag.com/tamino/TaminoSchemaDefinition"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    targetNamespace="http://www.w3.org/1999/XSL/Transform">
  <xs:annotation>
    <xs:appinfo>
      <tsd:schemaInfo name="stylesheet">
        <tsd:collection name="encyclopedia"/>
        <tsd:doctype name="xsl:stylesheet">
          <tsd:logical>
            <tsd:content>closed</tsd:content>
          </tsd:logical>
        </tsd:doctype>
      </tsd:schemaInfo>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="stylesheet"/>
</xs:schema>
Note that we have defined a single untyped element with the name stylesheet. Accordingly, we have used the same name (xsl:stylesheet) for the document type.
After we have defined this schema in Tamino, we can add stylesheets to our encyclopedia collection. To be able to identify these stylesheets later, we use the option to store a document instance under a particular document name (@ino:docname). This allows us to retrieve that document by its name via URL (see From Schema to Tamino::Object Identity).
The documentation for the SerializationSpec expression in the XQuery Reference Guide provides further information about the use of serialization.