Use or Construct XPath Expressions for Generic Macros
Use the <type> and <xpath> metadata elements within <presto:macro-meta> to have Wires require users to select a node from the Path Selection list for an input parameter. Wires then assigns the XPath expression for the selected node, rather than the node's value, to that macro input parameter. The next example shows the basics of this metadata:
<macro name="helloWorld"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions">
<presto:macro-meta>
<block usage="Wires"> ... </block>
<parameters>
<parameter name="aPath">
<type datatype="path">
<xpath limitTo="$source" usage="leaf"/>
</type>
</parameter>
</parameters>
</presto:macro-meta>
<output name="macroResult" type="document"/>
<input name="source" type="document"/>
<input name="aPath" type="string"/>
...
<sort />
</macro>
For more information on this metadata, see Get the Chosen Path as Text.
The input parameter receives only the path within the document. To use this XPath expression, you generally need to prefix it with the name of the document-type input parameter that holds the results the path was selected from. You may also need to add further steps or predicates to build the complete XPath expression that your macro needs.
For examples, see the sections that follow.
Modifying XPath Expressions to Handle Generic Loops
In this generic macro example, the macro loops through a repeating set of leaf nodes to convert dates from an invalid format into valid dates that can be used in a mashup for sorting or in date calculations. This example is meant to be used with information sources that have dates as a string of numbers with no delimiters, such as 20100904.
Users identify the set of repeating leaf nodes with dates that should be converted in the pathToDate input parameter. This path is then used in a <foreach> statement in the macro to perform the conversion.
The items attribute of <foreach> expects an XPath expression in the form $variable-name/path-within-the-variable. The pathToDate input parameter, however, contains just /path-within-the-variable.
To handle this, the XPath expression in items uses a dynamic mashup expression (see Dynamic Mashup Expressions) to construct the full path from the path passed in pathToDate:
<macro name="ToDate"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions">
<presto:macro-meta>
...
</presto:macro-meta>
<output name="macroResult" type="document"/>
<input name="source" type="document"/>
<input name="pathToDate" type="string"/>
<input name="dateFormat" type="string"/>
<foreach items="$source{$pathToDate}" variable="aDate">
<variable name="year" type="string"/>
<variable name="month" type="string"/>
<variable name="day" type="string"/>
<if condition="$dateFormat='1'">
<assign fromexpr="substring($aDate/text(),1,4)" outputvariable="$year"/>
<assign fromexpr="substring($aDate/text(),5,2)"
outputvariable="$month"/>
<assign fromexpr="substring($aDate/text(),7,2)" outputvariable="$day"/>
<elseif condition="$dateFormat='2'">
<assign fromexpr="substring($aDate/text(),1,2)"
outputvariable="$month"/>
<assign fromexpr="substring($aDate/text(),3,2)"
outputvariable="$day"/>
<assign fromexpr="substring($aDate/text(),5,4)" outputvariable="$year"/>
</elseif>
<else>
<assign fromexpr="substring($aDate/text(),1,2)" outputvariable="$day"/>
<assign fromexpr="substring($aDate/text(),3,2)"
outputvariable="$month"/>
<assign fromexpr="substring($aDate/text(),5,4)" outputvariable="$year"/>
</else>
</if>
<assign fromexpr="concat($year,'-',$month,'-',$day)"
toexpr="$aDate"/>
</foreach>
<assign fromvariable="$source" outputvariable="$macroResult"/>
</macro>
The loop then deconstructs each date based on the format identified in the dateFormat input parameter, reconstructs it in the ISO 8601 format (YYYY-MM-DD) that EMML uses, and updates the date field with the reconstructed date.
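As a rough illustration of the path construction, assume a hypothetical source document (the element names orders, order, id and shipped are invented for this sketch). If the user picks the shipped leaf node, pathToDate would receive /orders/order/shipped, so the items expression $source{$pathToDate} would resolve to $source/orders/order/shipped. With dateFormat set to 1, the comments show the text each node would hold after the loop runs:

```xml
<!-- Hypothetical input passed in the source parameter (names are assumptions) -->
<orders>
  <order>
    <id>A-100</id>
    <shipped>20100904</shipped>  <!-- becomes 2010-09-04 -->
  </order>
  <order>
    <id>A-101</id>
    <shipped>20101217</shipped>  <!-- becomes 2010-12-17 -->
  </order>
</orders>
```

With dateFormat set to 2, the same loop would read 09042010 as month 09, day 04 and year 2010, and would also produce 2010-09-04.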
Custom Block Properties and Metadata for Generic Loop Example
The ToDate macro shown in the generic loop example has three input parameters:
source: to receive the document with dates to convert.
pathToDate: to pass the XPath expression that identifies the repeating date nodes that users choose to convert.
dateFormat: a choice of date formats to identify the format of the incoming dates.
To ensure that users select leaf nodes (fields that contain data) for pathToDate, and that the parameter receives the XPath expression for the nodes they select, supply metadata in <presto:macro-meta> such as this:
<macro name="ToDate"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions">
<presto:macro-meta>
<block usage="Wires">
<label>Convert to Date</label>
</block>
<parameters>
<parameter name="source">
<label>Choose block results</label>
</parameter>
<parameter name="pathToDate">
<label>Choose the repeating date fields to convert</label>
<type datatype="path">
<xpath limitTo="$source" usage="leaf"/>
</type>
</parameter>
<parameter name="dateFormat">
<label>Choose the format currently used for this date</label>
<type datatype="enum">
<list>
<option label="YYYYMMDD">1</option>
<option label="MMDDYYYY">2</option>
<option label="DDMMYYYY">3</option>
</list>
</type>
</parameter>
</parameters>
</presto:macro-meta>
<output name="macroResult" type="document"/>
<input name="source" type="document"/>
<input name="pathToDate" type="string"/>
<input name="dateFormat" type="string"/>
...
</macro>
In this example, <type datatype="path"> indicates that the pathToDate input parameter should receive the XPath expression rather than the value of the selected node. The <xpath> element, with limitTo="$source" and usage="leaf", identifies the results that users must choose from in the Path Selection list for pathToDate: leaf nodes in the results connected to source.
For more information and links on the metadata that you can use to configure block properties for custom blocks, see Configure Properties for Custom Blocks.
Modifying XPath Expressions to Enable Generic Extraction
The built-in MashZone NextGen Extract block allows users to extract data values or nodes primarily based on their position in results. In many cases, however, you need more sophisticated logic to extract nodes based on node names or attribute values.
This generic macro example allows users to extract nodes from HTML results based on the value of class names in the HTML. A common use for this is to allow users to use web clipping to retrieve information from web sites.
In this example, the Direct Invoke block retrieves an HTML page. The Extract From HTML custom block, based on this example macro, has three input parameters: one to identify the block results to extract nodes from, one to identify a parent or ancestor that contains all the nodes that users want to extract, and one to identify the class name that should be used to extract nodes.
The code for this example, along with the metadata that configures the custom block properties in Wires, is:
<macro name="extractByClass"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions">
<presto:macro-meta>
<block usage="Wires"><label>Extract From HTML</label></block>
<parameters>
<parameter name="htmlDoc">
<label>HTML Results</label>
<help>Connect a block with HTML content</help>
</parameter>
<parameter name="path">
<label>Choose Repeating Nodes</label>
<help>Choose the repeating ancestor for the nodes to extract</help>
<type datatype="path">
<xpath limitTo="$htmlDoc" usage="array"/>
</type>
</parameter>
<parameter name="extractClass">
<label>Class to Extract</label>
<help>Enter the class that identifies the nodes to extract</help>
</parameter>
</parameters>
</presto:macro-meta>
<output name="macroResult" type="document"/>
<input name="htmlDoc" type="document"/>
<input name="path" type="string"/>
<input name="extractClass" type="string"/>
<variable name="temp" type="document">
<extracted/>
</variable>
<foreach items="$htmlDoc{$path}//*[contains(@class,'{$extractClass}')]"
variable="record">
<appendresult outputvariable="$temp">
<record>{$record}</record>
</appendresult>
</foreach>
<assign fromvariable="$temp" outputvariable="$macroResult"/>
</macro>
The macro uses a dynamic mashup expression in the <foreach> loop to combine both the XPath expression from the path input parameter and the class name from extractClass to identify the HTML nodes to extract. For more information, see Dynamic Mashup Expressions.
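To make the combined expression concrete, the sketch below assumes a small, hypothetical HTML result (the markup, class names and values are invented for illustration). If the user selects the repeating div nodes, path would receive /html/body/div/div, and with extractClass set to price the loop expression would resolve to $htmlDoc/html/body/div/div//*[contains(@class,'price')]:

```xml
<!-- Hypothetical HTML results connected to htmlDoc (an assumption) -->
<html>
  <body>
    <div id="listing">
      <div class="item">
        <span class="title">Widget</span>
        <span class="price">9.99</span>
      </div>
      <div class="item">
        <span class="title">Gadget</span>
        <span class="price">19.99</span>
      </div>
    </div>
  </body>
</html>

<!-- A separate document: roughly what macroResult would contain
     for the input above, one record per matching node -->
<extracted>
  <record><span class="price">9.99</span></record>
  <record><span class="price">19.99</span></record>
</extracted>
```

The title nodes are skipped because their class attribute does not contain the text price.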
The macro metadata for path ensures that the block property in Wires forces users to select a repeating node from the HTML results in htmlDoc. This property is assigned the XPath expression to the node that users select. For more information on this metadata configuration, see Get the Chosen Path as Text.
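One point to keep in mind about the <foreach> expression in this macro: the XPath contains() function is a substring test, so a class name such as price would also match elements whose class is price-old or list-prices. If whole-token matching is needed, a common XPath 1.0 idiom pads both the attribute and the search term with spaces. The variant below is only a sketch of that idea. It assumes that the dynamic mashup expression is substituted inside a quoted string with surrounding spaces in the same way as in the original loop, which has not been verified here:

```xml
<!-- Sketch only: match extractClass as a whole, space-separated class token.
     Assumes {$extractClass} is substituted inside the quoted literal as in
     the original loop. -->
<foreach items="$htmlDoc{$path}//*[contains(concat(' ',normalize-space(@class),' '),' {$extractClass} ')]"
         variable="record">
  <appendresult outputvariable="$temp">
    <record>{$record}</record>
  </appendresult>
</foreach>
```

With extractClass set to price, an element with class="price sale" would still match, while one with class="price-old" would not.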