MashZone NextGen 10.2 | Appendix | Legacy Presto components | Mashables and Mashups | Mashups in EMML | Advanced Mashup Techniques | Web Clipping with a Mashup Script
 
Web Clipping with a Mashup Script
Web clipping, sometimes called screen scraping, lets you treat the HTML from any URL as the result of a mashable that you can filter, combine, or otherwise transform in a mashup. A web clipping mashup uses the <directinvoke> statement to retrieve the HTML, which the MashZone NextGen Server converts to XHTML in the http://www.w3.org/1999/xhtml namespace.
Note: MashZone NextGen uses TagSoup to convert HTML to XHTML and ensure the response is well-formed. Because of this conversion, the result may not match the HTML source exactly.
In versions 3.7 and earlier, MashZone NextGen used JTidy for this conversion. If needed, you can choose JTidy for backwards compatibility. See Handling HTML Responses for an example.
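You can then use this XHTML in the mashup script as you would any other mashable response. Because the converted elements are in the XHTML namespace, declare a prefix for http://www.w3.org/1999/xhtml in the mashup and use it in XPath expressions (for example, //xhtml:a). An unprefixed expression such as //a does not match them.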
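As a minimal sketch of this idea, the following mashup clips only the title of a page. The page URL, the mashup and operation names, and the http://www.myCompany.com/pageTitle result namespace are hypothetical. The script uses only the statements that also appear in the Google example in this topic (<directinvoke>, <foreach>, and <appendresult>).

```xml
<mashup name="PageTitle"
    xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
    xmlns:res="http://www.myCompany.com/pageTitle"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="getTitle">
    <!-- Result document root; each title found is appended as a child -->
    <output name="result" type="document">
      <res:page/>
    </output>
    <!-- Hypothetical page to clip -->
    <variable name="uri" default="http://www.example.com/"/>
    <!-- Retrieve the page; the server returns it as XHTML -->
    <directinvoke outputvariable="$page" endpoint="$uri"/>
    <!-- The xhtml prefix is required: converted elements are namespaced -->
    <foreach variable="t" items="$page//xhtml:head/xhtml:title">
      <appendresult outputvariable="$result">
        <res:title>{string($t)}</res:title>
      </appendresult>
    </foreach>
  </operation>
</mashup>
```

If the page has no title element in its head, the loop does not run and the result is an empty res:page element.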
Example
This example clips the results page of a Google search for "ruby" and outputs the links whose href starts with /url?q=:
<mashup name="Ruby"
    xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
    xmlns:res="http://www.myCompany.com/googleQuery"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="queryGoogle">
    <output name="result" type="document">
      <res:queries xmlns:res="http://www.myCompany.com/googleQuery"/>
    </output>
    <!-- Google search URL to clip -->
    <variable name="uri" default="http://www.google.com/search?q=ruby"/>
    <!-- Retrieve the results page; the server returns it as XHTML -->
    <directinvoke outputvariable="$searchresult" endpoint="$uri"/>
    <!-- Select result links; XHTML elements need the xhtml prefix -->
    <foreach variable="query" items="$searchresult//xhtml:a[starts-with(@href, '/url?q=')]">
      <!-- Add one itemlink per match, with its relative href resolved against $uri -->
      <appendresult outputvariable="$result">
        <res:itemlink>
          {attribute href {resolve-uri($query/@href, $uri)}}
        </res:itemlink>
      </appendresult>
    </foreach>
  </operation>
</mashup>
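The XML result from this mashup looks something like the following abbreviated listing, which shows the destination of each link. Because each relative href is resolved against $uri, the actual href values take the form http://www.google.com/url?q= followed by the destination. The exact results also depend on Google's current search results and page markup: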
<?xml version="1.0" encoding="UTF-8"?>
<queries xmlns="http://www.myCompany.com/googleQuery">
  <itemlink href="http://www.ruby-lang.org/"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby_programming_language"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby"/>
  <itemlink href="http://www.rubyonrails.org/"/>
  <itemlink href="http://www.rubycentral.com/"/>
  <itemlink href="http://www.rubycentral.com/book/"/>
  <itemlink href="http://www.w3.org/TR/ruby/"/>
  <itemlink href="http://www.youtube.com/watch?v=JMDcOViViNY"/>
  <itemlink href="http://www.zenspider.com/Languages/Ruby/QuickRef.html"/>
  <itemlink href="http://poignantguide.net/"/>
  ...
</queries>
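To capture the visible text of each link as well as its address, one possible variation replaces the <foreach> in the example with the loop below. It is a sketch that assumes the example's $searchresult, $uri, and res declarations; the text attribute name is hypothetical. Both computed attributes are built in a single enclosed expression so that no whitespace text appears before them in the element content.

```xml
<!-- Variation: record each link's visible text alongside its resolved href -->
<foreach variable="query" items="$searchresult//xhtml:a[starts-with(@href, '/url?q=')]">
  <appendresult outputvariable="$result">
    <res:itemlink>
      {attribute href {resolve-uri($query/@href, $uri)}, attribute text {normalize-space($query)}}
    </res:itemlink>
  </appendresult>
</foreach>
```

With this loop, each itemlink element in the result carries both an href attribute and a text attribute containing the link text with whitespace normalized.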

Copyright © 2013-2018 | Software AG, Darmstadt, Germany and/or Software AG USA, Inc., Reston, VA, USA, and/or its subsidiaries and/or its affiliates and/or their licensors.
Innovation Release