Web Clipping with a Mashup Script
Web clipping, sometimes called screen scraping, lets you treat the HTML from any URL as the result of a mashable that you can filter, combine, or otherwise transform in a mashup. A web clipping mashup uses the <directinvoke> statement to retrieve the HTML, which the Presto Server converts to XHTML in the http://www.w3.org/1999/xhtml namespace.
Note:  
Presto uses TagSoup to convert HTML to XHTML and ensure the response is well-formed. Because of this conversion, the result may not match the HTML source exactly.
In versions 3.7 and earlier, Presto used JTidy for this conversion. If needed, you can choose JTidy for backwards compatibility. See Handling HTML Responses for an example.
You can then use this XHTML in the mashup script as a mashable response.
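Because the converted response is in the XHTML namespace, path expressions against it need a prefix bound to that namespace. The following is a minimal sketch of that point, not a tested script: the mashup name, the operation name, the $page variable and the example.com URL are all hypothetical, and it assumes the standard EMML <assign> statement with its fromexpr attribute.

```xml
<mashup name="ClipTitle"
    xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="getTitle">
    <output name="result" type="string"/>
    <!-- Hypothetical page; any URL that returns HTML would do -->
    <variable name="uri" default="http://www.example.com/"/>
    <!-- The HTML response is converted to XHTML before it reaches $page -->
    <directinvoke endpoint="$uri" outputvariable="$page"/>
    <!-- The xhtml: prefix is what makes this path match the converted title -->
    <assign fromexpr="string($page//xhtml:title)" outputvariable="$result"/>
  </operation>
</mashup>
```

An unprefixed path such as $page//title would not match the converted elements, since they all belong to the XHTML namespace.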
Example
This example uses the results of a Google query as a web clipping result to output specific links:
<mashup name="Ruby"
    xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
    xmlns:res="http://www.myCompany.com/googleQuery"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="queryGoogle">
    <output name="result" type="document">
      <res:queries xmlns:res="http://www.myCompany.com/googleQuery"/>
    </output>
    <variable name="uri" default="http://www.google.com/search?q=ruby"/>
    <!-- Retrieve the search page; the response is converted to XHTML -->
    <directinvoke outputvariable="$searchresult" endpoint="$uri"/>
    <!-- Select each link whose href starts with '/url?q=' -->
    <foreach variable="query" items="$searchresult//xhtml:a[starts-with(@href, '/url?q=')]">
      <!-- Append one itemlink per match, resolving the href against $uri -->
      <appendresult outputvariable="$result">
        <res:itemlink>
          {attribute href {resolve-uri($query/@href, $uri)}}
        </res:itemlink>
      </appendresult>
    </foreach>
  </operation>
</mashup>
The XML result from this mashup looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<queries xmlns="http://www.myCompany.com/googleQuery">
  <itemlink href="http://www.ruby-lang.org/"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby_programming_language"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby"/>
  <itemlink href="http://www.rubyonrails.org/"/>
  <itemlink href="http://www.rubycentral.com/"/>
  <itemlink href="http://www.rubycentral.com/book/"/>
  <itemlink href="http://www.w3.org/TR/ruby/"/>
  <itemlink href="http://www.youtube.com/watch?v=JMDcOViViNY"/>
  <itemlink href="http://www.zenspider.com/Languages/Ruby/QuickRef.html"/>
  <itemlink href="http://poignantguide.net/"/>
  ...
</queries>
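The search term in the example is hard-coded into the URL. One possible variation, shown here only as an untested sketch, takes the term as an input parameter and builds the URL from it. The mashup name and the term input are hypothetical, and the sketch assumes the standard EMML <input> and <assign> statements plus the XPath 2.0 encode-for-uri function; the loop is the same as in the example.

```xml
<mashup name="ClipSearch"
    xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
    xmlns:res="http://www.myCompany.com/googleQuery"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="queryGoogle">
    <!-- Hypothetical input; defaults to the term used in the example -->
    <input name="term" type="string" default="ruby"/>
    <output name="result" type="document">
      <res:queries xmlns:res="http://www.myCompany.com/googleQuery"/>
    </output>
    <variable name="uri" type="string"/>
    <!-- Build the search URL, escaping the term for use in a query string -->
    <assign fromexpr="concat('http://www.google.com/search?q=', encode-for-uri($term))"
        outputvariable="$uri"/>
    <directinvoke outputvariable="$searchresult" endpoint="$uri"/>
    <foreach variable="query" items="$searchresult//xhtml:a[starts-with(@href, '/url?q=')]">
      <appendresult outputvariable="$result">
        <res:itemlink>
          {attribute href {resolve-uri($query/@href, $uri)}}
        </res:itemlink>
      </appendresult>
    </foreach>
  </operation>
</mashup>
```

Escaping the term keeps spaces and reserved characters from breaking the query string when the input contains more than one word.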
Copyright © 2006-2015 Software AG, Darmstadt, Germany.
