Load Data with <directinvoke> and Filter Rows
Accessing files can be useful, but in most cases the datasets you want to work with come from databases or from other systems or applications. If applications provide a REST or Web Service interface, you can access and load data using <directinvoke> and the appropriate URL.
We’re going to load a CSV dataset with information on global manufacturing plants that is accessible from http://mdc.jackbe.com/prestodocs/data/mfgplants.csv. Then we will use the Where clause to filter the rows the mashup should work with.
1. Login to Presto and select Mashups > RAQL Explorer in the main menu.
This is the
RAQL Explorer that you can use to explore
RAQL queries. You can use this tool to play with results and queries when your dataset is accessible as a file or from a URL. See
Explore
RAQL
with the
Presto
RAQL
Explorer for more information about this tool.
2. Enter http://mdc.jackbe.com/prestodocs/data/mfgplants.csv as the Data Source.
RAQL Explorer automatically detects the Datatype as CSV. In rare cases, you may need to manually set the datatype.
3. Enter plants as the Alias.
4. Then enter this query to see the first 20 rows of the result:
select * from plants limit 20
5. Click Run.
The results, showing the first 20 rows in this dataset display in a grid, something like this:
As the results show, this contains a list of manufacturing plants, by country and name, along with latitude/longitude information and statistics on the production lines at each site.
Let’s add a Where clause to change the RAQL query and select a specific country. Change the query to:
select * from plants limit 20 where Country in (’BELGIUM’)
The results, shown here, now return rows for Belgium only:
Syntax points to keep in mind for this query:
The
* wildcard in the Select clause works just like SQL to retrieve all columns for the dataset.
The Where clause shown here is a simple condition based on one column that comparison the column against a literal country name. See
Literal Values in Conditions or Expressions for more information on using literal values in queries.
Where clauses:
Can also have several conditions linked by the standard logical operators:
and and
or. See
Where Complex Conditions for examples and information.
If you save this query as a mashup, the mashup should look something like this:
<mashup name="CSVFilter"
xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
xmlns:macro="http://www.openmashup.org/schemas/v1.0/EMMLMacro"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openmashup.org/schemas/v1.0/EMML/../schemas/EMMLPrestoSpec.xsd">
<operation name="Invoke">
<presto:presto-meta name="created-using">RUI</presto:presto-meta>
<presto:presto-meta name="alias">plants</presto:presto-meta>
<presto:presto-meta name="endpoint-url">http://mdc.jackbe.com/prestodocs/data/mfgplants.csv</presto:presto-meta>
<output name="result" type="document"/>
<variable name="plants" subType="csv" type="document"/>
<directinvoke outputvariable="plants" stream="true" endpoint="http://mdc.jackbe.com/prestodocs/data/mfgplants.csv" />
<raql outputvariable="result">
select * from plants where Country in ('BELGIUM')
</raql>
</operation>
</mashup>
Some points to note in the EMML code for this mashup:
This explicitly declares a variable
plants for the results from the
<directinvoke> statements. The type for this variable is
document with a subtype of
csv.The
csv subtype is only valid for dataset streams.
If you simply declare an output variable on <directinvoke> (an implicit variable), EMML creates a document-type variable which accepts only XML data and would cause an error in this case. You must explicitly declare variables to hold dataset streams.
The
<directinvoke> statement has a
stream attribute set to
true to indicate that the results should be treated as a dataset stream.
The
<presto:presto-meta> elements are added by
RAQL Explorer, but are not required for the mashup to run.