Presto Analytics : Getting Started with Presto Analytics : Load Data with <directinvoke> and Filter Rows
Load Data with <directinvoke> and Filter Rows
Accessing files can be useful, but in most cases the datasets you want to work with come from databases or from other systems or applications. If applications provide a REST or Web Service interface, you can access and load data using <directinvoke> and the appropriate URL.
We’re going to load a CSV dataset with information on global manufacturing plants that is accessible from http://mdc.jackbe.com/prestodocs/data/mfgplants.csv. Then we will use the Where clause to filter the rows the mashup should work with.
1. Login to Presto and select Mashups > RAQL Explorer in the main menu.
This is the RAQL Explorer that you can use to explore RAQL queries. You can use this tool to play with results and queries when your dataset is accessible as a file or from a URL. See Explore RAQL with the Presto RAQL Explorer for more information about this tool.
2. Enter http://mdc.jackbe.com/prestodocs/data/mfgplants.csv as the Data Source.
RAQL Explorer automatically detects the Datatype as CSV. In rare cases, you may need to manually set the datatype.
3. Enter plants as the Alias.
4. Then enter this query to see the first 20 rows of the result:
select * from plants limit 20
5. Click Run.
The results, showing the first 20 rows in this dataset display in a grid, something like this:
As the results show, this contains a list of manufacturing plants, by country and name, along with latitude/longitude information and statistics on the production lines at each site.
Let’s add a Where clause to change the RAQL query and select a specific country. Change the query to:
select * from plants limit 20 where Country in (’BELGIUM’)
The results, shown here, now return rows for Belgium only:
Syntax points to keep in mind for this query:
*The * wildcard in the Select clause works just like SQL to retrieve all columns for the dataset.
*The From clause identifies the name of the variable for this dataset stream. No path is required because the CSV format of the data already fits the RAQL data model (see Structure, Format and Access to Datasets with RAQL for more information).
*The Where clause shown here is a simple condition based on one column that comparison the column against a literal country name. See Literal Values in Conditions or Expressions for more information on using literal values in queries.
Where clauses:
*Can also have several conditions linked by the standard logical operators: and and or. See Where Complex Conditions for examples and information.
*Can use basic mathematical comparison operators as well as operators for text comparisons. See RAQL Operators for a complete list and Where Complex Conditions for examples and information.
*Cannot currently use the standard SQL syntax for parameters in Where clause conditions, but you can use mashup expressions instead or you can build the full query itself dynamically. See Parameters in Where Clauses and Creating Dynamic RAQL Queries for examples and instructions.
If you save this query as a mashup, the mashup should look something like this:
<mashup name="CSVFilter"
xmlns="http://www.openmashup.org/schemas/v1.0/EMML"
xmlns:macro="http://www.openmashup.org/schemas/v1.0/EMMLMacro"
xmlns:presto="http://www.jackbe.com/v1.0/EMMLPrestoExtensions"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openmashup.org/schemas/v1.0/EMML/../schemas/EMMLPrestoSpec.xsd">
<operation name="Invoke">
<presto:presto-meta name="created-using">RUI</presto:presto-meta>
<presto:presto-meta name="alias">plants</presto:presto-meta>
<presto:presto-meta name="endpoint-url">http://mdc.jackbe.com/prestodocs/data/mfgplants.csv</presto:presto-meta>
<output name="result" type="document"/>
<variable name="plants" subType="csv" type="document"/>
<directinvoke outputvariable="plants" stream="true" endpoint="http://mdc.jackbe.com/prestodocs/data/mfgplants.csv" />
<raql outputvariable="result">
select * from plants where Country in ('BELGIUM')
</raql>
</operation>
</mashup>
Some points to note in the EMML code for this mashup:
*This explicitly declares a variable plants for the results from the <directinvoke> statements. The type for this variable is document with a subtype of csv.The csv subtype is only valid for dataset streams.
If you simply declare an output variable on <directinvoke> (an implicit variable), EMML creates a document-type variable which accepts only XML data and would cause an error in this case. You must explicitly declare variables to hold dataset streams.
*The <directinvoke> statement has a stream attribute set to true to indicate that the results should be treated as a dataset stream.
*The <presto:presto-meta> elements are added by RAQL Explorer, but are not required for the mashup to run.
Copyright © 2006-2015 Software AG, Darmstadt, Germany.

Product LogoContact Support   |   Community   |   Feedback