Supported Data Formats for RAQL
RAQL can perform queries on datasets in the following formats:
CSV (comma-separated values):
The first line in a CSV dataset must identify column names for the data.
RAQL replaces any white space character in column names with an underscore (_). For example, "First Name" becomes First_Name.
Column names that are numeric have "column_" prepended to the name. For example, "2010" becomes column_2010.
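The two naming rules above can be sketched in Java as follows. This is only an illustration of the documented behavior, not RAQL's own code: the class CsvColumnNames and its methods are hypothetical, and the sketch assumes that "numeric" means a name made up of digits only, which the rules above do not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that illustrates the two CSV naming rules.
// It is not part of RAQL and does not reproduce RAQL's implementation.
final class CsvColumnNames {

    // Applies both rules to a single header value.
    static String normalize(String header) {
        // Rule 1: every white-space character becomes an underscore.
        String name = header.replaceAll("\\s", "_");
        // Rule 2: numeric names get the "column_" prefix.
        // Assumption: "numeric" means digits only; names such as "3.5"
        // or "-1" are left unchanged by this sketch.
        if (name.matches("\\d+")) {
            name = "column_" + name;
        }
        return name;
    }

    // Splits a simple header line on commas (no quoted-field handling).
    static List<String> normalizeHeaderLine(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String header : headerLine.split(",")) {
            names.add(normalize(header));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [First_Name, Last_Name, column_2010]
        System.out.println(normalizeHeaderLine("First Name,Last Name,2010"));
    }
}
```

Running a header line through a helper like this before writing queries shows which column names to use in a select clause.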
JDBC Result Sets: returned from databases when SQL queries or stored procedures are invoked.
XML: data must be well-formed. In addition, RAQL has the following limitations for XML data:
XML namespaces are ignored.
The structure of the XML should be flat, with a single set of repeating nodes (the rows) that contain a single level of elements (the columns) with simple content (text only). Data in any nodes that are ancestors of the repeating 'rows' is not accessible.
For example:
<records>
  <record>
    <itemId>N2390</itemId>
    <price>145.20</price>
    ...
  </record>
  <record>
    <itemId>G88</itemId>
    <price>16.95</price>
    ...
  </record>
  ...
</records>
Data in attributes may not be accessible in some situations.
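To make the flat row and column model concrete, the following Java sketch reads a document shaped like the example above into a list of rows, each a map from column name to text. It uses only the standard JDK DOM parser. The class FlatXmlRows is hypothetical and this is not how RAQL itself reads XML; it simply mirrors the limitations listed above by skipping attributes and anything outside the repeating row elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of the flat XML shape: root -> rows -> columns.
final class FlatXmlRows {

    static List<Map<String, String>> toRows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Map<String, String>> rows = new ArrayList<>();
        Element root = doc.getDocumentElement();
        // Each child element of the root is treated as one row.
        for (Node row = root.getFirstChild(); row != null; row = row.getNextSibling()) {
            if (row.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip white space between elements
            }
            Map<String, String> columns = new LinkedHashMap<>();
            // Each child element of a row is one column with text content.
            for (Node col = row.getFirstChild(); col != null; col = col.getNextSibling()) {
                if (col.getNodeType() == Node.ELEMENT_NODE) {
                    columns.put(col.getNodeName(), col.getTextContent().trim());
                }
            }
            rows.add(columns);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
                + "<record><itemId>N2390</itemId><price>145.20</price></record>"
                + "<record><itemId>G88</itemId><price>16.95</price></record>"
                + "</records>";
        // Prints [{itemId=N2390, price=145.20}, {itemId=G88, price=16.95}]
        System.out.println(toRows(xml));
    }
}
```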
JSON: data must be well-formed. The structure of the JSON should be flat, with a single array of objects (the rows) that contain name/value pairs (the columns) with simple content (number, string, boolean). Data in any objects that are ancestors of the repeating 'rows' is not accessible.
For example:
{
  "records": {
    "record": [
      {
        "itemId": "N2390",
        "price": 145.2,
        ...
      },
      {
        "itemId": "G88",
        "price": 16.95,
        ...
      },
      ...
    ]
  }
}
Assuming that the JSON data above is available in the file sales.json, the following EMML sample runs a RAQL query against it:
<mashup xmlns='http://www.openmashup.org/schemas/v1.0/EMML'>
  <variable name='sales'
            datafile='sales.json'
            type='document'
            subtype='json'
            stream='true'/>
  <output name='result'
          type='document' />
  <raql outputvariable='result'
        query='select itemId, price from sales/records/record'/>
</mashup>
Java Objects: loaded in In-Memory Stores by external systems. Java objects must:
Be plain Java objects or beans with properties for each column of data in the dataset.
Be serializable. This is required when In-Memory Stores use both local memory for the MashZone NextGen Server and memory from additional BigMemory hosts. See In-Memory Dataset Management for more information.
Have search attributes defined in the configuration for the declared In-Memory Store where they will be stored. Search attributes provide the extraction class and other information that maps Java object properties to dataset columns and allows RAQL to access and work with the data.
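As a minimal sketch of the first two requirements, the following Java class is a plain bean with one property per column and implements java.io.Serializable. The class name SaleRecord and its itemId and price properties are hypothetical, borrowed from the XML and JSON examples earlier in this topic. The search attribute configuration for the In-Memory Store is a separate step and is not shown here. In a real project the class would normally be declared public in its own source file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical row class: one bean property per dataset column.
final class SaleRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private String itemId;
    private double price;

    public SaleRecord() { }

    public String getItemId() { return itemId; }
    public void setItemId(String itemId) { this.itemId = itemId; }

    public double getPrice() { return price; }
    public void setPrice(double price) { this.price = price; }

    // Serializes and deserializes a record to check that it is serializable.
    static SaleRecord roundTrip(SaleRecord record)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(record);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SaleRecord) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SaleRecord record = new SaleRecord();
        record.setItemId("G88");
        record.setPrice(16.95);
        SaleRecord copy = roundTrip(record);
        // Prints G88 16.95
        System.out.println(copy.getItemId() + " " + copy.getPrice());
    }
}
```

The round trip in main is only a quick local check that the object survives Java serialization, the property that matters when a store also uses memory on other hosts.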