Defining query input
In a query definition, you must specify an inputs block that defines at least one input. The input definitions identify the events that you want the query to operate on. An input definition can specify particular content and it can also specify a number of events or a time period. For example:
query FraudulentWithdrawalDetection {
    inputs {
        Withdrawal(amount > 10.0)
            key cardNumber, cardType
            within 600.0;
        AddressChange()
            key cardNumber, typeOfCard as cardType
            retain 1;
    }
    find (Withdrawal as w1 -> Withdrawal as w2)
        where (w1.country != w2.country or w1.city != w2.city)
        without AddressChange as ac {
        getAccountInfo();
        if preferredContactType = "Email" {
            sendEmail();
        }
        if preferredContactType = "SMS" {
            sendSMS();
        }
    }
}
The previous code defines two inputs. For each input, there is an associated window of events. The first input window contains Withdrawal events and the second contains AddressChange events.
The input definition for the Withdrawal events specifies that each Withdrawal event in the window must have a value greater than 10.0 in the amount field. The input definition for the AddressChange events does not specify an event filter. Therefore, each AddressChange event that arrives is eligible to be in the window.
The next element in an input definition is the key definition. The key definition indicates how you want to partition the incoming events. If you define more than one input, the number, type, and order of the key fields must be the same for each input. In the previous sample code, assume that the key fields for Withdrawal events, cardNumber and cardType, are integer and string, respectively, and that the key fields for AddressChange events, cardNumber and typeOfCard, are also integer and string, respectively. The two input keys then match in number, type, and order of key fields. The as cardType clause gives the typeOfCard key field the alias cardType, so that the key field names are the same in both inputs.
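To make the key-matching rule concrete, the following event definitions are one possible shape for the two event types. They are not given in this section: the key field types follow the assumption stated above, and the types of amount, country, and city are guesses based on how the query uses them.

```
// Hypothetical event definitions, shown only to illustrate matching keys.
event Withdrawal {
    integer cardNumber;   // first key field
    string cardType;      // second key field
    float amount;         // assumed float, because it is compared with 10.0
    string country;       // assumed string
    string city;          // assumed string
}

event AddressChange {
    integer cardNumber;   // first key field, same type as Withdrawal.cardNumber
    string typeOfCard;    // second key field, aliased to cardType in the input definition
}
```

With these definitions, both inputs are keyed on an integer followed by a string, in that order.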
After the key definition, you can specify a within clause, a retain clause, or both. If you specify both, the within clause must be before the retain clause. A within clause specifies a period of time. Only events that arrive within that period of time are in the window. In the window that contains Withdrawal events, only Withdrawal events that have arrived in the last 10 minutes (600.0 seconds) are in the window. A retain clause specifies how many events can be in the window. In the window that contains AddressChange events, only the last AddressChange event that arrived can be in the window. When an AddressChange event arrives, if an AddressChange event is already in the window it is ejected.
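As a sketch of an input that combines both clauses, the following query keeps at most the last three Withdrawal events that arrived in the last 10 minutes for each card. The query name, the simplified event definition, and the find pattern are assumptions made for illustration.

```
// Hypothetical, simplified event definition.
event Withdrawal {
    integer cardNumber;
    float amount;
}

// Hypothetical query: the within clause comes before the retain clause.
query ThreeRecentWithdrawals {
    inputs {
        Withdrawal(amount > 10.0)
            key cardNumber
            within 600.0
            retain 3;
    }
    find Withdrawal as w1 -> Withdrawal as w2 -> Withdrawal as w3 {
        log "Three large withdrawals within 10 minutes on card "
            + w3.cardNumber.toString();
    }
}
```

Writing the clauses in the opposite order (retain before within) would not be accepted, because the within clause must come first.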
After the within and retain clauses, you can optionally specify a with unique clause to prevent repeated values from appearing in the window. The with unique clause lists one or more fields or actions of the event type (follow each action name with open and close parentheses). If, after the within and retain specifications are applied, more than one event in the window has the same values for the listed fields or actions, all but the latest are discarded. See Matching only the latest event for a given field.
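A minimal sketch of a with unique clause follows. It assumes a hypothetical transactionId field on the Withdrawal event. The event definition, query name, and find pattern are illustrative only.

```
// Hypothetical event definition; transactionId is an assumed field.
event Withdrawal {
    integer cardNumber;
    string transactionId;
    float amount;
}

// Hypothetical query: for each card, the window keeps only the latest
// Withdrawal event for any given transactionId value.
query DistinctTransactions {
    inputs {
        Withdrawal()
            key cardNumber
            within 600.0
            with unique transactionId;
    }
    find Withdrawal as w1 -> Withdrawal as w2 {
        log "Two withdrawals with different transaction IDs on card "
            + w2.cardNumber.toString();
    }
}
```

Because earlier events with the same transactionId are discarded, a repeated notification of the same transaction should not by itself complete the pattern.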
The final, optional element of an input definition is the specification of the event source timestamp and the associated wait period. If you expect that input events from a source will be subject to delays or may be received out of order, you can specify a time from clause with the name of an action that returns a float giving the number of seconds since the epoch (midnight, 1 Jan 1970 UTC) at which the event was created. If you do this, you must also add a wait clause, which requires a float or time literal specifying the maximum delay expected for these events. This tells the query runtime how long it must wait, if it has not received any events, before it can continue processing the events it has already received. Both of these clauses require that the event definition have a source timestamp that records when the event was created, and a corresponding action that returns that timestamp as a float representing the number of seconds since the epoch.
In the example below, the query gathers data from cars, which may be delayed if a vehicle goes out of network coverage. Accordingly, the input definitions specify that the source timestamps of the events are to be obtained from the events' getEcuTime actions, which simply return the value of the events' ts float field. The input definitions also specify that, in each case, the runtime should wait for up to 1 hour before continuing to process any events already received, to allow for possible delays. For further details, see Using source timestamps of events.
event CarRPM {
    string carId;
    float ts;
    float rpm;
    action getEcuTime() returns float {
        return ts;
    }
}
event CarEngineTemp {
    string carId;
    float ts;
    float temp;
    action getEcuTime() returns float {
        return ts;
    }
}
event CarEngineMisfire {
    string carId;
    float ts;
    action getEcuTime() returns float {
        return ts;
    }
}
query DetectEnginePerformanceProblems {
    inputs {
        CarEngineTemp() key carId within 1 hour time from getEcuTime wait 1 hour;
        CarRPM() key carId within 1 hour time from getEcuTime wait 1 hour;
        CarEngineMisfire() key carId within 1 hour time from getEcuTime wait 1 hour;
    }
    find CarEngineTemp as t and CarRPM as r -> wait 1 minute
        where t.temp > T_THRESHOLD
        where r.rpm > R_THRESHOLD
        without CarEngineMisfire as misfire {
        log "Possible engine performance problem" + t.toString() + r.toString();
    }
}
Typically, you define one to four inputs. If you define more than one input, each must be a different event type. In other words, two inputs to the same query cannot be the same event type.
Queries can share windows
All query instances that have the same input definitions share the same windows. Two queries have the same input definitions when they specify:
- the same input event types (the order can be different)
- the same keys
- the same input filters (if any)
- the same use of source timestamps, that is, the same action named in time from clauses (wait times may be different)
- the same use of heartbeat events
Any wait, within, retain or with unique specifications can be different.
When two query instances have the same input definitions and no parameters are used in any input filters, all instances of those query definitions can share window data. If parameters are used in input filters, parameterizations with different parameter values each store data separately. This increases the total storage requirements and the cost of processing the queries.
If a query is already running and you inject a query that defines the same inputs, or create a parameterization that defines the same inputs, the new query instance or new parameterization uses the same windows as the query that was already running. This means that events that were received before the new query was injected, or before the parameterization was created, can be in a match set for the new query instance or new parameterization. This can happen when an event that arrives after the new query is injected, or after the parameterization is created, completes the pattern that the new instance or parameterization is looking for.
To reduce the amount of memory storage required to run queries, you might want to adjust the input definition for a query so that it is the same as another query. For example, suppose query Q is consuming inputs A, B, and X, while query P is consuming inputs A, B, and Y. If both queries define both X and Y as inputs (as well as A and B) then they can share the same windows. This can be an advantage when there are many A and B events but comparatively few X and Y events. If many queries can be written with similar input sections then they can share windows, which can lead to very efficient use of memory.
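The Q and P example can be sketched as follows. The event definitions, the key field id, the window sizes, and the find patterns are all assumptions for illustration, and the sketch assumes that an input may be declared without being used in the find pattern, as the text above implies.

```
// Hypothetical event definitions, all keyed on an assumed string field id.
event A { string id; }
event B { string id; }
event X { string id; }
event Y { string id; }

// Q needs only A, B, and X, but also declares Y so that its inputs match P's.
query Q {
    inputs {
        A() key id within 60.0;
        B() key id within 60.0;
        X() key id retain 1;
        Y() key id retain 1;   // not used in Q's pattern
    }
    find A as a -> B as b -> X as x {
        log "Q matched for " + a.id;
    }
}

// P needs only A, B, and Y, but also declares X so that its inputs match Q's.
query P {
    inputs {
        A() key id within 120.0;   // within and retain values may differ from Q's
        B() key id within 120.0;
        X() key id retain 1;       // not used in P's pattern
        Y() key id retain 1;
    }
    find A as a -> B as b -> Y as y {
        log "P matched for " + a.id;
    }
}
```

Both queries name the same event types, keys, and (empty) filters, so under the rules listed above they would be expected to share windows even though their within values differ.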
If the only reason for adding an input that uses source timestamps is to share window contents, the wait time for this input can be zero to avoid unnecessary delays.