Overview of the distributed MemoryStore
The MemoryStore supports several types of stores as described in "Using the MemoryStore". In addition to those stores, which are local to a single Apama process, Apama also supports a distributed store in which data can be accessed by applications running in multiple correlators. You prepare a distributed store with a prepareDistributed call on the Storage object. When this results in a Finished event with success set to true, the Store can be opened and Table objects created.
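For example, the following EPL sketch prepares a distributed store, waits for the Finished event, and then opens the store and prepares a table. The store name, table name, and schema are illustrative, and the sketch assumes a distributed store called "MyDistributedStore" has been configured.

    using com.apama.memorystore.Storage;
    using com.apama.memorystore.Store;
    using com.apama.memorystore.Table;
    using com.apama.memorystore.Schema;
    using com.apama.memorystore.Finished;

    monitor OpenDistributedStore {
        action onload() {
            // Prepare the distributed store; the name must match a configured store.
            integer storeId := Storage.prepareDistributed("MyDistributedStore");
            on Finished(id=storeId) as prepared {
                if prepared.success then {
                    Store store := Storage.open("MyDistributedStore");

                    // Prepare a table with a single integer column (illustrative schema).
                    Schema schema := new Schema;
                    schema.fields := ["count"];
                    schema.types := ["integer"];
                    integer tableId := store.prepare("Counters", schema);
                    on Finished(id=tableId) as tableReady {
                        if tableReady.success then {
                            Table counters := store.open("Counters");
                            // The table can now be used for get/commit operations.
                        } else {
                            log "Could not prepare table: " + tableReady.status at ERROR;
                        }
                    }
                } else {
                    log "Could not prepare distributed store: " + prepared.status at ERROR;
                }
            }
        }
    }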
A distributed store makes use of Terracotta's BigMemory Max or a third-party distributed cache or datagrid technology that stores the data (table contents) in memory across a number of processes (nodes), typically across a number of machines. The collection of nodes is termed a cluster.
Advantages
Arranging a number of nodes into a cluster provides the following advantages:
It is possible to store more data than would fit on one node.
As the data is in memory, a distributed store is typically faster than persisting the store contents to disk.
Every piece of data is typically stored on more than one node, so the failure of any one node should not cause the loss of any committed data.
If a node fails, other nodes can access any of the data without waiting to 'recover' or reload the entire datastore. Note, however, that it may take time to detect that the failed node is down.
The number of correlators can be changed at runtime, allowing the processing capacity of the system to be increased.
Different providers can be used, allowing a single Apama application to integrate with different distributed caches. However, each provider requires a driver. Apama provides a Service Provider Interface (SPI) with which you can write a custom driver.
Data is accessible to multiple correlators; if the workload is distributed appropriately across them, more processing capacity can be applied to the same shared store of data. A distributed store is a building block for such a system, not a complete solution in itself.
Applications can be notified of changes to data in the store; see
Notifications.
Disadvantages
A distributed store has the following disadvantages compared with the other types of store:
A network request may be required to get or commit any
Row; this is slower than the in-process local-memory get and commit requests made against local stores.
The network request may fail, either because more than one node has failed or because a network failure prevents the correlator from contacting other nodes in the cluster.
Concurrent access to a single row causes contention and does not scale (and is slower than a local in-memory store); see the retry sketch after this list.
It is not permitted to expose dataviews with a distributed store. A distributed store may contain a very large number of entries, which would not be practical to expose as dataviews (as that requires storing a copy of the entire table in the dashboards/scenario service client).
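Because a commit succeeds only if the row has not changed since it was read, contended updates need a retry loop. The following is a minimal sketch of that pattern, assuming an action inside a monitor that has the com.apama.memorystore types in scope and has already opened the "Counters" table from the sketch above; the field name is illustrative.

    // Increment a counter, retrying if another correlator commits the same row first.
    action incrementCounter(Table counters, string key) {
        Row row := counters.get(key);
        row.setInteger("count", row.getInteger("count") + 1);
        // tryCommitOrUpdate() returns false and refreshes the row with the latest
        // committed contents if the commit fails, so the change can be re-applied.
        while not row.tryCommitOrUpdate() {
            row.setInteger("count", row.getInteger("count") + 1);
        }
    }

Each retry re-reads the latest committed values over the network, which is why heavy contention on a single row is slower than against a local store and does not scale out.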
Use cases
Based on the advantages and disadvantages of distributed stores, the typical use cases for them are:
Storing more data than will fit on any single node.
Elastic (changing) processing capacity.
A highly available system that needs continuous access to data, even if some nodes fail, and with minimal recovery time.
High throughput across a large number of different rows, with only a small amount of contention for any single row.
Typical use cases for which a distributed store is not suitable are:
Very low latency (sub-millisecond) access to data.
Very high throughput (more than 10,000 requests/second) to a single row; a distributed store scales out well only if different rows are being accessed.
Supported providers
Apama includes a driver for connecting to Terracotta BigMemory Max, which provides unlimited in-memory data management across distributed servers. See
BigMemory Max driver specific details for using the BigMemory Max driver.
Apama also provides an interface to integrate with third-party distributed caching software that provides compare-and-swap operations for adding, updating, and removing data. For example, software that provides methods similar to the putIfAbsent, replace, and remove operations on java.util.concurrent.ConcurrentMap.
For other distributed cache providers, you need to write a driver using the Apama Service Provider Interface (SPI) to serve as a bridge between the MemoryStore and the caching software. For information on creating a driver, see
Creating a distributed MemoryStore driver.
Configuration
To use a distributed store, a set of configuration files must be created in your project and provided to the correlator. These configuration files typically come in pairs: a .properties file and a -spring.xml file. Multiple pairs of files can be created, and they can make use of more than one distributed cache provider. See
Configuring a distributed store.
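As a rough illustration of how such a pair fits together, the -spring.xml file defines the driver's store factory as a Spring bean, and values from the matching .properties file can typically be substituted into it via ${...} placeholders. The file names, bean class, and property names below are placeholders invented for this sketch rather than taken from a specific driver; consult the driver documentation and the configuration topic above for the real ones.

    # MyDistributedStore-spring.properties (illustrative values)
    storeName=MyDistributedStore
    servers=host1:9510,host2:9510

    <!-- MyDistributedStore-spring.xml; com.example.MyStoreFactory is a hypothetical driver class -->
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
                               http://www.springframework.org/schema/beans/spring-beans.xsd">
        <bean class="com.example.MyStoreFactory">
            <property name="storeName" value="${storeName}"/>
            <property name="servers" value="${servers}"/>
        </bean>
    </beans>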