Overview of the distributed MemoryStore

The MemoryStore supports several types of stores as described in Using the MemoryStore. In addition to those stores that are local to a single Apama process, Apama also supports a distributed store in which data can be accessed by applications running in multiple correlators. You prepare a distributed store with a prepareDistributed call on the Storage interface. When this sends a Finished event with success set to true, the Store can be opened, and Table objects created.

A distributed store makes use of Terracotta's TCStore or BigMemory Max, or a third-party distributed cache or datagrid technology that stores the data (table contents) in memory across a number of processes (nodes), typically across a number of machines. The collection of nodes is termed a cluster.

Arranging a number of nodes into a cluster provides the following advantages:

As the data is in memory, a distributed store is typically faster than persisting the store contents to disk.

Every piece of data is typically stored on more than one node, so the failure of any one node should not cause the loss of any committed data.

If a node fails, other nodes can access any of the data without waiting to recover or reload the entire datastore. Note, however, that it may take time to detect that the failed node is down.

The number of correlators can be changed at runtime, allowing the processing capacity of the system to be increased.

Different providers can be used, allowing a single Apama application to integrate with different distributed caches. However, each provider must have a driver. Apama provides a Service Programming Interface (SPI) with which you can write a custom driver (see Creating a distributed MemoryStore driver for more information).

Data is accessible to multiple correlators. If they distribute workload appropriately, more processing capacity can use the same shared store of data. A distributed store is a building block for such a system, not a complete solution in itself.

Applications can be notified of changes to data in the store (see Notifications for more information).

A distributed store has the following disadvantages compared with the other types of store:

A network request may be required to get or commit any Row. This is slower than the in-process local-memory get and commit requests made against local stores.

The network request may fail because either more than one node has failed or there is a network failure such that the correlator cannot contact other nodes in the cluster.

Multiple access to a single row will cause contention and will not scale (and will be slower than an in-memory store).

It is not permitted to expose DataViews with a distributed store. A distributed store may contain a very large number of entries, which would not be practical to expose as DataViews (as it requires storing a copy of the entire table in the dashboards/Scenario Service client).

Based on the advantages and disadvantages of distributed stores, the typical use cases for using them are:

A highly available system needs continuous access to data, even if some nodes fail, and with minimal recovery time.

High throughput is required across a large number of different rows, with only a small amount of contention for a single row.

The typical use cases where a distributed store is not suitable are:

Very high throughput (more than 10,000 requests per second) to a single row. The distributed store only scales out well if different rows are being accessed.

Apama includes drivers for connecting to Terracotta's TCStore and BigMemory Max, which provide unlimited in-memory data management across distributed servers. See TCStore (Terracotta) driver details and BigMemory Max driver details for more information.

Apama also provides an interface to integrate with third-party distributed caching software that provides compare-and-swap operations for adding, updating, and removing data (for example, software that provides methods similar to the putIfAbsent, replace, and remove operations on java.util.concurrent.ConcurrentMap).

For other distributed cache providers, you need to write a driver using the Apama Service Provider Interface (SPI) to serve as a bridge between the MemoryStore and the caching software. For information on creating a driver, see Creating a distributed MemoryStore driver.

In order to use a distributed MemoryStore, a set of configuration files must be created in your project and provided to the correlator. These configuration files typically come in pairs: a storeName-spring.xml file and a storeName-spring.properties file. Multiple pairs of files can be created and can make use of more than one distributed cache provider. See Configuring a distributed store.