Best Practices and Configuration Considerations

It is recommended to have a dedicated server for each region's Orchestrator, equipped to handle CPU- and network-heavy processes. While the hardware recommendation is similar to that of an L2 Terracotta server, the Orchestrator's server will not need as much RAM because it will be storing less data. Hardware sizing should take into account data element count, element size, frequency of element update, and bandwidth and latency across the WAN.

It is recommended to have least one additional Orchestrator process in each region, to serve as a failover for the master Orchestrator in that region. Note that if the region contains a replica Orchestrator, then adding a standby replica Orchestrator is not always beneficial, because this can cause some slowness on Master caches during failover of the replica Orchestrator. For more information see the section Automatic Recovery Scenarios.

Master caches should be located in the region where the most writes occur. This will give the lowest number of conflicts and highest possible performance.

If possible, locate all of your Master caches in the same region.

Avoid deployments where there is no terminal region. If your deployment has a cyclic dependency of Masters and Replicas, it would result in at least one Master cache going down if any given region went down, and lead to longer delays when recovering.

The deployment that offers the best performance and consistency would have:

Distinct caches per region, where Master caches have most of the writes, and Replicas are effectively read-only or read/write-rarely.

Distinct sets of cache keys for each region. Non-overlapping sets of cache keys between regions avoids conflicts because the cache mutability is isolated. For example, configure a counter for each region instead of using a global counter.

Non-simultaneous updates of shared caches and keys. This means keys are updated across the WAN, but the application ensures they don't occur simultaneously.

To support bi-directional use cases, recommend use of distinct caches (only updated in specific region), then replicated to other region

WAN Replication does not provide data compression

Orchestrator cannot be enabled to use off-heap

Cache cannot be split over multiple Orchestrators

Cache events are not replicated. The resulting element state after a cache operation is replicated.

Write-behind queues are not replicated.

Explicitly locked caches will still function, but the lock will only be enforced locally.

The bulk-load state is not replicated, because it is not possible to tell from one region that another region is in bulk-load mode. Bulk loading will otherwise work as normal.

If you are using custom value classes that are not JDK types, the classes must be included in the Orchestrator's classpath. Refer to Step 5 in the section General Steps for Setting Up WAN Replication.

The following cache configurations are not supported:

Cache element expiration is only performed by the Master cache, and Orchestrators enforce Master cache expiries on all Replica caches. This means that Time To Idle (TTI) settings may not achieve expected results. Only the Master will perform last accessed time updates, so even if there are gets from the Replica regions, they will not count toward resetting the idle countdown on the element. It is recommended to avoid using TTI-based expiration with WAN replication.

If you had configured cache size settings for Replica caches, they will be overridden so that the Master can control the cache.

To improve performance, consider the following:

Cache configurations should be as symmetrical as possible for the same cache in different regions. The cache size, as well as other settings that can impact the eviction of an element, should be the same across regions. These settings include the Time-To-Live option and the maxEntriesInCache attribute (configured in the ehcache.xml file), and the dataStorage and offheap elements (configured in the tc-config.xml file). For more information, see "Automatic Resource Management" in the BigMemory Max Administrator Guide.

In order to see exceptions on a client when it fails to connect to an Orchestrator, you must set the client's nonstop timeoutBehavior type to "exception"; otherwise, you will only get the INFO message "Waiting for the Orchestrator to mark it active" in the event of an Orchestrator failure, or when a Replica gets disconnected from its Master. For more information, see "Configuring Nonstop Operation" in the BigMemory Max Configuration Guide.

Avoid repeated and sustained conflicting writes to a set of elements over the WAN. This will result in constant conflict resolution and lower performance.

Removing all elements should be avoided where possible. Caches may be cleared, but the Orchestrator never disposes of a cache. Even if all clients dispose of a cache, the Orchestrator will continue to hold it, and so it cannot be destroyed while the Orchestrator is running. It is recommended to avoid calling cache.clear() in your application.

For a region that is used for disaster recovery, it is recommended to enable the Fast Restart feature. For more information, see "Configuring Fast Restart" in the BigMemory Max Configuration Guide.