Best Practices and Configuration Considerations
Deployment
The BigMemory WAN Replication Service should be run on a secure, trusted network.
It is recommended to have a dedicated server for each region's Orchestrator, equipped to handle CPU- and network-heavy processes. While the hardware recommendation is similar to that of an L2 Terracotta server, the Orchestrator's server will not need as much RAM because it will be storing less data. Hardware sizing should take into account data element count, element size, frequency of element update, and bandwidth and latency across the WAN.
It is recommended to have at least one additional Orchestrator process in each region, to serve as a failover for the master Orchestrator in that region. Note that if the region contains a replica Orchestrator, then adding a standby replica Orchestrator is not always beneficial, because this can cause some slowness on Master caches during failover of the replica Orchestrator. For more information, see the section "Automatic Recovery Scenarios".
Master caches should be located in the region where the most writes occur. This will give the lowest number of conflicts and highest possible performance.
If possible, locate all of your Master caches in the same region.
Avoid deployments where there is no terminal region. If your deployment has a cyclic dependency of Masters and Replicas, the loss of any one region takes at least one Master cache down with it and leads to longer delays when recovering.
The deployment that offers the best performance and consistency would have:
Distinct caches per region, where Master caches have most of the writes, and Replicas are effectively read-only or read/write-rarely.
Distinct sets of cache keys for each region. Non-overlapping sets of cache keys between regions avoid conflicts because the cache mutability is isolated. For example, configure a counter for each region instead of using a global counter.
Non-simultaneous updates of shared caches and keys. This means keys are updated across the WAN, but the application ensures that the updates do not occur simultaneously.
To support bi-directional use cases, it is recommended to use distinct caches, each updated only in one specific region and then replicated to the other region.
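The per-region counter suggestion above can be sketched as follows. This is a minimal illustration, not BigMemory code: a plain Map stands in for a replicated cache, and the class name RegionCounters, the method names, the region names, and the "name:region" key format are all hypothetical. Each region increments only its own key, and the global value is derived by summing on read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain Map stands in for a replicated cache.
// RegionCounters and the "name:region" key format are hypothetical.
final class RegionCounters {
    private final Map<String, Long> store = new LinkedHashMap<>();

    // Build a key that only one region ever writes, e.g. "orders:us-east".
    // Assumes counter names themselves do not contain ':'.
    static String keyFor(String counterName, String region) {
        return counterName + ":" + region;
    }

    // Increment the local region's counter; no other region writes this key.
    long increment(String counterName, String localRegion) {
        return store.merge(keyFor(counterName, localRegion), 1L, Long::sum);
    }

    // Derive the global value by summing every region's counter on read.
    long globalValue(String counterName) {
        String prefix = counterName + ":";
        long total = 0;
        for (Map.Entry<String, Long> entry : store.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                total += entry.getValue();
            }
        }
        return total;
    }
}
```

Because no two regions ever write the same key, concurrent increments in different regions cannot conflict; the cost is that readers must aggregate.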
Configuration Rules and Limitations
The following should be taken into account when configuring the components for WAN replication:
WAN Replication does not provide data compression.
The Orchestrator cannot be configured to use off-heap storage.
A cache cannot be split across multiple Orchestrators.
Cache events are not replicated. Only the resulting element state after a cache operation is replicated.
Write-behind queues are not replicated.
Explicitly locked caches will still function, but the lock will only be enforced locally.
The bulk-load state is not replicated, because it is not possible to tell from one region that another region is in bulk-load mode. Bulk loading will otherwise work as normal.
The following cache configurations are not supported:
Transactional caches
Non-Ehcache caches
Unclustered (standalone) caches
TSAs with different segment counts
Cache element expiration is only performed by the Master cache, and Orchestrators enforce Master cache expiries on all Replica caches. This means that Time To Idle (TTI) settings may not achieve expected results. Only the Master will perform last accessed time updates, so even if there are gets from the Replica regions, they will not count toward resetting the idle countdown on the element. It is recommended to avoid using TTI-based expiration with WAN replication.
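The TTI behavior described above can be illustrated with a toy model. This is a simplified sketch for reasoning only, not BigMemory or Ehcache code: the class IdleExpiryModel, its methods, and the expiry comparison are assumptions made for illustration. It shows that gets in a Replica region leave the Master's last-accessed time unchanged, so the element can expire even while it is being read remotely.

```java
// Toy model of the behavior described above, not BigMemory code.
// IdleExpiryModel and its method names are hypothetical.
final class IdleExpiryModel {
    private final long timeToIdleMillis;
    private long lastAccessedMillis; // tracked by the Master only

    IdleExpiryModel(long timeToIdleMillis, long createdAtMillis) {
        this.timeToIdleMillis = timeToIdleMillis;
        this.lastAccessedMillis = createdAtMillis;
    }

    // A get in the Master region resets the idle countdown.
    void getInMasterRegion(long nowMillis) {
        lastAccessedMillis = nowMillis;
    }

    // A get in a Replica region does not update the last-accessed time.
    void getInReplicaRegion(long nowMillis) {
        // intentionally no update
    }

    // Simplified expiry rule: idle for at least the TTI.
    boolean isExpired(long nowMillis) {
        return nowMillis - lastAccessedMillis >= timeToIdleMillis;
    }
}
```

With a TTI of 1000 ms, an element created at time 0 and read only from a Replica at time 900 is treated as expired at time 1500 in this model, whereas the same read in the Master region would have kept it alive.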
If you have configured cache size settings for Replica caches, they are overridden so that the Master can control the cache.
Configuration Best Practices
To improve performance, consider the following:
Divide data into two caches.
Use an Orchestrator per cache.
Use two stripes if feasible (with high data volumes).
The above are recommendations; they are more effective in environments with higher data volumes.
Cache configurations should be as symmetrical as possible for the same cache in different regions. The cache size, as well as other settings that can impact the eviction of an element, should be the same across regions. These settings include the Time-To-Live option and the maxEntriesInCache attribute (configured in the ehcache.xml file), and the dataStorage and offheap elements (configured in the tc-config.xml file). For more information, see "Automatic Resource Management" in the BigMemory Max Administrator Guide.
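One way to check symmetry before deployment is to compare the eviction-related settings of the same cache in two regions. The sketch below is an assumption-laden illustration: the class CacheConfigSymmetry is hypothetical, and it expects the settings to have already been extracted from each region's configuration files into simple name/value maps by some other means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper: compares already-extracted settings for one cache
// in two regions. It does not parse ehcache.xml or tc-config.xml itself.
final class CacheConfigSymmetry {

    // Return one message per setting whose value differs between the regions,
    // including settings present in only one of them.
    static List<String> mismatches(Map<String, String> regionA,
                                   Map<String, String> regionB) {
        TreeSet<String> names = new TreeSet<>(regionA.keySet());
        names.addAll(regionB.keySet());

        List<String> differences = new ArrayList<>();
        for (String name : names) {
            String a = regionA.get(name);
            String b = regionB.get(name);
            if (!Objects.equals(a, b)) {
                differences.add(name + ": " + a + " vs " + b);
            }
        }
        return differences;
    }
}
```

An empty result means the compared settings match; any entry names a setting to reconcile before enabling replication.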
In order to see exceptions on a client when it fails to connect to an Orchestrator, you must set the client's nonstop timeoutBehavior type to "exception"; otherwise, you will only get the INFO message "Waiting for the Orchestrator to mark it active" in the event of an Orchestrator failure, or when a Replica gets disconnected from its Master. For more information, see "Configuring Nonstop Operation" in the BigMemory Max Configuration Guide.
Avoid repeated and sustained conflicting writes to a set of elements over the WAN. This will result in constant conflict resolution and lower performance.
Removing all elements should be avoided where possible. Caches may be cleared, but the Orchestrator never disposes of a cache. Even if all clients dispose of a cache, the Orchestrator will continue to hold it, and so it cannot be destroyed while the Orchestrator is running. It is recommended to avoid calling cache.clear() in your application.
For a region that is used for disaster recovery, it is recommended to enable the Fast Restart feature. For more information, see "Configuring Fast Restart" in the BigMemory Max Configuration Guide.