Configuration and Lifecycle Operations
Full example
The following code connects to a cluster, configures a client-side cache, and creates a new clustered Dataset:
try (DatasetManager datasetManager = DatasetManager.clustered(clusterUri) // <1>
        .withCache(DatasetManager.cacheConfiguration() // <2>
            .heap(256, MemoryUnit.MB) // <3>
            .offheap(2, MemoryUnit.GB)) // <4>
        .build()) { // <5>
    DatasetConfiguration ordersConfig = datasetManager.datasetConfiguration() // <6>
        .offheap("offheap-resource-name") // <7>
        .disk("disk-resource-name") // <8>
        .build(); // <9>
    try (Dataset<Long> orders =
             datasetManager.createDataset("orders", Type.LONG, ordersConfig)) { // <10>
        // Read and write records in orders here.
    }
}
<1> The static method DatasetManager.clustered starts the configuration of a clustered DatasetManager. It returns a DatasetManagerBuilder, which is used to configure the cluster client.
<2> The withCache method of the DatasetManagerBuilder configures the cluster client to use client-side caching. withCache takes a CacheConfigurationBuilder, which is obtained from the static method DatasetManager.cacheConfiguration.
<3> The client-side cache is configured to use 256 MB of heap memory.
<4> The client-side cache is configured to use 2 GB of offheap memory.
<5> The configuration of the DatasetManager is complete, and build creates a DatasetManager instance. This instance represents a connection to the cluster. DatasetManager is AutoCloseable, so it should be managed with try-with-resources.
<6> A DatasetConfiguration is required to create a new Dataset. The datasetConfiguration method of the DatasetManager returns a DatasetConfigurationBuilder for constructing one. A DatasetConfiguration should only be used with the DatasetManager that created it.
<7> A server-side offheap resource is specified to hold the data. The name supplied must match the name of an offheap resource configured on the server.
<8> A server-side disk resource is specified to hold the data. The name supplied must match the name of a disk resource configured on the server.
<9> The specification of the DatasetConfiguration is complete, and build creates the instance.
<10> A new Dataset called orders is created, with keys of type LONG. Dataset is also AutoCloseable, so it should likewise be managed with try-with-resources.
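Because both resources are AutoCloseable and the try-with-resources blocks are nested, Java closes them in the reverse order of opening: the orders Dataset is closed before the DatasetManager's connection to the cluster. The stdlib-only sketch below illustrates that ordering. FakeManager and FakeDataset are hypothetical stand-ins that only record events; they are not TCStore classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DatasetManager (NOT a TCStore class).
// It records when it is opened and closed.
class FakeManager implements AutoCloseable {
    private final List<String> log;

    FakeManager(List<String> log) {
        this.log = log;
        log.add("open manager");
    }

    FakeDataset createDataset(String name) {
        return new FakeDataset(name, log);
    }

    @Override
    public void close() {
        log.add("close manager");
    }
}

// Hypothetical stand-in for Dataset (NOT a TCStore class).
class FakeDataset implements AutoCloseable {
    private final String name;
    private final List<String> log;

    FakeDataset(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open dataset " + name);
    }

    @Override
    public void close() {
        log.add("close dataset " + name);
    }
}

class CloseOrderDemo {
    // Mirrors the nesting of the full example: the manager is opened first,
    // the dataset is opened inside it, and they are closed in reverse order.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (FakeManager manager = new FakeManager(log)) {
            try (FakeDataset orders = manager.createDataset("orders")) {
                log.add("use dataset");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // [open manager, open dataset orders, use dataset, close dataset orders, close manager]
        System.out.println(run());
    }
}
```

The same reverse-order guarantee applies when both resources are declared in a single try statement, as in the caching example later in this section.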
URI to connect to server
The cluster URI takes the form of:
terracotta://<server1>:<port>,<server2>:<port>
for example:
terracotta://tcstore1:9510,tcstore2:9510
where tcstore1 and tcstore2 are the names of the servers that form the cluster.
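A cluster URI can be assembled from a list of host:port entries. The sketch below is a hypothetical helper, not part of TCStore; it assumes the URI is represented as a java.net.URI (the type of clusterUri is not shown in the examples above). Note that java.net.URI cannot interpret a comma-separated server list as a single host, so the whole list is available through getAuthority rather than getHost.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (NOT part of TCStore) for cluster URIs of the form
// terracotta://<server1>:<port>,<server2>:<port>
final class ClusterUris {
    private ClusterUris() {}

    // Joins "host:port" entries with commas after the terracotta:// scheme.
    static URI of(String... hostPorts) {
        if (hostPorts.length == 0) {
            throw new IllegalArgumentException("at least one server is required");
        }
        return URI.create("terracotta://" + String.join(",", hostPorts));
    }

    // Splits the authority back into its "host:port" entries.
    // For a multi-server URI, getHost() returns null and getAuthority()
    // holds the whole comma-separated list.
    static List<String> servers(URI clusterUri) {
        return Arrays.asList(clusterUri.getAuthority().split(","));
    }

    public static void main(String[] args) {
        URI uri = of("tcstore1:9510", "tcstore2:9510");
        System.out.println(uri);          // terracotta://tcstore1:9510,tcstore2:9510
        System.out.println(servers(uri)); // [tcstore1:9510, tcstore2:9510]
    }
}
```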
Caching
Caches can be configured at the DatasetManager level, as shown in the full example above, in which case the cache is shared by every Dataset managed by that DatasetManager. Alternatively, a cache can be configured for a specific Dataset using a variant of getDataset:
try (DatasetManager datasetManager = DatasetManager.clustered(clusterUri).build();
     Dataset<Long> orders = datasetManager.getDataset(
         "orders", Type.LONG, DatasetManager.cacheConfiguration()
             .heap(256, MemoryUnit.MB)
             .offheap(2, MemoryUnit.GB))) {
    // This cache applies only to the orders Dataset.
}
Caching is optional. To use the TCStore API without a cache, do not call withCache, and use the variant of getDataset that does not take a CacheConfigurationBuilder.
Cache tiers
The TCStore API supports offheap storage, in which data is stored in memory outside the Java heap, where the garbage collector does not manage it. The advantage is that large amounts of data can be stored without affecting garbage collection; TCStore manages this memory directly. There is a trade-off, however: access to data in offheap is slower, because the data must be brought into the Java heap before it can be used. In practice, for larger caches, configuring offheap storage improves overall performance.
A cache need not have any offheap storage configured, in which case all cached records will be held in the Java heap. This is more suitable for smaller caches.
If a cache is configured with both heap and offheap, as in the full example above, the Java heap holds the hottest records and less-hot records are held in offheap.
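The placement of hot and less-hot records across two tiers can be modelled with two access-ordered maps. The sketch below is illustrative only: it assumes "hot" means "most recently used" and uses plain LRU demotion between a small first tier (standing in for the Java heap) and a larger second tier (standing in for offheap). It is not TCStore's caching implementation, whose eviction policy is not described here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative two-tier LRU model (NOT TCStore's implementation).
// Values must be non-null, because get() uses null to signal a miss.
class TwoTierCache<K, V> {
    private final int hotCapacity;
    private final int warmCapacity;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> hot = new LinkedHashMap<>(16, 0.75f, true);
    private final LinkedHashMap<K, V> warm = new LinkedHashMap<>(16, 0.75f, true);

    TwoTierCache(int hotCapacity, int warmCapacity) {
        this.hotCapacity = hotCapacity;
        this.warmCapacity = warmCapacity;
    }

    void put(K key, V value) {
        warm.remove(key);          // a key lives in at most one tier
        hot.put(key, value);
        demoteIfFull();
    }

    V get(K key) {
        V value = hot.get(key);    // a hot hit also refreshes its recency
        if (value != null) {
            return value;
        }
        value = warm.remove(key);  // a warm hit is promoted back to the hot tier
        if (value != null) {
            hot.put(key, value);
            demoteIfFull();
        }
        return value;
    }

    boolean inHotTier(K key) { return hot.containsKey(key); }

    boolean inWarmTier(K key) { return warm.containsKey(key); }

    // Moves least recently used hot entries down to the warm tier, then
    // drops least recently used warm entries once the warm tier is full.
    private void demoteIfFull() {
        while (hot.size() > hotCapacity) {
            Iterator<Map.Entry<K, V>> it = hot.entrySet().iterator();
            Map.Entry<K, V> eldest = it.next();
            K k = eldest.getKey();
            V v = eldest.getValue();
            it.remove();
            warm.put(k, v);
        }
        while (warm.size() > warmCapacity) {
            Iterator<K> it = warm.keySet().iterator();
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        TwoTierCache<Integer, String> cache = new TwoTierCache<>(2, 2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.put(3, "c"); // record 1 is least recently used, so it moves down a tier
        System.out.println(cache.inWarmTier(1)); // true
    }
}
```

Reading record 1 again would promote it back into the first tier and push the next least recently used record down, which is the behaviour the paragraph above describes in terms of hot and less-hot records.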
Configuring a Dataset
When a Dataset is created, the name of the Dataset and the type of its key must be specified. These are the first two parameters to createDataset, and the same values should be used to access that Dataset later via getDataset.
The third parameter is a DatasetConfiguration which specifies how storage for the Dataset should be managed on the server.
When the server is configured, each offheap memory resource and each filesystem directory that data can be written to is given a name. The string passed to offheap or disk must match the name of a resource configured on the server; that resource is then used to store the Dataset's data.
A Dataset must have an offheap resource configured for it. If a disk resource is also specified, the records of the Dataset are recorded on disk. If no disk resource is specified, the data is held only in the memory of the cluster's servers.
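The rules above (an offheap resource is required, a disk resource is optional, and both names must match resources declared on the server) can be summarised as a small validation function. StoragePlan is a hypothetical model, not TCStore's DatasetConfiguration, and where and when TCStore itself performs these checks is not described here; the resource names are the example names from the full example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of the Dataset storage rules (NOT a TCStore class).
final class StoragePlan {
    final String offheapResource;        // required
    final Optional<String> diskResource; // absent means memory only

    private StoragePlan(String offheapResource, Optional<String> diskResource) {
        this.offheapResource = offheapResource;
        this.diskResource = diskResource;
    }

    // Checks the requested names against the resources the server declares.
    static StoragePlan validate(String offheap, String disk,
                                Set<String> serverOffheap, Set<String> serverDisk) {
        if (offheap == null) {
            throw new IllegalArgumentException("a Dataset must have an offheap resource");
        }
        if (!serverOffheap.contains(offheap)) {
            throw new IllegalArgumentException("no offheap resource named " + offheap);
        }
        if (disk != null && !serverDisk.contains(disk)) {
            throw new IllegalArgumentException("no disk resource named " + disk);
        }
        return new StoragePlan(offheap, Optional.ofNullable(disk));
    }

    // True when records are also recorded on disk.
    boolean persistent() {
        return diskResource.isPresent();
    }

    public static void main(String[] args) {
        Set<String> offheap = new HashSet<>(Arrays.asList("offheap-resource-name"));
        Set<String> disk = new HashSet<>(Arrays.asList("disk-resource-name"));
        System.out.println(validate("offheap-resource-name", "disk-resource-name", offheap, disk).persistent()); // true
        System.out.println(validate("offheap-resource-name", null, offheap, disk).persistent());                 // false
    }
}
```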
Note on the fluent API
TCStore uses a fluent API so that configuration calls can be chained. Each call returns a builder on which further configuration can be made; however, TCStore returns a new builder instance each time rather than modifying the builder the method was called on. This allows a DatasetManagerBuilder to be used as a prototype for several different configurations, but it also means that code such as:
ClusteredDatasetManagerBuilder builder = DatasetManager.clustered(clusterUri);
builder.withCache(cacheConfiguration);
DatasetManager datasetManager = builder.build();
will create a clustered DatasetManager that has no client-side cache, because build is called on the original builder instead of on the builder returned by withCache, which is discarded.
Instead use the following form:
ClusteredDatasetManagerBuilder builder = DatasetManager.clustered(clusterUri);
ClusteredDatasetManagerBuilder cachedBuilder = builder.withCache(cacheConfiguration);
DatasetManager datasetManager = cachedBuilder.build();
or more fluently:
DatasetManager datasetManager = DatasetManager.clustered(clusterUri)
.withCache(cacheConfiguration)
.build();
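The pitfall can be reproduced with any immutable builder. The sketch below uses a hypothetical ManagerBuilder, not the TCStore API; build returns a description string instead of a real DatasetManager. It shows that ignoring the result of withCache leaves the original builder unchanged, and that the same builder can serve as a prototype for both an uncached and a cached configuration.

```java
// Hypothetical immutable builder (NOT the TCStore API): every with... call
// returns a NEW builder and leaves the receiver unchanged.
final class ManagerBuilder {
    private final String clusterUri;
    private final String cacheSpec; // null means "no client-side cache"

    private ManagerBuilder(String clusterUri, String cacheSpec) {
        this.clusterUri = clusterUri;
        this.cacheSpec = cacheSpec;
    }

    static ManagerBuilder clustered(String clusterUri) {
        return new ManagerBuilder(clusterUri, null);
    }

    ManagerBuilder withCache(String cacheSpec) {
        return new ManagerBuilder(clusterUri, cacheSpec);
    }

    // Stands in for build(): describes the manager that would be created.
    String build() {
        return clusterUri + (cacheSpec == null ? " [no cache]" : " [cache: " + cacheSpec + "]");
    }

    public static void main(String[] args) {
        ManagerBuilder builder = ManagerBuilder.clustered("terracotta://tcstore1:9510");
        builder.withCache("heap 256MB");     // result discarded: builder is unchanged
        System.out.println(builder.build()); // terracotta://tcstore1:9510 [no cache]

        ManagerBuilder cachedBuilder = builder.withCache("heap 256MB");
        System.out.println(cachedBuilder.build()); // terracotta://tcstore1:9510 [cache: heap 256MB]
    }
}
```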