Clustered DatasetManager using the API
Full example
The following code creates a new clustered Dataset that uses the system-configured default persistent storage engine:
try (DatasetManager datasetManager = DatasetManager.clustered(clusterUri) // 1
        .build()) { // 2
    DatasetConfiguration ordersConfig =
        datasetManager.datasetConfiguration() // 3
            .offheap(offHeapResourceId) // 4
            .disk(diskResourceId) // 5
            .build(); // 6
    datasetManager.newDataset("orders", Type.LONG, ordersConfig); // 7
    try (Dataset orders =
            datasetManager.getDataset("orders", Type.LONG)) { // 8
        // Use the Dataset
    }
}
1 | The static method DatasetManager.clustered starts the process of configuring a clustered DatasetManager. It returns a DatasetManagerBuilder which allows configuration of the cluster client. |
2 | The DatasetManager is created. It represents a connection to the cluster. DatasetManager is AutoCloseable, so try-with-resources should be used. |
3 | A DatasetConfiguration is required to create a new Dataset. A DatasetConfigurationBuilder that can be used to construct a DatasetConfiguration is acquired using the method datasetConfiguration on the DatasetManager. Note that a DatasetConfiguration should be used with the DatasetManager that was used to create it. |
4 | A server side offheap resource is specified for data to be held in. Note that the name supplied must match the name of an offheap resource configured on the server. |
5 | A server side disk resource is specified for data to be held in. Note that the name supplied must match the name of a disk resource configured on the server. An optional persistent storage engine parameter can also be specified along with the disk resource, denoting the underlying persistent storage engine technology to be used for this dataset. See the section Note on the supported persistent storage engine technologies below for a discussion of the currently supported persistent storage engine technologies. |
6 | The specification of the DatasetConfiguration is now completed and an instance is created. |
7 | A new Dataset called orders is created. It has a key of type LONG. |
8 | The previously created dataset is retrieved. Dataset is AutoCloseable so try-with-resources should be used. |
URI to connect to server
The cluster URI takes the form of:
terracotta://<server1>:<port>,<server2>:<port>
for example:
terracotta://tcstore1:9510,tcstore2:9510
where tcstore1 and tcstore2 are the names of the servers that form the cluster.
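As an illustration of the URI form above, the following sketch assembles a cluster URI from a list of host:port entries. It assumes that the clusterUri passed to DatasetManager.clustered is a java.net.URI; the class name ClusterUriSketch and the method buildClusterUri are hypothetical helpers introduced only for this example and are not part of TCStore.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the TCStore API.
class ClusterUriSketch {

    /** Joins "host:port" entries into terracotta://host1:port1,host2:port2 */
    static URI buildClusterUri(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        // Servers are separated by commas, as in the form shown above.
        String authority = String.join(",", hostPorts);
        return URI.create("terracotta://" + authority);
    }

    public static void main(String[] args) {
        URI clusterUri = buildClusterUri(
            Arrays.asList("tcstore1:9510", "tcstore2:9510"));
        System.out.println(clusterUri);
    }
}
```

Because the authority lists several servers, java.net.URI treats it as an opaque authority string, so getHost() returns null for such a URI; use getAuthority() if the server list needs to be read back.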
Configuring a Dataset
When a Dataset is created, the name of the dataset and the type of the key must be specified. These are the first two parameters to newDataset, and the same values should be used to later access the same Dataset via getDataset.
The third parameter is a DatasetConfiguration which specifies how storage for the Dataset should be managed on the server.
When the server is configured, any offheap memory resources or filesystem directories in which data can be written are given names. Any string passed to offheap or disk should match the name of a resource configured on the server. This resource will then be used for storage for the Dataset.
A Dataset must have an offheap resource configured for it. If the disk resource is specified then the records of the Dataset will be recorded on disk. If the disk resource is specified, the persistent storage engine technology used to persist on disk can also be optionally specified. If no persistent storage engine is specified, the default persistent storage engine will be used. If no disk resource is specified, then data is held just in the memory of the servers of the cluster.
Note on the supported persistent storage engine technologies
The currently supported persistent storage engines are as follows:
PersistentStorageEngine.FRS
PersistentStorageEngine.HYBRID
The persistent storage engines vary in the rules on how the persistent store is used. However, all persistent storage engines provide strong guarantees that data is non-volatile and durable across server crashes and restarts.
1. FRS
The Heap (or offheap) is pushed to the FRS instance on disk, but processing is served from memory.
Primary, secondary, and heap and offheap memory structures are rebuilt from FRS on restart.
Primary and secondary indexes are rebuilt from scratch in memory on restart.
2. HYBRID also uses FRS technology underneath, with the following caveats:
Here the heap is pushed to FRS.
The in-memory heap is merely a mapping used to locate a value on disk given its key.
Record lookups are served by asking the storage engine.
Primary and secondary structures reside in memory.
In the future, new persistent storage engine technologies could be supported and allowed to be configured for a dataset.
Current Limitations when configuring persistent storage engines
There are some limitations on how these storage engines can be configured against a dataset. In the future one or more of these limitations may be lifted.
A given disk resource can only hold a single storage engine. This means two datasets using the same disk resource must specify the same storage engine technology.
The current default persistent storage engine, used if none is specified when the dataset is configured, is FRS. Again, if a dataset uses a disk resource with the default storage engine, another dataset using the same disk resource must use the same storage engine.
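The two limitations above can be checked on the client before any dataset is created. The following is a minimal sketch of such a check under stated assumptions: the Engine enum is a hypothetical stand-in for the PersistentStorageEngine constants named earlier, and the class DiskResourceEngineCheck is not part of TCStore. It only models the rule that one disk resource holds one storage engine, with FRS as the default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side check, not part of the TCStore API.
class DiskResourceEngineCheck {

    // Stand-in for the engine constants named in the text.
    enum Engine { FRS, HYBRID }

    // The text states that FRS is the current default.
    static final Engine DEFAULT_ENGINE = Engine.FRS;

    private final Map<String, Engine> engineByDiskResource = new HashMap<>();

    /**
     * Records the engine a dataset wants on a disk resource.
     * A null request means "use the default engine".
     * Throws if the disk resource already holds a different engine.
     */
    Engine register(String diskResource, Engine requested) {
        Engine effective = (requested == null) ? DEFAULT_ENGINE : requested;
        Engine existing = engineByDiskResource.putIfAbsent(diskResource, effective);
        if (existing != null && existing != effective) {
            throw new IllegalStateException("Disk resource '" + diskResource
                + "' already uses " + existing + ", cannot also use " + effective);
        }
        return effective;
    }

    public static void main(String[] args) {
        DiskResourceEngineCheck check = new DiskResourceEngineCheck();
        System.out.println(check.register("disk1", null));          // default engine
        System.out.println(check.register("disk2", Engine.HYBRID)); // separate resource
        try {
            check.register("disk1", Engine.HYBRID);                 // conflicts with disk1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this only mirrors the documented rule; the server remains the authority on which configurations are accepted.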
Note on the fluent API
TCStore uses a fluent API to allow configuration calls to be chained. Following this pattern, each call returns a builder so that further configuration can be made. However, TCStore returns a different instance each time. This allows a DatasetManagerBuilder to be used as a prototype for different configurations, but it also means that code such as:
ClusteredDatasetManagerBuilder builder = DatasetManager.clustered(clusterUri);
builder.withConnectionTimeout(30, TimeUnit.SECONDS);
DatasetManager datasetManager = builder.build();
will create a clustered DatasetManager that has the default connection timeout because build is called on the wrong object.
Instead use the following form:
ClusteredDatasetManagerBuilder builder = DatasetManager.clustered(clusterUri);
ClusteredDatasetManagerBuilder configuredBuilder =
    builder.withConnectionTimeout(30, TimeUnit.SECONDS);
DatasetManager datasetManager = configuredBuilder.build();
or more fluently:
DatasetManager datasetManager = DatasetManager.clustered(clusterUri)
    .withConnectionTimeout(30, TimeUnit.SECONDS)
    .build();
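The behavior described in this note can be reproduced without a cluster. The sketch below uses a hypothetical ManagerBuilder class as a stand-in for a TCStore builder: every configuration call returns a new instance and leaves the original untouched. The class names and the default timeout of 20 seconds are assumptions made for the example and do not reflect the real TCStore implementation or its defaults.

```java
import java.util.concurrent.TimeUnit;

class ImmutableBuilderSketch {

    // Hypothetical stand-in for a TCStore builder; not the real class.
    static final class ManagerBuilder {
        // Arbitrary default chosen for this sketch only.
        static final long DEFAULT_TIMEOUT_SECONDS = 20;

        private final long timeoutSeconds;

        ManagerBuilder() {
            this(DEFAULT_TIMEOUT_SECONDS);
        }

        private ManagerBuilder(long timeoutSeconds) {
            this.timeoutSeconds = timeoutSeconds;
        }

        /** Returns a NEW builder; this instance is not modified. */
        ManagerBuilder withConnectionTimeout(long timeout, TimeUnit unit) {
            return new ManagerBuilder(unit.toSeconds(timeout));
        }

        /** For the sketch, "building" just reports the configured timeout. */
        long build() {
            return timeoutSeconds;
        }
    }

    public static void main(String[] args) {
        ManagerBuilder builder = new ManagerBuilder();

        // Return value discarded: the timeout setting is lost.
        builder.withConnectionTimeout(30, TimeUnit.SECONDS);
        System.out.println("discarded result: " + builder.build());

        // Return value kept: the timeout setting is applied.
        ManagerBuilder configured = builder.withConnectionTimeout(30, TimeUnit.SECONDS);
        System.out.println("kept result: " + configured.build());
    }
}
```

Because the original builder is never modified, it can safely be reused as a prototype for several differently configured managers.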