Connection Pooling
Overview
The use of connection pooling is common among applications accessing relational databases through the Java Database Connectivity (JDBC) API.
Through a connection pool, application threads borrow a connection to the database for the duration of some unit of work and then return the connection to the pool for use by another application thread. This is done to avoid the overhead of establishing the connection to the database for each unit of work the application performs. A connection pool is frequently used by an application deployed as a servlet in a servlet engine with each servlet request performing a single unit of work.
So why not use a single JDBC connection for all requests? Technically, each java.sql.Connection implementation is thread-safe, but many, if not most, implementations achieve this thread safety through method synchronization, effectively single-threading all operations that use a single connection. Perhaps more importantly, JDBC transactions are scoped to the Connection: a commit by any thread using a Connection commits all activity performed through that Connection. Sharing a Connection among application threads therefore requires the application to coordinate its work and tolerate the single-threaded processing of its requests.
While TCStore does not expose a connection object through its API, there are benefits to sharing some of its API objects among application threads. Among the TCStore API objects which should be considered for sharing are the DatasetManager and Dataset objects. Most TCStore API objects and methods are thread-safe without requiring application-level synchronization. Under TCStore, each mutation performed on a Dataset is atomic and committed individually, so there is no need for separate TCStore "connections" to provide operation atomicity.
DatasetManager
Note: Obtaining a new ClusteredDatasetManager instance for each application unit of work will result in poor application performance.
A DatasetManager instance is as close as TCStore comes to having a connection object. A DatasetManager is the object through which an application gains access to and manages TCStore datasets. To interact with datasets residing in a TSA, an application needs a ClusteredDatasetManager instance. Creating a ClusteredDatasetManager instance (using DatasetManager.clustered(uri).build()) is a fairly expensive operation involving the creation of TCP connections (at least two per stripe) along with several exchanges between client and servers. Fortunately, a ClusteredDatasetManager holds no state related to operations against the dataset manager or datasets it manages - it is safe to share among application threads.
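The following sketch shows one way to create and share a single ClusteredDatasetManager among application threads. The holder class, the cluster URI, and the package and exception names (com.terracottatech.store.manager.DatasetManager, StoreException) are illustrative assumptions and should be checked against your TCStore client library; only DatasetManager.clustered(uri).build() is taken from the description above.

import java.net.URI;

import com.terracottatech.store.StoreException;
import com.terracottatech.store.manager.DatasetManager;

public final class SharedDatasetManager {

    // One ClusteredDatasetManager shared by all application threads.
    private static DatasetManager sharedManager;

    // Lazily create the shared instance; the cluster URI is a placeholder.
    public static synchronized DatasetManager get() throws StoreException {
        if (sharedManager == null) {
            sharedManager = DatasetManager.clustered(
                    URI.create("terracotta://server-1:9410,server-2:9410")).build();
        }
        return sharedManager;
    }

    // Close only at application shutdown, after all Dataset instances
    // obtained from the shared manager have been closed.
    public static synchronized void shutdown() throws Exception {
        if (sharedManager != null) {
            sharedManager.close();
            sharedManager = null;
        }
    }

    private SharedDatasetManager() {
    }
}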
From the perspective of the Terracotta Management Console (TMC), each ClusteredDatasetManager instance is a client. If you require more granular visibility into your application thread operations, consider using separate ClusteredDatasetManager instances that correspond to the required granularity.
In addition to the expense of creating a ClusteredDatasetManager instance, most of the methods on a ClusteredDatasetManager instance are also fairly expensive and should not be invoked frequently. These operations are costly not only in the time they take to complete but also in their impact on overall server performance.
Given the client and server resources consumed by a ClusteredDatasetManager instance, the instance should be closed when it's no longer needed. But, if a DatasetManager instance is shared among application threads, the DatasetManager.close() method should not be invoked unless and until all operations on Dataset instances obtained from the DatasetManager instance are complete - calling close may abruptly terminate in-progress operations.
Dataset
Note: Using DatasetManager.getDataset to obtain a reference to a Dataset for each application unit of work will result in poor application performance.
The TCStore Dataset object is the application's entry point for reading from, writing to, and managing indexes on a dataset. An application creates a dataset using a call to DatasetManager.newDataset(…) and obtains a reference to a previously created dataset using DatasetManager.getDataset(…). As mentioned above, each of these operations is somewhat time-consuming and should not be performed frequently. Furthermore, once created, a persistent dataset cannot be created again until it is destroyed, so the newDataset operation need not be repeated routinely.
To gain access to an already-created dataset, use the DatasetManager.getDataset(…) method. Again, this method is expensive and should not be repeated for every application unit of work. As with a ClusteredDatasetManager instance, a Dataset instance holds no state related to operations so it's safe to share Dataset instances among application threads.
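As an illustration, the sketch below obtains a Dataset once and exposes the shared reference to all application threads. The holder class, the dataset name "orders", and the String key type are assumptions made for the example.

import com.terracottatech.store.Dataset;
import com.terracottatech.store.StoreException;
import com.terracottatech.store.Type;
import com.terracottatech.store.manager.DatasetManager;

public class OrdersDatasetHolder {

    private final Dataset<String> orders;

    // Obtain the Dataset once, not per unit of work.
    public OrdersDatasetHolder(DatasetManager datasetManager) throws StoreException {
        this.orders = datasetManager.getDataset("orders", Type.STRING);
    }

    // Thread-safe shared reference used by all application threads.
    public Dataset<String> orders() {
        return orders;
    }

    // Close only when no application thread still uses the Dataset.
    public void close() throws Exception {
        orders.close();
    }
}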
Compared with DatasetManager.getDataset(…), the methods on a Dataset instance are relatively inexpensive and can be performed in each application unit of work. However, the Indexing instance returned by the Dataset.indexing() method should not be used for routine operations. Creating and deleting an index are potentially expensive operations - applications should not add an index to a dataset, perform processing using that index, and then remove that index.
Maintaining a client-side reference to a Dataset is not without server-side cost. As with a ClusteredDatasetManager, a Dataset instance should be closed when no longer needed. Again like a ClusteredDatasetManager, if a Dataset instance is shared among application threads, the Dataset.close() method should not be invoked unless and until all operations on that Dataset instance are complete - calling close may abruptly terminate in-progress operations.
Other TCStore Objects
In addition to the objects mentioned above, there are many other objects in the TCStore API. While these objects generally need not be part of pooling strategies, their existence must be taken into account when considering a pooling strategy. For example, each of these objects is derived directly or indirectly from a Dataset instance - when using a pooled Dataset instance, operations on these objects must be complete before returning the Dataset instance to the pool. No references to any object obtained directly or indirectly from a pooled Dataset instance should be retained after returning the Dataset instance to the pool.
DatasetReader and DatasetWriterReader The DatasetReader object, obtained using the Dataset.reader() method, and the DatasetWriterReader object, obtained using the Dataset.writerReader() method, are thread-safe objects providing read-only and read/write access to the Dataset instance from which each was obtained. While one might consider pooling these objects, obtaining an instance is an inexpensive operation - the added complication of pooling instances of these objects would not be worth the trouble. In addition, operations against a DatasetReader or DatasetWriterReader should be completed before returning a Dataset instance to the pool.
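For example, a unit of work might look like the following sketch. The service class, the "status" cell, and the key values are illustrative, and the cell-definition classes are assumed to live in com.terracottatech.store.definition.

import java.util.Optional;

import com.terracottatech.store.Dataset;
import com.terracottatech.store.DatasetWriterReader;
import com.terracottatech.store.definition.CellDefinition;
import com.terracottatech.store.definition.StringCellDefinition;

public class OrderService {

    // Cell definition used by this example; the cell name is an assumption.
    private static final StringCellDefinition STATUS = CellDefinition.defineString("status");

    private final Dataset<String> orders;

    public OrderService(Dataset<String> orders) {
        this.orders = orders;
    }

    // One unit of work: obtaining the DatasetWriterReader each time is inexpensive,
    // and all operations complete before the shared Dataset is released.
    public Optional<String> addOrder(String orderId) {
        DatasetWriterReader<String> writerReader = orders.writerReader();
        writerReader.add(orderId, STATUS.newCell("NEW"));
        return writerReader.get(orderId).flatMap(record -> record.get(STATUS));
    }
}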
RecordStream and MutableRecordStream The RecordStream and MutableRecordStream objects, obtained from the DatasetReader.records() and DatasetWriterReader.records() methods, respectively, are the roots of the dataset bulk processing API based on Java streams. As with the DatasetReader or DatasetWriterReader instance from which it was obtained, a RecordStream or MutableRecordStream instance should not be retained or operated upon beyond the return of the pooled Dataset through which the stream instance was obtained. To ensure proper operation, stream instances must be closed before the pooled Dataset is returned. Additionally, Iterator and Spliterator instances obtained from a RecordStream or MutableRecordStream must not be retained or operated upon after returning the pooled Dataset. If you want to use stream results as input to other work units, you must either drain the stream into a local data structure (which is then used to feed other work units) or use a non-pooled DatasetManager instance and Dataset instance having a lifecycle compatible with the lifetime of the stream.
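The sketch below drains a RecordStream into a local list inside a try-with-resources block so the stream is closed before the pooled Dataset is returned; the RecordStream package name is an assumption.

import java.util.List;
import java.util.stream.Collectors;

import com.terracottatech.store.Dataset;
import com.terracottatech.store.stream.RecordStream;

public class StreamDrainExample {

    // Drain the stream into a local structure before the pooled Dataset is returned;
    // neither the stream nor iterators obtained from it may outlive the pooled Dataset.
    public static List<String> collectKeys(Dataset<String> pooledDataset) {
        try (RecordStream<String> records = pooledDataset.reader().records()) {
            return records.map(record -> record.getKey()).collect(Collectors.toList());
        }
    }
}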
ReadRecordAccessor and ReadWriteRecordAccessor The ReadRecordAccessor and ReadWriteRecordAccessor extend key-based operations with conditional execution and CAS capabilities. These are obtained using the on(…) methods of the DatasetReader and DatasetWriterReader objects. Like the other objects in this group, operations performed using a ReadRecordAccessor or a ReadWriteRecordAccessor must be complete before returning the Dataset instance from which they were obtained to the pool.
AsyncDatasetReader and AsyncDatasetWriterReader Obtained using the async() methods of a DatasetReader or DatasetWriterReader instance, the AsyncDatasetReader and AsyncDatasetWriterReader objects provide non-blocking access to the TCStore API. In general, the methods on each return an Operation instance, implementing both the java.util.concurrent.CompletionStage and java.util.concurrent.Future interfaces, providing a full range of asynchronous task completion options. As with the objects discussed above, these operations should be completed before returning a Dataset instance to the pool.
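As a sketch, assuming the asynchronous get mirrors the blocking DatasetReader.get and that the async classes live in com.terracottatech.store.async, a caller could wait on the Operation like this before releasing the pooled Dataset:

import java.util.Optional;

import com.terracottatech.store.Dataset;
import com.terracottatech.store.Record;
import com.terracottatech.store.async.AsyncDatasetWriterReader;
import com.terracottatech.store.async.Operation;

public class AsyncReadExample {

    // Issue a non-blocking read, then wait for completion before the
    // pooled Dataset is returned to the pool.
    public static boolean keyExists(Dataset<String> pooledDataset, String key) throws Exception {
        AsyncDatasetWriterReader<String> async = pooledDataset.writerReader().async();
        Operation<Optional<Record<String>>> operation = async.get(key);
        // Operation implements CompletionStage and Future; block here so the
        // operation is complete before the Dataset goes back to the pool.
        return operation.get().isPresent();
    }
}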
Pooling Strategies
When seeking to improve application performance through resource pooling, the general recommendations are to:
1. Obtain one or more ClusteredDatasetManager instances.
a. Pre-obtain during application initialization or defer until needed as appropriate for the application.
b. Use no more than the number of ClusteredDatasetManager instances required to handle the application load.
c. Do not "pool" the ClusteredDatasetManager instances in the traditional way - most applications do not need access to a DatasetManager instance so there's no need to "share out" a ClusteredDatasetManager instance. Instead, the ClusteredDatasetManager instances are used internally by the pooling implementation to support obtaining Dataset instances.
d. Track Dataset instances obtained from each ClusteredDatasetManager instance; when no Dataset instance obtained from a ClusteredDatasetManager remains open, the ClusteredDatasetManager instance is idle and may be closed. Keeping idle ClusteredDatasetManager instances around for a certain amount of time may be appropriate for the application.
2. Pool Dataset instances for sharing - Dataset references can and should be shared.
a. Use a strategy appropriate for your application to either pre-obtain a core set of Dataset instances during application initialization or defer allocation until demanded.
b. Obtain no more than one (1) Dataset instance for a given dataset (name/type) per ClusteredDatasetManager.
c. Share Dataset instances by reference count (a minimal pooling sketch follows this list). If appropriate for the application, a Dataset instance having no users and left idle for some period of time should be closed.
d. Pair each Dataset instance with the ClusteredDatasetManager through which it was allocated.
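A minimal reference-counting sketch, pairing each Dataset with the ClusteredDatasetManager through which it was obtained, might look like the following. The class is illustrative only; it assumes one Dataset per ClusteredDatasetManager and omits idle-timeout handling and the reconnect handling described in the Note below.

import java.util.concurrent.atomic.AtomicInteger;

import com.terracottatech.store.Dataset;
import com.terracottatech.store.manager.DatasetManager;

public class PooledDataset<K extends Comparable<K>> {

    private final DatasetManager datasetManager;
    private final Dataset<K> dataset;
    private final AtomicInteger refCount = new AtomicInteger();

    public PooledDataset(DatasetManager datasetManager, Dataset<K> dataset) {
        this.datasetManager = datasetManager;
        this.dataset = dataset;
    }

    // Borrow the shared Dataset for one unit of work.
    public Dataset<K> acquire() {
        refCount.incrementAndGet();
        return dataset;
    }

    // Return the Dataset; once no user remains it may be closed and, because this
    // sketch pairs exactly one Dataset with its DatasetManager, the now-idle
    // ClusteredDatasetManager is closed as well (recommendations 1d and 2d above).
    public void release() throws Exception {
        if (refCount.decrementAndGet() == 0) {
            dataset.close();
            datasetManager.close();
        }
    }
}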
Note: If a StoreReconnectFailedException is raised for a TCStore operation, the ClusteredDatasetManager instance from which the object performing that operation was obtained is disabled and must be discarded along with any Dataset instances obtained from that ClusteredDatasetManager. Once an object obtained from a ClusteredDatasetManager instance throws a StoreReconnectFailedException, all subsequent operations for that ClusteredDatasetManager instance will also throw a StoreReconnectFailedException. For pool management, the failing ClusteredDatasetManager instance must be replaced with a new instance. See Clustered Reconnection for details.