Indexes
The records stored in a dataset are accessed for CRUD operations by the key under which each record is held. For stream queries, however, secondary indexes can be used to improve query performance. A secondary index is created on a specific cell, and every record that has that cell is indexed. Queries on the indexed cell will try to use the index to optimize their execution.
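To make the performance argument concrete, here is a minimal, hypothetical sketch in plain Java with no TCStore dependency. It models a secondary index on an `orderId` cell as a sorted map from cell value to record keys, with `java.util.TreeMap` standing in for a B-tree. The class and method names (`SecondaryIndexSketch`, `findByScan`, `findByIndex`, `findRange`) are illustrative assumptions, not TCStore API, and this is not how TCStore implements its indexes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a secondary index; NOT how TCStore is implemented.
public class SecondaryIndexSketch {
    // Primary storage: record key -> value of the "orderId" cell (null = record lacks the cell).
    private final Map<Long, String> records = new HashMap<>();
    // Secondary index: cell value -> keys of the records holding that value.
    // TreeMap is a sorted (red-black) tree, used here as a stand-in for a B-tree.
    private final NavigableMap<String, Set<Long>> orderIdIndex = new TreeMap<>();

    public void put(long key, String orderId) {
        String previous = records.put(key, orderId);
        if (previous != null) {                  // drop the stale index entry
            Set<Long> keys = orderIdIndex.get(previous);
            keys.remove(key);
            if (keys.isEmpty()) {
                orderIdIndex.remove(previous);
            }
        }
        if (orderId != null) {                   // only records having the cell are indexed
            orderIdIndex.computeIfAbsent(orderId, v -> new TreeSet<>()).add(key);
        }
    }

    // Without an index: every record has to be inspected.
    public Set<Long> findByScan(String orderId) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, String> e : records.entrySet()) {
            if (orderId.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // With an index: one logarithmic tree lookup.
    public Set<Long> findByIndex(String orderId) {
        return new TreeSet<>(orderIdIndex.getOrDefault(orderId, Collections.emptySet()));
    }

    // A sorted index can also answer range predicates without a full scan.
    public Set<Long> findRange(String fromInclusive, String toExclusive) {
        Set<Long> result = new TreeSet<>();
        for (Set<Long> keys : orderIdIndex.subMap(fromInclusive, true, toExclusive, false).values()) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch orders = new SecondaryIndexSketch();
        orders.put(1L, "A-100");
        orders.put(2L, "B-200");
        orders.put(3L, "A-100");
        orders.put(4L, null);                                   // record without an orderId cell
        System.out.println(orders.findByIndex("A-100"));        // [1, 3]
        System.out.println(orders.findRange("A-000", "B-000")); // [1, 3]
    }
}
```

Both lookups return the same keys; the difference is that `findByScan` touches every record, while `findByIndex` walks a single path in the tree.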
The following code shows how to create, inspect and destroy indexes.
DatasetManager datasetManager = DatasetManager.clustered(clusterUri).build();
DatasetConfiguration configuration = datasetManager.datasetConfiguration()
    .offheap("offheap-resource-name")
    .index(CellDefinition.define("orderId", Type.STRING),
        IndexSettings.BTREE) // <1>
    .build();
Dataset<Long> dataset =
    datasetManager.createDataset("indexedOrders", Type.LONG, configuration);

Indexing indexing = dataset.getIndexing(); // <2>
Operation<Index<Integer>> indexOperation =
    indexing.createIndex(CellDefinition.define("invoiceId", Type.INT),
        IndexSettings.BTREE); // <3>
Index<Integer> invoiceIdIndex = indexOperation.get(); // <4>

indexing.getAllIndexes(); // <5>
indexing.getLiveIndexes(); // <6>

indexing.destroyIndex(invoiceIdIndex); // <7>
Creating Secondary Indexes
<1> An index can be defined when the dataset is created: DatasetConfigurationBuilder#index takes a CellDefinition and an IndexSettings. Currently only IndexSettings#BTREE is supported for secondary indexes.
<2> A cell can also be indexed after the dataset has been created. Dataset#getIndexing returns the dataset's Indexing, which is used to create and destroy indexes.
<3> Indexing#createIndex likewise takes a CellDefinition and an IndexSettings, and returns an Operation of Index. The Operation represents the asynchronous execution of the long-running index creation.
<4> Calling get() on the Operation returns the Index once the operation has completed.
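The asynchronous Operation returned by createIndex can be pictured with a plain-JDK sketch. For illustration only, it assumes that creating an index means backfilling the existing records on a background thread and then marking the index LIVE. `AsyncIndexSketch`, `IndexHandle` and the `INITIALIZING` status are hypothetical names, and a `java.util.concurrent.Future` stands in for TCStore's Operation.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of asynchronous index creation; not TCStore API.
public class AsyncIndexSketch {
    enum Status { INITIALIZING, LIVE }   // only LIVE is taken from the documentation

    static final class IndexHandle {
        final String cellName;
        volatile Status status = Status.INITIALIZING;
        final Map<Integer, Set<Long>> entries = new ConcurrentHashMap<>();
        IndexHandle(String cellName) { this.cellName = cellName; }
    }

    // Existing records: key -> value of the "invoiceId" cell.
    private final Map<Long, Integer> invoiceIds = new ConcurrentHashMap<>();
    private final ExecutorService indexer = Executors.newSingleThreadExecutor();

    void put(long key, int invoiceId) { invoiceIds.put(key, invoiceId); }

    // Returns at once; the backfill of existing records runs on the indexer thread.
    Future<IndexHandle> createIndex(String cellName) {
        IndexHandle index = new IndexHandle(cellName);
        return CompletableFuture.supplyAsync(() -> {
            invoiceIds.forEach((key, value) ->
                index.entries.computeIfAbsent(value, v -> ConcurrentHashMap.newKeySet()).add(key));
            index.status = Status.LIVE;   // LIVE only after every existing record is indexed
            return index;
        }, indexer);
    }

    void shutdown() { indexer.shutdown(); }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        AsyncIndexSketch dataset = new AsyncIndexSketch();
        dataset.put(1L, 42);
        dataset.put(2L, 7);
        dataset.put(3L, 42);
        Future<IndexHandle> operation = dataset.createIndex("invoiceId");
        IndexHandle index = operation.get();  // blocks until creation has completed
        System.out.println(index.status + " " + new TreeSet<>(index.entries.get(42))); // LIVE [1, 3]
        dataset.shutdown();
    }
}
```

For brevity, the sketch ignores writes that arrive while the backfill is running. An index that is maintained under concurrent writes would also have to capture those.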
Getting Index Status
<5> Indexing#getAllIndexes returns all indexes created on the dataset, regardless of their status.
<6> Indexing#getLiveIndexes returns only the indexes whose status is LIVE.
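The difference between the two calls is a filter on status. In the hypothetical sketch below, an index that is presumably still being built (given the assumed status name `INITIALIZING`) appears among all indexes but not among the live ones. `IndexStatusSketch`, `IndexInfo`, `allIndexes` and `liveIndexes` are illustrative stand-ins, not the TCStore types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of "all indexes" versus "live indexes"; not TCStore API.
public class IndexStatusSketch {
    enum Status { INITIALIZING, LIVE }   // INITIALIZING is an assumed non-live state

    static final class IndexInfo {
        final String cell;
        final Status status;
        IndexInfo(String cell, Status status) { this.cell = cell; this.status = status; }
        @Override public String toString() { return cell + "=" + status; }
    }

    // Analogue of getAllIndexes: every index, whatever its status.
    static List<IndexInfo> allIndexes(List<IndexInfo> indexes) {
        return new ArrayList<>(indexes);
    }

    // Analogue of getLiveIndexes: only the indexes whose status is LIVE.
    static List<IndexInfo> liveIndexes(List<IndexInfo> indexes) {
        return indexes.stream()
            .filter(index -> index.status == Status.LIVE)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<IndexInfo> indexes = Arrays.asList(
            new IndexInfo("orderId", Status.LIVE),            // created with the dataset
            new IndexInfo("invoiceId", Status.INITIALIZING)); // still being built
        System.out.println(allIndexes(indexes));  // [orderId=LIVE, invoiceId=INITIALIZING]
        System.out.println(liveIndexes(indexes)); // [orderId=LIVE]
    }
}
```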
Destroying Indexes
<7> An existing index can be destroyed with Indexing#destroyIndex.
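Destroying an index removes a lookup structure, not data. The hypothetical sketch below assumes that queries on the cell keep working afterwards and simply fall back to a full scan. That assumption is consistent with queries only *trying* to use an index. `DestroyIndexSketch` and its methods are illustrative names, not TCStore API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: destroying an index removes only the lookup structure, not the data.
public class DestroyIndexSketch {
    private final Map<Long, Integer> records = new HashMap<>();  // key -> "invoiceId" cell
    private NavigableMap<Integer, Set<Long>> invoiceIdIndex;      // null when no index exists

    // For brevity, writes made after createIndex() are not reflected in the index.
    void put(long key, int invoiceId) { records.put(key, invoiceId); }

    void createIndex() {
        NavigableMap<Integer, Set<Long>> index = new TreeMap<>();
        records.forEach((key, value) -> index.computeIfAbsent(value, v -> new TreeSet<>()).add(key));
        invoiceIdIndex = index;
    }

    // Returns false if there was no index to destroy.
    boolean destroyIndex() {
        boolean existed = invoiceIdIndex != null;
        invoiceIdIndex = null;
        return existed;
    }

    // Reports how the query was answered, followed by the matching keys.
    String query(int invoiceId) {
        if (invoiceIdIndex != null) {
            return "index " + new TreeSet<>(invoiceIdIndex.getOrDefault(invoiceId, Collections.emptySet()));
        }
        Set<Long> result = new TreeSet<>();
        records.forEach((key, value) -> {
            if (value == invoiceId) {
                result.add(key);
            }
        });
        return "scan " + result;
    }

    public static void main(String[] args) {
        DestroyIndexSketch dataset = new DestroyIndexSketch();
        dataset.put(1L, 42);
        dataset.put(2L, 7);
        dataset.put(3L, 42);
        dataset.createIndex();
        System.out.println(dataset.query(42)); // index [1, 3]
        dataset.destroyIndex();
        System.out.println(dataset.query(42)); // scan [1, 3]
    }
}
```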
Indexes in HA setup
Creating an index is a long-running operation. In an HA setup, indexes are created asynchronously on the mirrors. As a result, even after index creation has completed and the index status is LIVE, creation may still be in progress on the mirrors, where it will complete eventually. Likewise, when a new mirror comes up, the records on the active are synced to it, but they are indexed only after the data sync is complete. Indexing on a new mirror therefore also happens asynchronously.
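The ordering described above can be simulated with plain `java.util.concurrent`. In this hypothetical sketch, the `Server` class, its fields and the `INITIALIZING` status are illustrative, and a `CountDownLatch` artificially delays the mirror so the lag is reproducible. The active reports its index as LIVE while a freshly started mirror is still working: the mirror first copies all records from the active and only then builds its own index. This is not TCStore's replication protocol.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of asynchronous indexing on a mirror; not TCStore's HA protocol.
public class MirrorIndexingSketch {
    enum Status { INITIALIZING, LIVE }   // INITIALIZING is an assumed non-live state

    static final class Server {
        final Map<Long, Integer> records = new ConcurrentHashMap<>();
        volatile Status indexStatus = Status.INITIALIZING;
        void buildIndex() { indexStatus = Status.LIVE; }  // the indexing work itself is elided
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Server active = new Server();
        Server mirror = new Server();
        for (long key = 1; key <= 3; key++) {
            active.records.put(key, (int) key * 10);
        }
        // Holds the mirror back so the lag is visible deterministically.
        CountDownLatch mirrorMayProceed = new CountDownLatch(1);

        // New mirror: sync every record from the active first, and only then index.
        CompletableFuture<Void> mirrorDone = CompletableFuture
            .runAsync(() -> mirror.records.putAll(active.records), pool)
            .thenRunAsync(() -> {
                awaitQuietly(mirrorMayProceed);
                mirror.buildIndex();
            }, pool);

        // The active completes its own index creation and reports LIVE...
        CompletableFuture.runAsync(active::buildIndex, pool).get();
        System.out.println("active=" + active.indexStatus + ", mirror=" + mirror.indexStatus);
        // prints active=LIVE, mirror=INITIALIZING

        // ...while the mirror catches up later.
        mirrorMayProceed.countDown();
        mirrorDone.get();
        System.out.println("active=" + active.indexStatus + ", mirror=" + mirror.indexStatus);
        // prints active=LIVE, mirror=LIVE
        pool.shutdown();
    }
}
```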
Refer to the API documentation for more details.