Import-Export Tool
The Import-Export Tool is executed by running the appropriate import-export-tool script located in the tools/import-export/bin folder inside the Terracotta installation directory:
import-export-tool.bat - used on Windows platforms
import-export-tool.sh - used on Unix/Linux platforms
The Import-Export Tool script uses the following syntax:
import-export-tool.sh|bat [<common_options>] <command> <command_specific_options>
Note that the common_options are optional, while one or more of the command_specific_options are required.
The supported commands include:
Command | Description |
export-parquet | Export records from a dataset into a parquet file. |
export-tson | Export records from a dataset into a TSON file. |
import-tson | Import records from a TSON file into a dataset. |
Common Command Options
Each Import-Export Tool command supports a unique set of options (detailed in the sections throughout this document). All commands support the following common options:
Option | Description | Default |
-connection-timeout | Timeout for establishing a connection to the server. | 10 seconds |
-security-dir | Specifies the location of the security root directory. Used to communicate with a server that is configured with any of the supported security schemes (e.g. TLS/SSL). For more details on configuring security in a Terracotta cluster, see Security Core Concepts and Cluster Security. | |
-verbose | Generates verbose output. Useful for debugging error conditions. | false |
-help | Displays help information for commands and their options. | |
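Common options precede the command on the command line. As a sketch only, a hypothetical wrapper script (not part of the product) could pin a set of common options for every invocation; the wrapper name, the timeout value and its format, and the dry-run echo are all assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: prepend common options before the command.
# TOOL, the timeout value, and the echo-based dry run are illustrative only.
TOOL=${TOOL:-import-export-tool.sh}

run_tool() {
  # common options first, then the command and its command-specific options
  echo "$TOOL" -connection-timeout 30 -verbose "$@"
}

run_tool export-parquet -connect-to terracotta://localhost:9410
```

Replacing `echo` with `exec` would turn the dry run into a real invocation.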
Import-Export Tool Commands
The following sections describe each supported command and provide examples on their usage.
Note:
When running the Import-Export Tool, the connection to a Terracotta server is established via the -connect-to option, which takes a URI connection string (e.g. terracotta://<host>:<port>).
Exporting a Dataset to a Parquet File
The export-parquet command exports a dataset's records to a parquet file. A sample of the dataset's records (-schema-sample-size) is used to construct the parquet file's schema, which is based on the unique set of cells contained within the sampled records. Useful features include filtering the exported records via the -filter-cell-name/-filter-cell-type options, explicitly including or excluding cells in the exported records, truncating String cell values that exceed a maximum length, and nullifying byte array cell values that exceed a maximum size.
Syntax:
export-parquet -connect-to <connectionURI>
-dataset-name <datasetName>
-dataset-type <datasetType>
-output-folder <outputFolder>
[-filter-cell-name <cellName>
-filter-cell-type <cellType>
-filter-low-range <lowValue>
-filter-high-range <highValue>]
[-schema-sample-size <schemaSampleSize>]
[-append-cell-type]
[-max-columns <maxColumns>]
[-no-abort-when-exceed-max-columns]
[-multi-output-files]
[-max-string-length <maxStringLength>]
[-max-byte-length <maxByteLength>]
[-include-cells <cellname>, <celltype> [, ...]]
[-exclude-cells <cellname>, <celltype> [, ...]]
[-exclude-empty-records]
[-log-stream-plan]
Option | Description |
-append-cell-type | When constructing the parquet schema’s field names, always append the cell’s Type to the cell’s Name (default: false - only append when required in order to avoid field name clashes) |
-connect-to | Server URI to connect to |
-dataset-name | Name of dataset to export |
-dataset-type | Type of dataset to export [BOOL | CHAR | INT | LONG | DOUBLE | STRING] |
-exclude-cells | A comma-separated list of cell definitions as <cellname, celltype> to be excluded from the export. Include cells (-include-cells) takes precedence over exclude cells |
-exclude-empty-records | Do not export records that contain zero cells |
-filter-cell-name | Cell name used as a range filter to apply to the queried dataset records |
-filter-cell-type | Cell type of the range filter [INT | LONG | DOUBLE] |
-filter-high-range | High value for the range filter |
-filter-low-range | Low value for the range filter |
-help | Help |
-include-cells | A comma-separated list of cell definitions as <cellname, celltype> to be included in the export. Only these cells will be exported |
-log-stream-plan | Log the details of the stream plan |
-max-byte-length | For byte array values, the maximum length in bytes that will be exported; longer values are exported as null (default: all byte arrays are exported regardless of length) |
-max-columns | Maximum allowed number of columns (i.e. unique cell definitions) in the output file (default: 800 columns) |
-max-string-length | For string values, the maximum length in characters that will be exported; longer values are truncated (default: no strings are truncated) |
-multi-output-files | Generate multiple output files when the number of cells exceeds the maximum allowed number of columns (default: false - write all cells to a single file) |
-no-abort-when-exceed-max-columns | Do not abort the export when the number of cells exceeds the maximum allowed number of columns |
-output-folder | Output folder where exported file(s) will be written |
-schema-sample-size | Number of dataset records to query upon which the schema will be based (default: 5 records) |
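The -append-cell-type naming rule can be illustrated with a small sketch. This is not the tool's implementation: the underscore separator and the clash-detection logic are assumptions based on the documented behaviour (append the cell type only when two sampled cells share a name, unless -append-cell-type forces the suffix on every field):

```shell
#!/bin/sh
# Hypothetical sketch of deriving parquet field names from sampled cell
# definitions given as name,type pairs. Separator and logic are assumptions.
derive_field_names() {
  append_always=$1; shift
  for def in "$@"; do
    name=${def%%,*}
    ctype=${def##*,}
    # count how many sampled definitions share this cell name
    clashes=0
    for other in "$@"; do
      [ "${other%%,*}" = "$name" ] && clashes=$((clashes + 1))
    done
    if [ "$append_always" = "true" ] || [ "$clashes" -gt 1 ]; then
      echo "${name}_${ctype}"   # clash (or forced): disambiguate with the type
    else
      echo "$name"              # no clash: plain cell name
    fi
  done
}

# "temp" appears with two types, so only it gets a type suffix
derive_field_names false temp,DOUBLE temp,LONG label,STRING
```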
Examples
1. Exporting an entire dataset to a parquet file:
import-export-tool.sh export-parquet -connect-to terracotta://localhost:9410 -dataset-name DS1 -dataset-type LONG
-output-folder /path/to/output_folder
2. Exporting a subset of a dataset to a parquet file using a filter cell:
import-export-tool.sh export-parquet -connect-to terracotta://localhost:9410 -dataset-name DS1 -dataset-type LONG
-output-folder /path/to/output_folder -filter-cell-name MyLongCell -filter-cell-type LONG -filter-low-range 10 -filter-high-range 50
3. Exporting a subset of a dataset to a parquet file using a filter cell and only including specific cells in the output for each record:
import-export-tool.sh export-parquet -connect-to terracotta://localhost:9410 -dataset-name DS1
-dataset-type LONG -output-folder /path/to/output_folder -filter-cell-name MyLongCell -filter-cell-type LONG -filter-low-range 10
-filter-high-range 50 -include-cells MyLongCell,LONG,MyDoubleCell,DOUBLE,MyBooleanCell,BOOL
Note:
For parquet export, when specifying one or more -include-cells and a filter cell, the filter cell name/type must also appear in the -include-cells list.
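This constraint can be enforced with a small pre-flight check before invoking the tool. The function and its error message below are illustrative, not part of the product:

```shell
#!/bin/sh
# Hypothetical pre-flight check: when both -include-cells and a filter cell
# are supplied, the filter cell/type pair must appear in the include list.
check_filter_in_includes() {
  filter="$1,$2"   # e.g. MyLongCell,LONG
  includes=$3      # e.g. MyLongCell,LONG,MyDoubleCell,DOUBLE
  case ",$includes," in
    *",$filter,"*) echo ok ;;
    *) echo "error: filter cell $filter missing from -include-cells" ;;
  esac
}

check_filter_in_includes MyLongCell LONG "MyLongCell,LONG,MyDoubleCell,DOUBLE"
check_filter_in_includes MyIntCell INT "MyLongCell,LONG"
```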
Exporting a Dataset to a TSON File
The export-tson command is used to export a dataset to a TSON-formatted file. The file can be compressed if necessary.
Syntax:
export-tson -connect-to <connectionURI>
-dataset-name <datasetName>
-dataset-type <datasetType>
-output-file <outputFile>
[-filter-cell-name <cellName>
-filter-cell-type <cellType>
-filter-low-range <lowValue>
-filter-high-range <highValue>]
[-max-string-length <maxStringLength>]
[-max-byte-length <maxByteLength>]
[-include-cells <cellname>, <celltype> [, ...]]
[-exclude-cells <cellname>, <celltype> [, ...]]
[-pretty-print]
[-compress]
[-exclude-empty-records]
[-log-stream-plan]
Option | Description |
-compress | Compress the generated output file after export |
-connect-to | Server URI to connect to |
-dataset-name | Name of dataset to export |
-dataset-type | Type of dataset to export [BOOL | CHAR | INT | LONG | DOUBLE | STRING] |
-exclude-cells | A comma-separated list of cell definitions as <cellname, celltype> to be excluded from the export. Include cells (-include-cells) takes precedence over exclude cells |
-exclude-empty-records | Do not export records that contain zero cells |
-filter-cell-name | Cell name used as a range filter to apply to the queried dataset records |
-filter-cell-type | Cell type of the range filter [INT | LONG | DOUBLE] |
-filter-high-range | High value for the range filter |
-filter-low-range | Low value for the range filter |
-help | Help |
-include-cells | A comma-separated list of cell definitions as <cellname, celltype> to be included in the export. Only these cells will be exported |
-log-stream-plan | Log the details of the stream plan |
-max-byte-length | For byte array values, the maximum length in bytes that will be exported; longer values are exported as null (default: all byte arrays are exported regardless of length) |
-max-string-length | For string values, the maximum length in characters that will be exported; longer values are truncated (default: no strings are truncated) |
-output-file | Full pathname of output file to create and write the exported dataset records to (parent folder must exist) |
-pretty-print | Include line breaks and indentations when writing records to the file |
Examples
1. Exporting an entire dataset to a TSON file:
import-export-tool.sh export-tson -connect-to terracotta://localhost:9410 -dataset-name DS1 -dataset-type LONG
-output-file /path/to/file/myfile.tson
2. Exporting an entire dataset to a compressed TSON file:
import-export-tool.sh export-tson -connect-to terracotta://localhost:9410 -dataset-name DS1 -dataset-type LONG
-output-file /path/to/file/myfile.tson.gz -compress
3. Exporting an entire dataset to a TSON file, truncating all string values longer than 256 characters, exporting all byte array values larger than 1024 bytes as null, and excluding all data for a specific cell:
import-export-tool.sh export-tson -connect-to terracotta://localhost:9410 -dataset-name DS1 -dataset-type LONG
-output-file /path/to/file/myfile.tson -max-string-length 256 -max-byte-length 1024 -exclude-cells MyBooleanCell,BOOL
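A compressed export can be inspected without importing it. The sketch below assumes gzip framing (suggested by the .gz suffix in example 2, though the exact codec is not stated here) and uses a placeholder file in place of a real export:

```shell
#!/bin/sh
# Sketch: verifying and peeking into a compressed TSON export.
# The placeholder file content stands in for real exported records.
dir=$(mktemp -d)
printf 'placeholder record\n' > "$dir/myfile.tson"
gzip "$dir/myfile.tson"            # stands in for the tool's -compress step
gzip -t "$dir/myfile.tson.gz" && echo "archive ok"
gunzip -c "$dir/myfile.tson.gz"    # decompress to stdout for inspection
```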
Importing Data from a TSON File
The import-tson command is used to add records and cells into an existing dataset by importing them from a TSON-formatted file, which can be optionally compressed.
Note:
When importing a TSON file into a dataset, the target dataset must already exist within the cluster. If the specified dataset does not exist, the import operation will fail.
Syntax:
import-tson -connect-to <connectionURI>
-dataset-name <datasetName>
-dataset-type <datasetType>
-input-file <inputFile>
[-compressed]
[-clear-dataset]
[-exclude-empty-records]
Option | Description |
-clear-dataset | Delete all records from the target dataset before performing the import |
-compressed | Specifies that the input file is compressed |
-connect-to | Server URI to connect to |
-dataset-name | Name of dataset to import |
-dataset-type | Type of dataset to import [BOOL | CHAR | INT | LONG | DOUBLE | STRING] |
-exclude-empty-records | Do not import records that contain zero cells |
-help | Help |
-input-file | Full pathname of TSON-formatted input file whose records will be added to the specified dataset |
Examples
1. Import a TSON file into a dataset, first clearing all contents from the target dataset:
import-export-tool.sh import-tson -connect-to terracotta://localhost:9410 -dataset-name DS2 -dataset-type LONG
-input-file /path/to/input_file.tson -clear-dataset
2. Import a compressed TSON file into a dataset without clearing the target dataset’s contents and excluding any records in the import file which have zero cells (i.e. empty records):
import-export-tool.sh import-tson -connect-to terracotta://localhost:9410 -dataset-name DS2 -dataset-type LONG
-input-file /path/to/input_file.tson.gz -compressed -exclude-empty-records
Sample output:
Import Result: Success
1,000 records processed.
5 empty records (with no cells) were omitted
17 records failed to be added to the Dataset.
Note:
When importing records without first clearing the target dataset, any records in the import file that have existing keys in the target dataset will not be imported. The count of these skipped records will appear in the output results of the import operation as 'records failed to be added to the Dataset'.
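The skip-on-existing-key behaviour can be simulated with a small sketch; the key files, key format, and counts below are made up purely for illustration:

```shell
#!/bin/sh
# Illustrative simulation: import-file keys that collide with keys already
# in the dataset are not imported and count as failed adds.
dir=$(mktemp -d)
printf '1\n2\n3\n' > "$dir/existing_keys.txt"    # keys already in the dataset
printf '2\n3\n4\n5\n' > "$dir/import_keys.txt"   # keys in the import file
# count exact-line collisions between the two key sets
failed=$(grep -c -x -F -f "$dir/existing_keys.txt" "$dir/import_keys.txt")
total=$(($(wc -l < "$dir/import_keys.txt")))
echo "$total records processed, $failed failed to be added"
```

Here keys 2 and 3 already exist, so only keys 4 and 5 would be imported.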