This document describes the utility "ADACMP".
The following topics are covered:
The compression utility ADACMP compresses user raw data into a form which can be used by the mass update utility ADAMUP.
The input data for this utility must be contained in a sequential file. LOB field values can also be provided in separate files.
The logical structure and characteristics of the input data are described by a field definition table (FDT). These statements specify the level number, field name, standard length and format together with any definition options that are to be assigned to the field (descriptor, unique descriptor, multiple-value field, null value suppression, fixed storage, periodic group). See Administration, FDT Record Structure for more detailed information about the layout of the file in the database and characteristics of the input data.
Each field in the input record without the option SY (system generated) is compressed. Compression consists of removing trailing blanks from alphanumeric fields and leading zeros from numeric fields. Unpacked and packed fields are checked for correct data. Fields defined with the fixed storage option are not compressed. A user exit is provided to allow additional editing of each input record with a user-written routine.
System generated fields are either regenerated or decompressed, depending on the keyword specified for the ADACMP parameter SYFINPUT.
This utility creates three types of output files:
Compressed data.
Descriptor values.
Records with errors.
The sizes of the descriptor values of all descriptors are listed at the end of execution.
If the utility writes records to the error file, it will exit with a non-zero status.
Note:
Please be careful if you want to add data to a file that still
contains ICU 3.2 collation descriptors:
If you specify the FDT with the parameters DBID and FILE, the FDT is taken unchanged from the database. This means the ICU version is still 3.2. You can add the data to the file, and the ICU version remains 3.2.
If you take the FDT from the CMPFDT file, a new FDT is created from the CMPFDT file, where the ICU version is set to 5.4. This means you can only add the data to the file if it is empty, and if you specify the NEW_FDT option. The ICU version used is 5.4.
This utility is a single-function utility.
The sequential files CMPDTA, CMPDVT and CMPERR can have multiple extents. For detailed information about sequential files with multiple extents, see Adabas Basics, Using Utilities, Adabas Sequential Files, Multiple Extents . CMPLOB is a directory that contains files which may be stored as LOB values in the database.
Data Set | Environment Variable/ Logical Name |
Storage Medium |
Additional Information |
---|---|---|---|
Associator | ASSOx | Disk | |
Compressed data | CMPDTA | Disk, Tape (* see note) | output by ADACMP |
Descriptor Value Table | CMPDVT | Disk, Tape (* see note) | output by ADACMP |
Rejected data | CMPERR | Disk, Tape (* see note) | output by ADACMP |
Input data FDT | CMPFDT | Disk, Tape (* see note) | Utilities Manual |
User input data | CMPIN | Disk (* see note) | Utilities Manual |
User LOB input data | CMPLOB | Disk | Utilities Manual |
ADACMP control statements |
stdin/ SYS$INPUT |
Utilities Manual | |
ADACMP messages | stdout/ SYS$OUTPUT |
Messages and Codes |
Note:
(*) A named pipe can be used for this sequential
file (see Adabas Basics, Using
Utilities, Adabas Sequential Files, Using Named Pipes for
details).
If the SINGLE_FILE option is set, the Descriptor Value Table (DVT) and the compressed user data are written together to the logical name CMPDTA.
The utility writes no checkpoints.
The following control parameters are available:
DBID = number D [NO]DST FDT FIELDS {uncompressed_field_definition | FDT}...[END_OF_FIELDS | . ] FILE = number D [NO]LOBS D [NO]LOWER_CASE_FIELD_NAME D MAX_DECOMPRESSED_SIZE = number [K|M] D MUPE_C_L = {1|2|4} D [NO]NULL_VALUE D NUMREC = number D RECORD_STRUCTURE = keyword SEPARATOR = character | \character D [NO]SHORT_RECORDS D [NO]SINGLE_FILE SKIPREC = number D SOURCE_ARCHITECTURE = (keyword[,keyword][,keyword]) D SYFINPUT = keyword D TZ {=|:} [timezone] D [NO]USEREXIT D [NO]USERISN D WCHARSET = char_set
DBID = number
This parameter selects the database that contains the file to be specified by the FILE parameter.
[NO]DST
The parameter DST is required if a daylight saving time indicator is provided for date/time fields with the option TZ. The daylight saving time indicator must be appended behind the date/time value as a 2-byte integer value (format F) that contains the number of seconds to be added to the standard time in order to get the actual time (usually 0 or 3600).
Without the parameter DST, it is not possible to define time values in the hour before the time is switched back to standard time.
The default is NODST.
Notes:
A DT field has the following definition: 1,DT,8,P,DT=E(DATE_TIME),TZ
The following values must then be specified for this field:
The local date/time value corresponding to the edit mask DATE_TIME as an 8-byte packed value
The daylight saving time indicator, usually 0 for standard time and 3600 for summer time as a 2-byte fixed point value
Case 1 (DT has a date/time value with daylight saving time): 0x0200910250230000E10 Case 2 (DT has a date/time value with standard time): 0x0200910250230000000
FDT
If this parameter is specified as the first parameter, or as the second parameter after [NO]LOWER_CASE_FIELD_NAMES, ADACMP reads the FDT information contained in the sequential file CMPFDT and displays the FDT.
Note:
Alternatively, instead of FDT, you can specify DBID and FILE as
the first parameters, or as the second parameters after
[NO]LOWER_CASE_FIELD_NAMES (which is allowed before DBID and FILE). In this
case, the FDT of the file is used as the base for the compression.
The FDT parameter can be specified several times, but if you have already determined the FDT to be used for the compression by specifying the FDT or DBID and FILE parameters, specifying the FDT parameter again will only display the FDT; the FDT is not overwritten by the CMPFDT file.
FIELDS {uncompressed_field_definition | FDT}...[END_OF_FIELDS | . ]
This parameter is used to specify a subset of fields given in the FDT and their format and length. This means that the input records do not have to contain all of the fields given in the FDT, or that fields can be provided with a different format or length. The syntax and semantics are the same as for the format buffer, with the exception that you can also specify an R-element (for LOB references) if the decompressed record contains the name of a file containing the LOB value instead of the LOB value itself. See Administration, Loading and Unloading Data, Uncompressed Data Format for more detailed information.
While entering the specification list, the FDT function can be used to display the FDT of the file to be decompressed. The specification list can be terminated or interrupted by entering END_OF_FIELDS or `.'. The `.' option is an implicit END_OF_FIELDS and is compatible with the format buffer syntax. FIELDS or END_OF_FIELDS must always be entered on a line by itself, whereas the `.' may be entered on a line by itself or at the end of the format buffer elements.
If the field definitions are terminated with the END_OF_FIELDS parameter, this parameter must be specified in upper case when the LOWER_CASE_FIELD_NAMES parameter is used. In addition, the FDT parameter must also be specified in upper case when the LOWER_CASE_FIELD_NAMES parameter is used.
FILE = number
This parameter specifies the file from which the FDT information is to be read. This parameter can only be specified after the DBID parameter.
[NO]LOWER_CASE_FIELD_NAMES
If LOWER_CASE_FIELD_NAMES is specified, Adabas field names are not converted to upper case. If NOLOWER_CASE_FIELD_NAMES is specified, Adabas field names are converted to upper case. The default is NOLOWER_CASE_FIELD_NAMES.
If lower case field names in the FDT are not to be converted to upper case, the parameter must be specified as the first parameter before the FDT parameter; if lower case field names in the FIELDS parameter are not to be converted to upper case, the parameter must be specified before the FIELDS parameter.
Warning: If the LOWER_CASE_FIELD_NAMES parameter is specified for the CMPFDT file, not upper case conversion is done for the complete file. Lower case characters for field formats and field options will cause FDT syntax errors. This problem also exists for lower case characters in the FIELDS parameter. |
[NO]LOBS
This parameter specifies whether LA and LB field values are to be stored in a LOB file after loading the compressed data into the database:
If the parameters DBID and file number have been specified, this parameter is ignored, and the field is handled as described below;
If the parameters DBID and file number have not been specified and LOBS is specified, field values for LA and LB fields are prepared for storage in a LOB file, except the field is defined as a descriptor.
If the parameters DBID and file number have not been specified and NOLOBS is specified, field values for LA and LB fields are prepared for storage in the base file. In this case, the length of field values for LA and LB fields must not exceed 16381 bytes and the compressed record must fit into a 32 KB DATA block.
Please note that LA and LB fields which are descriptors or parent fields of a derived descriptor, e.g. a super descriptor, are always handled as described for the NOLOBS parameter.
Default behaviour is as follows:
If the parameters DBID and file number have been specified and the file is a base file with corresponding LOB file, LOBS is default.
If the parameters DBID and file number have been specified and the file is not a base file with corresponding LOB file, NOLOBS is default.
If the parameters DBID and file number have not been specified, LOBS is default.
MAX_DECOMPRESSED_SIZE = number [K|M]
This parameter specifies the maximum size of a decompressed record in bytes, kilobytes or megabytes, depending on the specification of "K" or "M" after the number. This parameter is intended to recognize invalid CMPIN files as early as possible.
The default is 65536. This is also the minimum value.
Notes:
MUPE_C_L = {1|2|4}
If the uncompressed data contain multiple-value fields or periodic groups, they are preceded by a binary count field with the length of MUPE_C_L bytes.
The default is 1.
[NO]NULL_VALUE
The parameter NULL_VALUE is required if you are compressing data according to the standard FDT and the status values of the NC option fields are given in the input data. Normally, such input data is generated by ADADCU with the NULL_VALUE option set.
The default is NONULL_VALUE.
The definition in the FDT for the field AA is: 1, AA, 2, A, NC
Case 1 (AA has a non-NULL value): input record (hexadecimal) = 00004142
Case 2 (AA has a NULL value): input record (hexadecimal) = FFFF2020
NUMREC = number
This parameter specifies the number of input records to be processed. If this parameter is omitted, all input records contained on the input file are processed.
Use of this parameter is recommended for the initial execution of ADACMP if the input data file contains a large number of records. This avoids unnecessary processing of all records in cases where a data definition error or invalid input data results in a large number of rejected records.
This parameter is also useful for creating small files for test purposes.
RECORD_STRUCTURE = keyword
This parameter specifies the type of record separation used in the input file with the environment variable CMPIN. The following keywords can be used:
Keyword | Meaning |
---|---|
ELENGTH_PREFIX | The records in the CMPIN file are separated by a two-byte exclusive length field. |
E4LENGTH_PREFIX | The records in the decompressed data file are separated by a 4-byte exclusive length field. |
ILENGTH_PREFIX | The records in the CMPIN file are separated by a two-byte inclusive length field. |
I4LENGTH_PREFIX | The records in the decompressed data file are separated by a 4-byte inclusive length field. |
NEWLINE_SEPARATOR | The records in the CMPIN file are separated by a new-line character. This keyword may only be specified if the field values do not contain characters interpreted as new-line (i.e. if there are only unpacked, alphanumeric and Unicode fields, and the alphanumeric and Unicode fields contain only printable characters). This keyword and the USERISN parameter are mutually exclusive. |
RDW | The records in the CMPIN file contain data that has
been transferred from an IBM host using the FTP site rdw option.
ADACMP is able to process such data without having to use cvt_fmt
first (in previous versions, the unsupported tool cvt_fmt was used for
such format conversions). For example:
% ftp IBM-host ftp> binary 200 Representation type is Image ftp> site rdw 200 Site command was accepted ftp> get decomp % setenv CMPIN decomp % adacmp fdt record_structure=rdw source=(ebcdic,high) |
RDW_HEADER | Like RDW, for data decompressed on a mainframe with HEADER=YES. |
HEADER | For data decompressed on a mainframe with HEADER=YES, if the decompressed data do not contain any additional information about block or record length. |
VARIABLE_BLOCKED | The variable blocked format from BS2000 or IBM. |
The default is ELENGTH_PREFIX.
SEPARATOR = character | \character
If you specify this option, ADACMP expects the fields in the raw data record to be separated by the character specified. You can omit the apostrophes round the character specification if the character has no special meaning for the Adabas utilities. The same fields in different records are then permitted to be of different lengths.
If a format buffer is specified using the FIELDS parameter, the order of the specified field names must correspond with the order in which the fields are specified in the FDT. A mismatch results if this is not the case.
If the FDT contains multiple value fields or periodic groups, a format buffer must be specified with the FIELDS parameter. Members of periodic groups must be ordered by 1) periodic group index and 2) field sequence in the FDT (see example 2 below).
Because no binary data is expected in the input file using the SEPARATOR option, the RECORD_STRUCTURE parameter will be set to NEWLINE_SEPARATOR.
FDT: 1, AA, 2, U 1, AB, 8, U 1, AC, 2, A CMPIN: 12;12345678;AA 1234;5;BB adacmp fdt separator=\; or for UNIX adacmp fdt separator=\\\; or adacmp fdt separator='\;'
In this example, 2 records are compressed with the default FDT, the separator character is the semicolon, and the default record structure is NEWLINE_SEPARATOR. Note that the semicolon must be preceded by a backslash, otherwise it would be treated as the start of a comment. If you enter the parameters under UNIX directly from the command line, it is necessary to precede the backslash and the semicolon by additional backslashes or to put them in quotes or double quotes since they are special characters.
FDT: 1, XX, PE 2, AA, 8, A 2, AB, 8, U 1, YY, 2, A Correct: CMPIN: aaaa,1,bbbb,2,yy Command: adacmp fdt separator=, fields AA1,AB1,AA2,AB2,YY. First, the field values for the periodic group index 1 are specified, and then the field values for periodic group index 2. Invalid: CMPIN: aaaa,bbbb,1,2,yy Command: adacmp fdt separator=, fields AA1-2,AB1-2,YY. The fields specification is invalid because the 2nd value of AA is specified before the 1st value of AB; you will get the error SEPINV.
In this example, 1 record with fields given in the format buffer is compressed, the separator character is the comma.
FDT: 1, AA, 8, A 1, MA, 1, A, MU CMPIN: aaaa%2%A%B bbbb%3%C%D%E adacmp dbid=9 file=15 separator=%, fields "AA,MAC,1,U,MA1-N"
In this example, 2 records with fields given in the format buffer are compressed, the occurrence count or the multiple value field MA is different in different records. The separator character is the percent character.
[NO]SHORT_RECORDS
If SHORT_RECORDS is specified, it is possible to omit fields at the end of the decompressed record that contain null values.
The default is NOSHORT_RECORDS.
You can only omit complete fields; it is not possible to truncate the last value:
Assuming you have specified the parameters for a file containing alphanumeric fields AA and AB:
FIELDS AA,20,AB,20 END_OF_FIELDS SHORT_RECORDS
Then the following record is allowed:
"Field AA "
The following record is not allowed:
"Field AA"
[NO]SINGLE_FILE
If the SINGLE_FILE option is set, ADACMP writes the Descriptor Value Table (DVT) and the compressed user data to a single file (CMPDTA) instead of writing them to separate files.
The default is NOSINGLE_FILE.
SKIPREC = number
This parameter specifies the number of records to be skipped before compression is started.
SOURCE_ARCHITECTURE = ( keyword [,keyword [,keyword] ] )
This parameter specifies the format (character set, floating-point format and byte order) of the input data records. The following keywords can be used:
Keyword Group | Valid Keywords |
---|---|
Character set |
ASCII EBCDIC |
Floating-point format |
IBM_370_FLOATING IEEE_FLOATING VAX_FLOATING |
Byte order |
HIGH_ORDER_BYTE_FIRST LOW_ORDER_BYTE_FIRST |
If no keyword of a keyword group is specified, the default for this keyword group is the keyword that corresponds to the architecture of the machine on which ADACMP is running.
Note:
The FDT is always input in ASCII format.
If the input records that are to be compressed are in IBM format, the user must specify the following:
SOURCE_ARCHITECTURE = (EBCDIC, IBM_370_FLOATING, HIGH_ORDER_BYTE_FIRST)
SYFINPUT = keyword
This parameter specifies the input used for the compression of system generated fields. The following keywords can be used:
Keyword | Meaning |
---|---|
SYSTEM | The system generated field values are regenerated by the system in ADACMP. |
USER | The system generated field values are taken from the decompressed file. |
The default is SYFINPUT = USER.
TZ {=|:} [timezone]
The specified time zone must be a valid time zone name that is contained in the time zone database known as the Olson database (https://www.iana.org/time-zones). If a time zone has been specified, this time zone is used for time zone conversions of date/time fields with the option TZ.
The default is UTC, which is used internally to store date/time fields with option TZ; no conversion is required.
If you specify an empty value, no checks are made to ensure that date/time fields are correct.
Note:
The time zone names are file names. Depending on the platform,
these file names may or may not be case sensitive. Also, the time zone names,
depending on the platform, may or may not be case sensitive.
tz:Europe/Berlin
This is correct on all platforms.
TZ=Europe/Berlin
With this specification, TZ is converted to upper case EUROPE/BERLIN. This is correct on Windows, because file names are not case sensitive on Windows, but it is not correct on Unix, because Unix file names are case sensitive.
[NO]USEREXIT
This option specifies whether a user exit is to be taken or not. If USEREXIT is specified, the environment variable ADAUEX_6/logical name ADABAS$USEREXIT_6 must point to a loadable user-written routine.
See Administration, User Exits and Hyperexits for more details.
The default is NOUSEREXIT.
[NO]USERISN
If this option is set to USERISN, the ISN for each record in the input file will be assigned by the user.
If USERISN is specified, the user must give the ISN to be assigned to each record as a four-byte binary number immediately preceding each data record.
ISNs may be assigned in any order and must be unique (for the file). The ISN must not exceed the maximum number of records (MAXISN) specified for the file (see the file definition utility ADAFDU for more detailed information).
ADACMP does not check for unique ISNs or for ISNs which exceed MAXISN. These checks are performed by the mass update utility ADAMUP (if an error is detected, the ADAMUP run terminates with an error message).
If this option is set to NOUSERISN, the ISN is assigned by Adabas.
The default is NOUSERISN.
WCHARSET = char_set
This parameter specifies the default encoding used in the decompressed file based on the encoding names listed at http://www.iana.org/assignments/character-sets - most of the character sets listed there are supported by ICU, which is used by Adabas for internationalization support.
The default is UTF-8.
The ADACMP utility outputs three files:
Compressed data
Descriptor values
Records with errors
The data records which ADACMP has processed, modified and compressed are output together with the FDT information to a sequential file. This file is used as input for the mass update utility ADAMUP.
If the output file contains no records (no records on the input file or all records rejected), the output may still be used as input for the mass update utility ADAMUP.
This file contains the descriptor value tables (DVT).
Compressed data records and descriptor value tables are written to one file if the SINGLE_FILE option is specified.
Any records rejected by ADACMP are written to the ADACMP error file. The contents of this error file should be displayed using the ADAERR utility. Do not print the error file using the standard operating system print utilities since the records contain unprintable characters.
See the ADAERR utility for further information.
The ADACMP report begins with a display of the field definition entered if CMPFDT is used for input. Any statement which contains a syntax error will be printed with a message immediately following the statement.
Following the display of the data-definition statements, a descriptor summary, the number of input records processed, the number of input records rejected, and the number of input records compressed are printed.
ADACMP does not have a restart capability. An interrupted ADACMP run must be re-started from the beginning.
ADACMP does not change the database; therefore, no considerations need to be made concerning database status before restarting ADACMP.