Loading And Unloading Data

Introduction

There are several ways of loading data into a database:

  • Using the utility ADAMUP: you can load compressed data that was generated by ADACMP, ADAMUP or ADAULD.

  • Using the utility ADABCK: you can restore a file or database from a backup that was created by ADABCK.

  • Using the IMPORT function of the utility ADAORD: you can import a file that was created by the EXPORT function of ADAORD.

Compressed Data

Data that is loaded into the database by the utility ADAMUP must be input to ADAMUP in a special, compressed data format. Compressed data can be generated by the utility ADACMP, which converts uncompressed data, as described in the section Uncompressed Data Format, to the necessary format. Compressed data can also be generated by the utilities ADAMUP and ADAULD, when existing data is unloaded from the database.

This is a very flexible data format. You can load copies of a compressed data file into several different databases if required, or load it several times into a single database as separate files. You can also load just a subset of the records into a file.

A disadvantage of this file format is that when the data is loaded into the database, ADABAS has to build a sort index for each file. For large files, this can require large amounts of CPU time, and SORT and TEMP container files are required.

Backup Data

Backup data is generated by the utility ADABCK. Such data can be used to build a long-term data archive, and can also be used for restoring files to databases, or to restore whole databases.

The backup and restore operations are faster than the other methods of saving and restoring data. However, you must always copy backup files back to the database from which they originated. Also, you cannot copy the files back with different file numbers.

Import/Export Data

Data that was exported from a database using the ADAORD EXPORT function can be imported to a database using the ADAORD IMPORT function. The exported data is very similar to the compressed data format described above, but the main difference is that the index information of the exported files is also exported. This means that when data is subsequently imported, the index does not have to be rebuilt, so the load procedure is much faster than the corresponding operation for ADAMUP. Also, SORT and TEMP container files are not required.

Like compressed data, this is a flexible data format. You can import copies of an export file into several different databases if required, or import it several times into a single database as separate files. However, because the index information is stored in the export file, you cannot import just a subset of the records into a file in the database.

Copying Data to other Hardware Architecture

The situation may occur in which you want to copy data from one Adabas database to another database on a computer with a different hardware architecture, for example from a Linux platform to a Windows platform.

You can use the utility ADABCK (Version 6.4 and higher) for this purpose - you can restore a backup created on one hardware architecture into a database on a computer with another architecture.

Notes:

  1. This is not possible with backups created with Adabas versions < 6.3.
  2. Copying data in this way from or to mainframes is not supported.
  3. Only ADABCK can process input files created on a different hardware architecture; the utilities ADAMUP and ADAORD are not able to do so.
  4. Alternatively, you can use the old way of copying data to another hardware architecture that was required with previous Adabas versions: unload the file with the utility ADAULD, decompress the file with the utility ADADCU, compress the file with the utility ADACMP for the new architecture, load the file with ADAMUP. This may be useful if you also want to change the block sizes or the FDT of a file.

Uncompressed Data Format

This section describes the format of data records that are input to the utility ADACMP and output from the utility ADADCU. This format is called uncompressed data format (also called raw data format). The utility ADACMP reads in data in this format and compresses it for subsequent input to the mass update utility ADAMUP. The utility ADADCU performs the opposite operation: it takes compressed data that was generated by ADACMP and decompresses it. Note that compressed data can also be generated by the utilities ADAMUP and ADAULD when data is deleted or unloaded from a database.

Unless otherwise indicated, the data formats described apply to both the input data for ADACMP and the output data for ADADCU.

Syntax of Uncompressed Data Records

Uncompressed data records are a sequence of the following syntax elements:

format_buffer_element 
field_references

The syntax elements, except field_references, are the same as the format buffer elements described in Command Reference, Calling Adabas, Format and Record Buffers. Note that here a format buffer element is nX, a literal, or a field definition including the length and format specifications, if they exist. The difference is how the syntax elements are separated:

  • The complete syntax element must be entered in one line.

  • The syntax elements are separated by a comma or a newline.

  • You can insert comments between the syntax elements: a semicolon indicates that the following characters, until end of line, are comments.

  • You can insert FDT between the syntax elements; FDT must be entered in a new line. It instructs the utility for which the uncompressed data record syntax is specified to display the FDT.
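
The separation rules can be illustrated with a minimal Python sketch. It is not part of any Adabas utility; the function names are hypothetical. It assumes that a token consisting only of digits, a '*', or a single format letter (A, B, F, G, P, U, W) is a length or format override belonging to the preceding element on the same line, and that literals are enclosed in single quotes.

```python
FORMATS = {"A", "B", "F", "G", "P", "U", "W"}


def strip_comment(line):
    """Drop everything from the first semicolon outside a quoted literal."""
    in_quote = False
    for pos, ch in enumerate(line):
        if ch == "'":
            in_quote = not in_quote
        elif ch == ";" and not in_quote:
            return line[:pos]
    return line


def split_outside_quotes(text):
    """Split on commas that are not inside a quoted literal."""
    parts, current, in_quote = [], [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        if ch == "," and not in_quote:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_spec(text):
    """Return the syntax elements of a record-structure specification."""
    elements = []
    for line in text.splitlines():
        body = strip_comment(line).strip()
        if not body:
            continue
        if body.upper() == "FDT":          # FDT stands on a line of its own
            elements.append("FDT")
            continue
        line_elements = []                 # an element never spans two lines
        for token in split_outside_quotes(body):
            is_override = (token.isdigit() or token == "*"
                           or token.upper() in FORMATS)
            if is_override and line_elements:
                line_elements[-1] += "," + token   # length/format override
            else:
                line_elements.append(token)
        elements.extend(line_elements)
    return elements
```

With these assumptions, parse_spec("AA,5X,AB,3,U") yields the three elements AA, 5X and AB,3,U.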

The following special considerations apply for format buffer elements specified in a decompressed record-structure specification:

  • Edit masks are not allowed.

  • N elements are not allowed.

  • 1-N elements must be preceded directly in the same line by the corresponding C element. Unlike the format buffer for an update or store command, they are also allowed in format buffers for compression.

    Example

    Assume GB is a periodic group with the fields BA, BB and BC.

    GBC,GB1-N    The number of occurrences of the periodic group and all occurrences of the periodic group are processed (GBC, BA1, BB1, BC1, BA2, BB2, BC2, ...).

  • 1-N elements are not allowed for fields within a PE.

Examples for invalid specifications

Specification   Reason
GB2-GB4         Incorrect syntax.
GB4-2           Descending range.
GBC             GBC and GB1-N must be specified in the same line.
GB1-N
GBC,BA1-N       name1-N must not be specified for fields within a periodic group.

Syntax of field_references

name R [mu_pe_index] [, length ]

name must be the name of a LOB field. If the decompressed record specification contains field references, the decompressed record does not contain the LOB values themselves, but the names of the files that contain the LOB values.

length may be a number >= 0 or ‘*’. length=* is allowed only at the end of the record for a single LOB value. If length=0 is specified, a 2-byte inclusive length is put in front of the file name (analogous to LA fields). The default is 0.

Syntax of mu_pe_index

{ i [-j] | N } [ (m [-n] | N | 1-N) ] | 1-N

If MU fields or fields in periodic groups are LOB fields, you can specify the MU or PE indices for field references the same way as you do for field values.

Notes:

  1. The rules given above for the usage of 1-N as an MU-PE index for field values also apply for field references.
  2. If the value of a LOB field is blank, the field reference is blank, too.

Example for a decompressed record structure specification

AA,AB           ; 2 fields specified in the same line
9X,'LITERAL'    ; Compress: 16 bytes are ignored
                ; Decompress: "         LITERAL" in decompressed record
fdt             ; Display FDT before next field is specified
AT1-12C         ; Number of values of MU field in the first 12 elements of PE
                ; Only allowed for decompress
AT1-12(1-2),8,U ; Values of MU field in a PE
                ; Length is 8 bytes, Format is U
P1C,P11-N       ; Periodic group count and all groups
                ; Allowed for compress, too.
LMR1-4,20       ; File names of files containing values

Record Definition Examples

This section provides record definition examples. All the examples in this section refer to the sample ADABAS files in Appendix A of the Command Reference Manual.

Example 1: Defining elementary fields (standard length and format):

Syntax        : AA,5X,AB.

Record        : AA value(8 bytes alphanumeric)
                5 spaces
                AB value(2 bytes packed)

Example 2: Defining elementary fields (length and format override):

Syntax        :  AA,5X,AB,3,U.

Record        :  AA value (8 bytes alphanumeric)
                 5 spaces
                 AB value (3 bytes unpacked)

Example 3: Processing a periodic group:

Syntax        : GB1.
    
Record        : BA1 value (1 byte binary)
                BB1 value (5 bytes packed)
                BC1 value (10 bytes alphanumeric)

Example 4: Processing the first two occurrences of periodic group GB:

Syntax        : GB1-2.
Record        : BA1 value (1 byte binary)
                BB1 value (5 bytes packed)        GB1
                BC1 value (10 bytes alphanumeric)
                BA2 value (1 byte binary)
                BB2 value (5 bytes packed)        GB2
                BC2 value (10 bytes alphanumeric)

Example 5: Processing the sixth value of the multiple-value field MF:

Syntax        : MF6.

Record        : MF value 6 (3 bytes alphanumeric)

Example 6: Processing the first two values of the multiple-value field MF:

Syntax        : MF01-02.

Record        : MF value 1 (3 bytes alphanumeric)
                MF value 2 (3 bytes alphanumeric)

Example 7: Processing the highest occurrence number of the periodic group GC and the existing number of values for the multiple-value field MF:

Syntax        : GCC,MFC.
    
Record        : Highest occurrence count for GC (1 byte binary)
                Value count for MF (1 byte binary)

Output Record

The utility ADADCU returns the requested field values in the order specified by the record definition syntax. A value is returned in the standard length and format defined for the field, unless a length and/or format override was specified. If the value is a null value, it is returned in the format in effect for the field:

Format                  Null Value
ALPHANUMERIC (A)        Blanks (ASCII: hex `20' or EBCDIC: hex `40')
BINARY (B)              Binary zeros
FIXED-POINT (F)         Binary zeros
FLOATING POINT (G)      Binary zeros
PACKED DECIMAL (P)      Packed decimal 0
UNPACKED DECIMAL (U)    Unpacked decimal 0, depending on the target architecture
UNICODE (W)             Blanks depending on WCHARSET specified

Note:
For packed decimals, C is used as sign. For unpacked decimals, 3 is used as sign for target architecture ASCII, F for target architecture EBCDIC.

Adabas returns the number of bytes equal to the combined lengths (standard or overridden) of all requested fields.
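
As an illustration of the null values listed above, the following Python sketch builds the null value for a field of a given format and length. The function name is hypothetical. The sketch assumes that every digit of an unpacked decimal 0 carries the same zone as the sign digit (3 for ASCII, F for EBCDIC), and it leaves out format W because the blank depends on the WCHARSET in use.

```python
def null_value(fmt, length, ebcdic=False):
    """Return the null value for a field of the given format and length."""
    if fmt == "A":
        return (b"\x40" if ebcdic else b"\x20") * length
    if fmt in ("B", "F", "G"):
        return bytes(length)                    # binary zeros
    if fmt == "P":
        return bytes(length - 1) + b"\x0c"      # packed 0 with sign C
    if fmt == "U":
        # Assumption: all digits use the zone of the sign digit.
        return (b"\xf0" if ebcdic else b"\x30") * length
    raise ValueError("format not covered by this sketch: " + fmt)
```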

Input Data Requirements for ADACMP

User data which is input to ADACMP must be contained in a sequential file. There are four ways in which the records in the input file can be separated; please refer to the parameter RECORD_STRUCTURE in the chapter ADACMP in the Utilities Manual for more detailed information. The fields in each record must be structured according to the data definition statements provided.

If a user exit routine is used, the structure must agree with the data definitions after user exit processing. Any trailing information in an input record for which there is no corresponding data-definition statement will not be processed and will not be contained in the output produced by ADACMP.

Fields defined as UNPACKED must contain a valid sign value in the four high-order bits of the low-order byte. The sign must be in zoned-numeric format. ADABAS represents the signs in zoned format.

Fields defined as PACKED must contain a valid sign value in the four low-order bits of the low-order byte. Valid positive signs are A, C, E and F. Valid negative signs are B and D. ADABAS represents a positive value with a C and a negative value with a D.
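
The packed sign rule can be checked with a short Python sketch. The helper names are hypothetical, and the sketch assumes that the low-order byte, which carries the sign, is the last byte of the field.

```python
def pack_decimal(value, length):
    """Encode an integer as packed decimal, using C (+) or D (-) as sign."""
    sign = "D" if value < 0 else "C"
    nibbles = str(abs(value)).rjust(length * 2 - 1, "0") + sign
    if len(nibbles) != length * 2:
        raise ValueError("value does not fit into the field")
    return bytes.fromhex(nibbles)


def packed_sign_is_valid(field):
    """A to F are valid signs in the four low-order bits of the last byte."""
    return 0xA <= (field[-1] & 0x0F) <= 0xF


def packed_is_negative(field):
    """B and D are the negative signs."""
    return (field[-1] & 0x0F) in (0xB, 0xD)
```

For example, pack_decimal(5000, 5) produces the bytes 00 00 05 00 0C.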

If the input file does not contain any records, a warning message is displayed and the utility aborts. However, a CMPDTA output file that contains the FDT information is created.

Multiple-Value Field Count

If the structure of the decompressed record is not described via the FIELDS parameter, please consider the following:

The values for a multiple value field must be preceded by a 1, 2 or 4 byte binary count, depending on the setting of the ADACMP parameter MUPE_C_L, to indicate the number of values of the multiple-value field in the record. The minimum number of values which may be specified is 1.

If the number of values is constant for each record, this number may be specified in the field definition table used to define the multiple-value field. In this case, the count byte in the input record must be omitted. This option is only enabled if the FDT keyword is used. FDTs that are read from the database always default to variable occurrence counts. These variable occurrence counts can be overwritten by using the FIELDS keyword.

Multiple-value fields within periodic groups must not be specified with an occurrence count when the periodic group has been specified with a variable occurrence count.

Example:

        01,PG,PE
         02,P1,4,A,NU
         02,PM,4,A,NU,MU(4)
                         ^
%ADACMP-E-FIXOCC, specification of occurrences not allowed at this position

The count provided by the user may be modified by ADACMP if the NU option is defined for the field. Null values are suppressed and the count field is modified accordingly.
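
The effect of the NU option on the count can be sketched in Python as follows. The function names are hypothetical; the sketch assumes a 4-byte alphanumeric field whose null value is four blanks, and a count whose byte order is passed in explicitly.

```python
def read_mu_values(record, count_len=1, value_len=4, byteorder="big"):
    """Read the binary count and the fixed-length values that follow it."""
    count = int.from_bytes(record[:count_len], byteorder)
    return [record[count_len + i * value_len:count_len + (i + 1) * value_len]
            for i in range(count)]


def suppress_nulls(values, null=b"    "):
    """Drop null values and return the adjusted count with the kept values."""
    kept = [v for v in values if v != null]
    return len(kept), kept
```

A record holding the count 3 followed by AAAA, a null value and CCCC ends up with a count of 2.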

Example:

Field Definition:  01,MF,4,A,MU,NU

Each record contains a variable number of values for MF.

Input Records       Before ADACMP          After ADACMP
Input Record 1
(3 values)          MF count = 3           MF count = 3
                    AAAA                   AAAA
                    BBBB                   BBBB
                    CCCC                   CCCC
Input Record 3
(3 values)          MF count = 3           MF count = 2
                    AAAA                   AAAA
                    <null value>           CCCC
                    CCCC
Input Record 4
(1 value)           MF count = 1           MF count = 0
                    <null value>

Example:

Field Definition:  01,MF,4,A,MU(3),NU

Each record contains 3 values for MF.

Input Records       Before ADACMP          After ADACMP
Input Record 1                             MF count = 3
                    AAAA                   AAAA
                    BBBB                   BBBB
                    CCCC                   CCCC
Input Record 2                             MF count = 2
                    AAAA                   AAAA
                    BBBB                   BBBB
                    <null value>
Input Record 3                             MF count = 2
                    AAAA                   AAAA
                    <null value>           CCCC
                    CCCC
Input Record 4                             MF count = 0
                    <null value>
                    <null value>
                    <null value>

Periodic Group Count

If the structure of the decompressed record is not described via the FIELDS parameter, please consider the following:

The first occurrence of a periodic group must be preceded by a 1, 2 or 4 byte binary count, depending on the setting of the ADACMP parameter MUPE_C_L, which indicates the number of occurrences of the periodic group in the record. The minimum number of occurrences which may be specified is 1.

If the number of occurrences is constant for each record, this number may be specified in the field definition table used to define the periodic group. In this case, the count byte in the input record must be omitted.

This option is only enabled when the FDT keyword is used. FDTs that are read from the database always default to variable occurrence counts. These variable occurrence counts can be overwritten by using the FIELDS keyword.

The occurrence count provided may be modified by ADACMP only if all the fields in the periodic group are defined with the NU option. If all the fields in a given occurrence contain null values and there are no following occurrences which contain non-null values, the occurrence will be suppressed and the periodic group occurrence count will be modified accordingly.

Example (PE with NU):

Field Definitions:   01,GA,PE
                          02,A1,4,A,NU
                          02,A2,4,A,NU

The input records contain a variable number of occurrences for GA.

Input Records       Before ADACMP          After ADACMP
Input Record 1      GA count = 2           GA count = 2
                    GA (1st occ.)
                     A1 = AAAA              A1 = AAAA
                     A2 = BBBB              A2 = BBBB
                    GA (2nd occ.)
                     A1 = CCCC              A1 = CCCC
                     A2 = DDDD              A2 = DDDD
Input Record 2      GA count = 1           GA count = 0
                    GA (1st occ.)
                     A1 = <null value>     suppressed *
                     A2 = <null value>     suppressed *
Input Record 3      GA count = 3           GA count = 3
                    GA (1st occ.)
                     A1 = AAAA              A1 = AAAA
                     A2 = <null value>      A2 = suppressed
                    GA (2nd occ.)
                     A1 = BBBB              A1 = BBBB
                     A2 = <null value>      A2 = suppressed
                    GA (3rd occ.)
                     A1 = CCCC              A1 = CCCC
                     A2 = <null value>      A2 = suppressed

* but this is indicated by an empty field count of 2. Up to 63 consecutive empty fields are indicated by one appropriate empty field count.

Example (PE with NU):

Field Definitions:   01,GA,PE(3)
                           02,A1,4,A,NU
                           02,A2,4,A,NU

All input records contain 3 occurrences for GA.

Input Records       Before ADACMP          After ADACMP
Input Record 1
                    GA (1st occ.)          GA count = 3
                     A1 = AAAA              A1 = AAAA
                     A2 = <null value>      A2 suppressed
                    GA (2nd occ.)
                     A1 = BBBB              A1 = BBBB
                     A2 = <null value>      A2 suppressed
                    GA (3rd occ.)
                     A1 = CCCC              A1 = CCCC
                     A2 = <null value>      A2 suppressed
Input Record 2                             GA count = 2*
                    GA (1st occ.)
                     A1 = <null value>      A1 = suppressed
                     A2 = <null value>      A2 = suppressed
                    GA (2nd occ.)
                     A1 = BBBB              A1 = BBBB
                     A2 = <null value>      A2 = suppressed
                    GA (3rd occ.)
                     A1 = <null value>      A1 = suppressed
                     A2 = <null value>      A2 = suppressed
Input Record 3      All occ.               GA count = 0
                    contain                All occurrences
                    null value             are suppressed **

* The first occurrence is included in the count since occurrences follow which contain non-null values. The third occurrence is not included in the count since no occurrences follow which contain non-null values.

** but this is indicated by an empty field count of 2.

Example (PE without NU):

Field Definitions:   01,GA,PE(3)
                           02,A1,4,A
                           02,A2,4,A

All input records contain 3 occurrences for GA.

Input Records       Before ADACMP         After ADACMP
Input Record 1      GA (1st occ.)         GA count = 3
                     A1 = <null value>     A1 = <null value>
                     A2 = <null value>     A2 = <null value>
                    GA (2nd occ.)
                     A1 = <null value>     A1 = <null value>
                     A2 = <null value>     A2 = <null value>
                    GA (3rd occ.)
                     A1 = CCCC             A1 = CCCC
                     A2 = <null value>     A2 = <null value>
Input Record 2                            GA count = 3
                    GA (1st occ.)
                     A1 = <null value>     A1 = <null value>
                     A2 = AAAA             A2 = AAAA
                    GA (2nd occ.)
                     A1 = <null value>     A1 = <null value>
                     A2 = <null value>     A2 = <null value>
                    GA (3rd occ.)
                     A1 = <null value>     A1 = <null value>
                     A2 = <null value>     A2 = <null value>
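
The way the occurrence count is derived in the periodic group examples can be summarized in a small Python sketch. The function name is hypothetical, and each occurrence is modeled as a tuple of 4-byte field values whose null value is four blanks.

```python
NULL = b"    "


def pe_count_after_compression(occurrences, all_fields_nu):
    """Return the periodic group count as stored after compression."""
    count = len(occurrences)
    if not all_fields_nu:
        return count            # without NU on every field, nothing is dropped
    # Only trailing occurrences that are completely null are suppressed;
    # empty occurrences followed by a non-null occurrence still count.
    while count > 0 and all(value == NULL for value in occurrences[count - 1]):
        count -= 1
    return count
```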

Variable-Length Indicator

Each value of a variable-length field (length set to zero in the field definition) must be preceded by a length indicator (in binary format) which indicates the value length (including the length indicator).

The length of the length indicator is:

  • 4 bytes, if the field has the L4 option

  • 2 bytes, if the field has the LA option

  • 1 byte, if the field has neither of these options
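
The length-indicator rule can be sketched in Python as follows. The function names are hypothetical; the byte order of the indicator is passed in explicitly ("big" for high-order first, "little" for low-order first).

```python
def indicator_length(options):
    """Size of the length indicator for the given field options."""
    if "L4" in options:
        return 4
    if "LA" in options:
        return 2
    return 1


def encode_value(value, options=(), byteorder="big"):
    """Prefix a value with its inclusive binary length indicator."""
    size = indicator_length(options)
    total = len(value) + size          # the length includes the indicator
    return total.to_bytes(size, byteorder) + value
```

An 8-byte value therefore gets the indicator 9 without options, 10 with LA and 12 with L4.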

Example:

Field Definitions:
    
     01,AA,8,A,DE
     01,V1,0,A
     01,V2,0,A,LA
     01,V4,0,A,L4

Input records (high-order first)

"FIELD AA\x09FIELD V1\x00\x0aFIELD V2\x00\x00\x00\x0cFIELD V4"

"FIELD AA\x09FIELD V1\x07\xD2 (2000 data bytes)\x00\x00\x07\xD2 (2000 data bytes)"

Input records (low-order first)

"FIELD AA\x09FIELD V1\x0a\x00FIELD V2\x0c\x00\x00\x00FIELD V4"

"FIELD AA\x09FIELD V1\xD2\x07 (2000 data bytes)\xD2\x07\x00\x00 (2000 data bytes)"

NC Option Indicator

The values for fields with the NC option are defined without the indicator when the FDT is used to describe the input record.

Example:

Field Definitions:
     
     01,AA,5,A,NC
     01,AB,5,A,NC

Input Record

Field AA     Field AB

(5 bytes)    (5 bytes)

If the input record contains values for the NC option, then either the NULL_VALUE parameter must be set, or the structure of the records must be described using the FIELDS parameter.

ADACMP Processing Considerations

Data Modifications

ADACMP modifies all input records as follows:

Fields defined with format U or P are checked to ensure that the field value is numeric and in the correct format.

If a value is null, it must contain characters which correspond to the format specified for the field:

Format                  Null Value
ALPHANUMERIC (A)        Blanks (ASCII: hex `20' or EBCDIC: hex `40')
BINARY (B)              Binary zeros
FIXED-POINT (F)         Binary zeros
FLOATING POINT (G)      Binary zeros
PACKED DECIMAL (P)      Packed decimal 0
UNPACKED DECIMAL (U)    Unpacked decimal 0, depending on the source architecture
UNICODE (W)             Blanks depending on WCHARSET specified

For a packed or unpacked decimal field, -0 is converted to +0.

Any record which contains invalid data is written to the ADACMP error file and will not be written to the compressed file.

Data Compression

The value for each field is compressed (unless the FI option is specified) as follows:

  • Trailing blanks are removed for fields defined with A format;

  • Leading zeros are removed for fields defined with B, P or U format;

  • If the field is defined with the NU option and the value is a null value, a one-byte indicator is stored. Hexadecimal `C1' indicates that one empty field follows, `C2' two, etc.;

  • Empty fields located at the end of the record are not stored.

Example:

The following data definitions and corresponding values would be processed by ADACMP as shown in the following figure:

01,ID,4,B,DE      ; ID
01,BD,6,U,DE,NU   ; BIRTHDATE
01,SA,5,P         ; SALARY
01,DI,2,P,NU      ; DAYS ILL
01,FN,8,A,NU      ; FIRST_NAME
01,LN,9,A,NU      ; LAST_NAME
01,SE,1,A,FI      ; SEX
01,HO,7,A,NU      ; HOBBY

Field    Format  Before compression          After compression

ID         B     67 12 00 00                 03  67 12

BD         U     31 36 30 35 35 39           07  31 36 30 35 35 39

SA         P     00 00 05 00 0C              04  05 00 0C

DI         P     00 0C                      )                       
                                            )C2 (two empty fields)
FN         A     20 20 20 20 20 20 20 20    )    

LN         A     4E 41 4D 45 20 20 20 20 20  05  4E 41 4D 45

SE         A     4D                          4D

HO         A     20 20 20 20 20 20 20        C1 (one empty field)
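
The figure can be reproduced with a minimal Python sketch. It is an illustration only, not the ADACMP implementation, and the function names are hypothetical. It assumes ASCII data, binary values stored low-order byte first, and a one-byte inclusive length in front of each stored value. It emits the final empty-field indicator as the figure shows it, does not model the dropping of empty fields at the end of the record, and does not handle more than 63 consecutive empty fields.

```python
def is_null(fmt, raw):
    """True if raw holds the null value for its format (ASCII assumed)."""
    if fmt == "A":
        return raw.strip(b" ") == b""
    if fmt == "P":
        return raw == bytes(len(raw) - 1) + b"\x0c"
    if fmt == "U":
        return raw.strip(b"0") == b""
    return raw == bytes(len(raw))              # B: binary zeros


def strip_value(fmt, raw):
    """Remove trailing blanks (A) or leading zeros (B, P, U)."""
    if fmt == "A":
        return raw.rstrip(b" ")
    if fmt == "B":
        # Assumption: low-order byte first, so leading zeros sit at the end.
        return raw.rstrip(b"\x00") or raw[:1]
    if fmt == "P":
        return raw.lstrip(b"\x00")
    return raw.lstrip(b"0") or raw[-1:]        # U


def compress(fields):
    """fields: iterable of (format, raw bytes, set of options)."""
    out = bytearray()
    empty = 0                                  # pending empty NU fields

    def flush():
        nonlocal empty
        if empty:
            out.append(0xC0 + empty)           # C1 = one empty field, C2 = two
            empty = 0

    for fmt, raw, options in fields:
        if "FI" in options:                    # fixed storage: no compression
            flush()
            out.extend(raw)
        elif "NU" in options and is_null(fmt, raw):
            empty += 1
        else:
            flush()
            value = strip_value(fmt, raw)
            out.append(len(value) + 1)         # inclusive length byte
            out.extend(value)
    flush()
    return bytes(out)
```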

ADAMUP Processing Considerations

When adding records to or deleting records from an ADABAS database file, entries have to be inserted/removed in the Address Converter (AC), Data Storage (DS) and in the index. The data storage space table (DSST) has to be modified accordingly.

Adding Records

ISN Assignment

If the USERISN option is set, the ISN given with the input data is used. If this ISN exceeds the current limit (MAXISN) for the file or has already been assigned to another record, ADAMUP terminates execution and returns an error message. As with an ADABAS N2 command, there is no automatic extension of the file's Address Converter. The file's first free ISN is set to a value that is one greater than the highest USERISN provided if there is a USERISN which is greater than or equal to the file's current first free ISN.

If the USERISN option is omitted or NOUSERISN is specified, the ISN of each record is assigned by ADAMUP. ISNs are assigned in ascending sequence. If ISN-reusage is enabled, ADAMUP first scans the file's Address Converter for unused ISNs. Once all ISNs have been reused or if ISN-reusage is disabled, ADAMUP assigns new ISNs starting at the file's first free ISN. Whenever a new Address Converter block is required, it is taken from the extents that are currently available. When these blocks are exhausted, an automatic extension is carried out according to the rules described in this chapter. Processing continues if the extension is successful, otherwise ADAMUP terminates with an error message.

ISNs deleted by a mass delete that is running in parallel can be reused immediately for the records being added.
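
The ISN assignment rules can be modeled with the following Python sketch. The class and its members are hypothetical. It assumes that ISNs start at 1, represents the Address Converter simply as a set of used ISNs, and does not model Address Converter blocks or automatic extension.

```python
class IsnAllocator:
    """Toy model of ISN assignment when records are added."""

    def __init__(self, used_isns, first_free_isn, max_isn, reuse):
        self.used = set(used_isns)
        self.first_free = first_free_isn
        self.max_isn = max_isn
        self.reuse = reuse
        self._scan = 1                 # next ISN to inspect for reuse

    def assign_user_isn(self, isn):
        """USERISN: take the ISN supplied with the input data."""
        if isn > self.max_isn or isn in self.used:
            raise ValueError("ISN %d exceeds MAXISN or is already in use" % isn)
        self.used.add(isn)
        if isn >= self.first_free:
            self.first_free = isn + 1
        return isn

    def assign_next(self):
        """NOUSERISN: reuse a free ISN if allowed, else take the first free ISN."""
        if self.reuse:
            while self._scan < self.first_free:
                isn = self._scan
                self._scan += 1
                if isn not in self.used:
                    self.used.add(isn)
                    return isn
        isn = self.first_free
        self.first_free += 1
        self.used.add(isn)
        return isn
```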

Finding Space In Data Storage

If DS-reusage has been enabled, ADAMUP scans the DSST for a DS RABN with sufficient space to store the current data record. One DSST RABN is scanned at a time, just as the ADABAS nucleus does, and the first free DS RABN is used if no space is found via the DSST. When a mass delete is run in parallel, the DS RABNs from which records are deleted are reused first. This is different to the procedure used by the ADABAS nucleus, but saves scanning the DSST and minimizes the number of I/Os to the Data Storage. This is because those RABNs have to be read and written by the delete routines in any case.

If DS-reusage is disabled, or if no space is found via the DSST, ADAMUP assigns a new DS block starting at the first free DS RABN.

Whenever new records are added to a Data Storage block, the padding factor specified for the file is considered. If a new Data Storage block is required, it is taken from the extents that are currently available. When these blocks are exhausted, an automatic extension is carried out. Processing continues if the extension is successful, otherwise ADAMUP terminates with an error message.

Deleting Records

In the first step, all input records on the file that contains the ISNs to be deleted are read and validated. If any invalid records are found, the line number and offset are reported, and ADAMUP terminates execution and returns an error status once the input file has been parsed completely.

At the end of this step, ADAMUP builds a table of the ISNs to be deleted in virtual memory. This table is used in the next steps when performing the updates required on the file's Address Converter, Data Storage and index. The space required for this table (one bit per entry) depends on the lowest and highest ISN specified on the input file. ADAMUP terminates execution and returns an error message if there is not sufficient space.
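
The space requirement of such a table can be illustrated with a short Python sketch. The class name is hypothetical; it only shows how one bit per ISN between the lowest and highest ISN translates into a size in bytes.

```python
class IsnBitTable:
    """One bit per ISN in the range lowest..highest."""

    def __init__(self, lowest, highest):
        self.lowest = lowest
        self.highest = highest
        entries = highest - lowest + 1
        self.bits = bytearray((entries + 7) // 8)   # round up to whole bytes

    def add(self, isn):
        offset = isn - self.lowest
        self.bits[offset >> 3] |= 1 << (offset & 7)

    def __contains__(self, isn):
        if not self.lowest <= isn <= self.highest:
            return False
        offset = isn - self.lowest
        return bool(self.bits[offset >> 3] & (1 << (offset & 7)))
```

A delete list whose lowest ISN is 100 and whose highest ISN is 1099 needs 1000 bits, that is 125 bytes, regardless of how many ISNs it actually contains.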

In the second step, the file's Address Converter is processed. Because the ISNs to be deleted are pre-sorted, the number of Address Converter IOs can be reduced to a minimum in this step.

The corresponding Address Converter entry of each ISN specified is retrieved. For unused ISNs, an entry is written to the error log and processing continues if NOT_PRESENT=IGNORE is specified (default), otherwise ADAMUP terminates and an error message is returned. For ISNs that are used, the corresponding Data Storage RABN is put into the SORT and the Address Converter entry is deleted. Consecutive references to the same Data Storage RABN are skipped. Each Data Storage RABN put into the SORT is prefixed with the extent number to indicate its location in the File Control Block (FCB). This allows the next step to process the file's Data Storage according to the sequence in which the Data Storage extents were allocated.
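
The second step can be sketched in Python under simplifying assumptions. The function names are hypothetical; the Address Converter is modeled as a dictionary that maps ISNs to Data Storage RABNs, the Data Storage extents as a list of (first RABN, last RABN) pairs in allocation order, and NOT_PRESENT=IGNORE is assumed.

```python
def extent_number(rabn, ds_extents):
    """Index of the Data Storage extent that contains the RABN."""
    for number, (first, last) in enumerate(ds_extents):
        if first <= rabn <= last:
            return number
    raise ValueError("RABN %d is not in any extent" % rabn)


def collect_sort_keys(isns_to_delete, address_converter, ds_extents):
    """Delete AC entries and return (sorted keys, ISNs that were not present)."""
    keys, not_present, previous = [], [], None
    for isn in sorted(isns_to_delete):
        rabn = address_converter.pop(isn, None)    # delete the AC entry
        if rabn is None:
            not_present.append(isn)                # would go to the error log
            continue
        if rabn == previous:                       # consecutive reference: skip
            continue
        previous = rabn
        keys.append((extent_number(rabn, ds_extents), rabn))
    # Sorting by (extent number, RABN) follows the allocation sequence.
    return sorted(keys), not_present
```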

At the end of this step, the first free ISN on the file is reset to the first ISN of the highest range of ISNs to be deleted, provided that ISN-reusage is enabled and the highest ISN of that range is identical to the last used ISN on the file.

In the third step, the file's Data Storage and Data Storage Space Table are processed. Because the Data Storage RABNs to be modified are now pre-sorted, the number of Data Storage and Data Storage Space Table IOs can be reduced to a minimum in this step.

The relevant Data Storage blocks are read using the values returned by the SORT. Within each block, the records identified by an ISN in the table of ISNs to be deleted are removed, the block is refilled with records to be added (when a mass add is run in parallel and DS reusage is enabled) and the Data Storage Space Table is modified accordingly. At the end of this step, the first free Data Storage RABN is reset to the start RABN of the last range of Data Storage RABNs from which all data were deleted, provided that DS reusage is enabled and the end RABN of that range is identical to the last used Data Storage RABN on the file.

Updating the Index

Once the Address Converter, Data Storage and Data Storage Space Table have been modified, ADAMUP copies the file's Normal Index (NI) to an intermediate file and resets the file's index extents. Index entries that correspond to deleted records are omitted in this step.

Loading the Normal and Main Index

In order to build the Normal Index and Main Index, the Descriptor Value Table (DVT) entries contained on the input file have to be read and sorted according to ascending descriptor values and ISNs. The output of this sort is merged with the Normal Index entries saved on the intermediate file, and is then used to build the new Normal Index and Main Index.

Descriptors defined with the unique option are checked to ensure that the new Normal Index contains only one ISN per descriptor value. If there is more than one ISN, the conflicting ISNs are written to the error log, the unique flag is reset in the FDT and processing continues if UQ_CONFLICT=RESET is specified. Otherwise ADAMUP terminates with an error message.
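
The merge and the uniqueness check can be illustrated with a Python sketch that uses only the standard library. The function names are hypothetical, and index entries are modeled as (descriptor value, ISN) tuples; the real index structures are not represented.

```python
import heapq
from itertools import groupby


def merge_index_entries(saved_ni, sorted_dvt):
    """Merge two lists that are already sorted by (value, ISN)."""
    return list(heapq.merge(saved_ni, sorted_dvt))


def unique_conflicts(entries):
    """Return {value: [ISNs]} for every value with more than one ISN."""
    conflicts = {}
    for value, group in groupby(entries, key=lambda entry: entry[0]):
        isns = [isn for _, isn in group]
        if len(isns) > 1:
            conflicts[value] = isns
    return conflicts
```

An empty result from unique_conflicts means that the merged entries satisfy the unique option; otherwise the listed ISNs are the ones that would be written to the error log.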

Besides sorting the descriptor values, reading the Descriptor Value Tables is very time-consuming as a result of the large number of I/Os to the sequential input file. Therefore, if there are many descriptors, ADAMUP attempts to minimize the number of passes required to read through the Descriptor Value Tables by using the information contained in the Descriptor Space Summary (DSS). During each pass through the Descriptor Value Tables, the values for one descriptor are directly given to the SORT. The values of additional descriptors, if they exist, are written to the TEMP data set. The greater the number of descriptors using the TEMP in parallel during each pass, the faster this step will be. ADAMUP displays the total number of passes required at the end of the run.

All index blocks are filled in accordance with the padding factor specified when the file was loaded. Whenever a new index block is required, it is taken from the existing extents (which have been reset at the start of this step). If these blocks are exhausted, an automatic extension is carried out. Processing continues if the extension is successful, otherwise ADAMUP terminates with an error message.

Loading the Upper Index

Whereas the Normal Index and Main Index are organized on a descriptor-by-descriptor basis, the Upper Index, index level 3 and higher, contains all descriptors. In order to link in the new Main Index, an entry is made in the Upper Index for each new Main Index block. The whole Upper Index is rebuilt. The padding factor specified when loading the file is re-established. All pre-allocated blocks are used before additional blocks are allocated. If additional blocks are required, the procedure as described for Normal Index and Main Index loading is used.

Rejected Data

Any rejected data is written to the ADAMUP error file. The contents of this error file should be displayed using the ADAERR utility. Do not print the error file using the standard operating system print utilities, since the records contain unprintable characters.

Please refer to the ADAERR utility in the Utilities Manual for further information.

ADABCK Processing Considerations

The DUMP/EXU_DUMP Function

When dumping a complete database (DUMP=*), the database's global information and all loaded files are dumped to an ADABAS backup copy. Therefore, a database can be restored from a database backup copy. Single files contained in such ADABAS backup copies can also be restored.

Dumping only selected files allows a controlled backup of certain parts of a database in cases where backing up the complete database is unnecessary.

The DUMP/EXU_DUMP function may be used when the nucleus is active or inactive. If the nucleus is active during a DUMP, all updates are dumped to the backup copy.

The DUMP/EXU_DUMP function cannot be used while an AUTORESTART is pending. In that case, the nucleus must first be started to resolve the pending AUTORESTART.

When the DUMP is about to terminate, all transactions have to be synchronized on ET status. An active nucleus does this automatically on request of ADABCK. During synchronization, the nucleus will only schedule commands which

  • enable ET users to attain ET status;

  • complete any active update commands;

  • are read/search commands.

The nucleus may come up while the DUMP function is running. In this case, the nucleus and the DUMP function will synchronize with each other. The nucleus can be shut down with ADAOPR CANCEL while the DUMP function is active. If the nucleus terminates abnormally, ADABCK displays a message requesting the nucleus to be started. Then it waits until the nucleus performs its autorestart, after which it terminates normally.

Parallel Backups

Sometimes it can be useful to dump single files in parallel using multiple ADABCK jobs. This is generally possible with EXU_DUMP, but if the nucleus is active, only one DUMP function is permitted.

Note:
Parallel backups are not supported on Windows platforms.

The RESTORE/OVERLAY Function

A backup copy can be used to restore/overlay selected files or a complete database when single files or the database's global information have been corrupted.

When restoring/overlaying files, the nucleus may be either active or inactive. A check is made that all of the RABNs required by the files to be restored/overlaid are available. If all RABNs are available, the file is restored to the same position as before. If one or more of the required RABNs are not available in the database, a completely new set of RABNs will be allocated.

The nucleus may not be active when restoring/overlaying a database, since exclusive control over the database container files is required.

When restoring/overlaying a complete database, the underlying database may be larger, containing more blocks or more containers than the backup save set. However, the block sizes covered by the save set must be identical. The unused blocks from the underlying database will be kept and their space will be returned to the free space table.

When restoring/overlaying files, the underlying database can be smaller or larger than the backup copy.

When restoring/overlaying files, ADABCK tries by default to restore the blocks to the original block numbers. If this space is not available because it is occupied by another file, the file will be completely restored to other block numbers, and an attempt is made to combine several file extents into one.

Parallel Restores

Sometimes it can be useful to restore single files in parallel using multiple ADABCK jobs. This is possible with both the RESTORE and the OVERLAY function, regardless of whether the nucleus is active or inactive.

Security File Considerations

When restoring/overlaying the security file, only the passwords and the associated permission levels are re-established; the protection levels of the files loaded are not re-established. Therefore, if the file is restored to a newly-formatted database, the protection levels have to be re-enabled using the ADASCR security utility.

The protection levels of all files are only re-established if a database is restored/overlaid.

ADABCK Restart Considerations

ADABCK has no restart capability. An abnormally-terminated ADABCK execution must be rerun from the beginning.

An interrupted RESTORE/OVERLAY of one or more files will result in lost RABNs which can be recovered by executing the RECOVER function of the utility ADADBM. An interrupted RESTORE/OVERLAY of a database results in a database that cannot be accessed.

ADAORD Processing Considerations

Exporting Files

When exporting one or more files, ADAORD copies the content of each file's Data Storage together with the information required to re-establish its index to a sequential output file (ORDEXP). Exporting a file's data records is identical to unloading them, and ADAORD supports the same processing sequences as the ADAULD utility. There are, however, differences in the way in which the information required to re-establish the file's index is provided. ADAORD does not generate descriptor value table (DVT) entries based on the data records (like ADAULD), but rather retrieves and exports the file's inverted lists. This requires access to a valid index and results in additional I/Os on the one hand, while saving CPU time on the other.

All files to be processed are written to a single sequential output file (ORDEXP) in ascending file number sequence. Splitting the export into separate runs and thus creating several versions of the sequential output file should be considered if non-default allocation quantities or placements are to be used when subsequently re-importing a file. If non-default values and placements are used, each file requires a separate run, and splitting the export procedure helps prevent lengthy and time-consuming positioning during the re-import process.

Importing Files

When importing one or more files, ADAORD retrieves the information contained on the sequential input file (ORDEXP) to re-establish each file's Data Storage, Address Converter and index. Importing a file's data records and building the Address Converter is identical to loading them using the utility ADAMUP (with the USERISN option). However, the process of building the file's index is faster in ADAORD because the descriptor values and ISNs are provided in their correct sequence. This eliminates the necessity of sorting (and of using the SORT and TEMP files) and more than compensates for the additional expenditure that results from reading through the index during the EXPORT phase.

The format of the sequential input file (ORDEXP) is independent of any database device types. Therefore, the process of exporting and then re-importing can be used to migrate files between databases that reside on different device types.

When importing the security file, only the passwords and the associated permission levels are re-established; the protection levels of the files imported are not re-established. Therefore, if a file is imported to a newly-formatted database, the protection levels have to be re-enabled using the utility ADASCR (refer to the Utilities Manual for further information).

Allocating Space

When importing a file, both the placement and initial allocation quantities can be controlled by the user or left to ADAORD.

Unless positioning is forced by the specification of a start RABN, ADAORD will use the following sequence for the initial allocation of a file's extents: Address Converter (AC), Upper Index (UI), Normal Index (NI) and Data Storage (DS).

This allows the two extent types with the highest probability of being exhausted (NI and DS) to be extended without breaking into another extent.

If the number of blocks or megabytes to be allocated is omitted, ADAORD calculates the allocation quantity as follows:

ALQN = ALQO * (100 - PFACO) / (100 - PFACN)

where:

ALQN New allocation quantity in blocks or megabytes
ALQO Old allocation quantity in blocks or megabytes
PFACN New padding factor
PFACO Old padding factor
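
The formula can be evaluated with a short Python sketch. The function name is hypothetical, and since the text does not say how ADAORD rounds the result, the value is returned unrounded.

```python
def new_allocation_quantity(alqo, pfaco, pfacn):
    """ALQN = ALQO * (100 - PFACO) / (100 - PFACN).

    alqo:  old allocation quantity (blocks or megabytes)
    pfaco: old padding factor in percent
    pfacn: new padding factor in percent
    """
    return alqo * (100 - pfaco) / (100 - pfacn)


# 1000 blocks at a 10 percent padding factor, re-imported at 20 percent.
alqn = new_allocation_quantity(1000, 10, 20)
```

In this case the result is 1125: raising the padding factor leaves less usable space in each block, so more blocks are needed for the same data.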

By default, the initial and all subsequent allocations will be made using a contiguous-best-try method.

ISN Assignment

The ISN provided with each data record (and also contained in the inverted lists) is used. ADAORD will terminate execution and return an error message if the limit (MAXISN) for a file has been decreased to a value less than the file's first free ISN and an ISN that exceeds the new limit is found. The file's new first free ISN is set to a value one greater than the highest ISN found in the data records.

In order to change the ISN assignment, the file has to be unloaded using ADAULD and then reloaded using ADAMUP.

Reordering a Database

This function consists of implicit EXPORT and IMPORT functions.

When reordering at the database level, all of the files in the database have to be exported in the first step. A single version of ORDEXP will be created, independently of where it physically resides.

The second step consists of rearranging the database's FCB and FDT area and reallocating the DSST behind it.

The final step is to re-import the files. Each file is relocated, multiple logical extents are condensed into a single logical extent, and the padding factors are re-established.

The created sequential file (ORDEXP) will not be deleted at the end of this function.

Repairing an Inconsistent Index

Because the new index is based on the content of the old index (and not on the file's data records), an index which is logically inconsistent cannot be repaired by exporting and re-importing the file. Furthermore, an index which is physically corrupted may cause ADAORD's EXPORT function to loop or terminate abnormally.

The index can only be repaired by either reinverting (using ADAINV) or unloading and reloading the file (using ADAULD and ADAMUP).

File Space Estimation

This section contains formulas for calculating the Associator and Data Storage space requirements for a file.

Getting a First Estimate

The following pages of this chapter describe how to get a reasonably accurate estimate of the disk space requirements for your file or database before you load the data. A simple way of getting a first approximation is to load a small amount of your data, for example 1%-2%, into the database, then run the ADAREP utility and check the figures output for "allocated" and "unused" blocks. Then extrapolate these figures to calculate how much space would be required for the full 100% of the data. This is the approach often used by experienced database administrators at customer sites to calculate space requirements.
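
The extrapolation can be sketched as follows. This is a minimal illustration that assumes the space in use is the difference between the "allocated" and "unused" figures and that it grows linearly with the amount of data; the function and parameter names are hypothetical.

```python
import math


def extrapolate_blocks(allocated, unused, sample_percent):
    """Estimate the blocks needed for 100% of the data from a small sample.

    allocated, unused: block counts reported after loading the sample
    sample_percent:    share of the full data that was loaded, in percent
    """
    used = allocated - unused
    # Simple linear scaling; any fixed overhead is scaled along with the data.
    return math.ceil(used * 100 / sample_percent)


# A 2 percent sample with 500 allocated blocks, 120 of them unused.
estimate = extrapolate_blocks(500, 120, 2)
```

Here the 380 blocks in use scale up to an estimate of 19,000 blocks for the full file.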

Associator Space Estimation

The Associator space required for a file is the sum of the space requirements for the following Associator elements:

  1. Normal Index

    The Normal Index is the lowest level of the index structure. It contains the inverted lists. Each inverted list is composed of a descriptor value and the list of ISNs of all the records that refer to this descriptor value.

  2. Upper Index

    The Upper Index consists of the Main Index and the other upper index levels. The Main Index is the next-highest level of the index structure after the Normal Index. It is used to manage the Normal Index. Up to this level, each index block may contain entries for only one descriptor.

    The Upper Index (index levels 3 and higher) contains entries for all descriptors that are present. Level 3 is used to manage the Main Index. As long as there is more than one Upper Index block at the current level, more levels will be added, each level managing the level below.

  3. Address Converter

    The Address Converter consists of a table of RABNs, each of which indicates the Data Storage location of the record identified by a given ISN.

Normal Index Space Estimation

The space required for the Normal Index depends on the number and the characteristics of the descriptors contained in the file.

An estimate of the Normal Index space required for each descriptor can be made using the formula:

NIBY = (IL * UV * MAXISN) + DV * (L + 2)

where

NIBY

Normal Index space requirement (in bytes).

UV

The average number of unique values in each record for the descriptor.
If the descriptor is not defined with the MU option, UV is equal to or less than 1.
If the descriptor is defined with the NU option, null values are not stored in the index, so UV is equal to the average number of values per record minus the average number of null values per record. For example, if the average number of values per record is 1 and 20 percent of the values are null, UV is equal to 1 - 0.2 = 0.8.

MAXISN

The number of records permitted for the file (see MAXISN parameter of the utility ADAFDU).

DV

The number of different values of the descriptor in the file.

L

The average length of each different value of the descriptor. If the descriptor is not defined with the FI option, L is equal to the average length. If the descriptor is defined with the FI option, L is equal to the standard length of the descriptor.

IL

ISN size of 2 or 4 bytes.

The term IL*UV*MAXISN represents the space required to store the ISNs, and the term DV*(L + 2) represents the space required to store the descriptor values.

For descriptors with numerous duplicate values, the term IL*UV*MAXISN dominates. For descriptors with a large proportion of unique values, the term DV*(L + 2) dominates.

This estimate is only valid if the data is loaded using the mass update utility ADAMUP or if the index is created with the inverted list utility ADAINV. If the data is loaded using N1 calls, twice as much space may be required (in the worst case), because the blocks are not filled completely: new values must be added to a block in sort sequence, and if there is not enough space available in a block, the index block is split.

Example 1: Calculating bytes

Descriptor AA has an average of 1 value in each record. There are 50 different values for AA in the file. There are no null values for AA. The average value length is 3 bytes. The MAXISN setting for the file is 20000, the ISN size is 2 bytes.

Field Definition: 01,AA,5,U,DE
NI = (2 * 1 * 20,000) + 50*(3 + 2)
NI = 40,000 + 250
NI = 40,250 bytes

Example 2: Calculating bytes

Descriptor BB has an average of 1 value in each record. There are 20000 different values for BB in the file. There are no null values for BB. The average value length is 10 bytes. The MAXISN setting for the file is 20000, the ISN size is 4 bytes.

Field Definition: 01,BB,15,A,DE
NI = (4 * 1 * 20,000) + 20,000*(10 + 2)
NI = 80,000 + 240,000
NI = 320,000 bytes

Example 3: Calculating bytes

Descriptor CC is a multiple-value field with an average of 10 values in each record. There are approximately 300 different values for CC in the file. The average value length is 4 bytes. There is an average of 3 null values in each record. The MAXISN setting for the file is 20000, the ISN size is 4 bytes.

Field Definition: 01,CC,12,A,DE,MU,NU
NI = (4 * 7 * 20,000) + 300*(4 + 2)
NI = 560,000 + 1,800
NI = 561,800 bytes

Example 4: Calculating bytes

Descriptor DD is a field within a periodic group. Each record has an average of 5 values for DD. There are 10 different values for DD in the file. Each record has an average of 3 null values. The MAXISN setting for the file is 20000. The average value length is 5 bytes, the ISN size is 2 bytes.

Field Definition: 01,PX
                  02,DD,8,A,NU
NI = (2 * 2 * 20,000) + 10*(5 + 2)
NI = 80,000 + 70
NI = 80,070 bytes
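
The four examples above can be reproduced with a short Python sketch of the NIBY formula. The function name is hypothetical; the parameters follow the symbols defined above.

```python
def normal_index_bytes(il, uv, maxisn, dv, l):
    """NIBY = (IL * UV * MAXISN) + DV * (L + 2)."""
    return il * uv * maxisn + dv * (l + 2)


ni_aa = normal_index_bytes(il=2, uv=1, maxisn=20000, dv=50, l=3)      # Example 1
ni_bb = normal_index_bytes(il=4, uv=1, maxisn=20000, dv=20000, l=10)  # Example 2
ni_cc = normal_index_bytes(il=4, uv=7, maxisn=20000, dv=300, l=4)     # Example 3
ni_dd = normal_index_bytes(il=2, uv=2, maxisn=20000, dv=10, l=5)      # Example 4
```

For the NU descriptors in Examples 3 and 4, UV is the average number of values per record minus the average number of null values (10 - 3 = 7 and 5 - 3 = 2).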

Once the number of bytes required for the Normal Index has been determined, an estimate of the number of blocks required can be made using the following formula:

NIBL = NIBY / (BL * (1 - p / 100) - 3)

where

NIBL

NI space requirement in blocks

NIBY

NI space requirement in bytes

BL

Associator block length

p

Associator block padding factor

The result of the division should be rounded up to the next integer.

Example 5: Calculating blocks

NI requirement in bytes = 60,250
Device type 2 KB
Associator block padding factor = 10 percent
NIBL = 60,250 / (2048 * (1 - 10 / 100) - 3)
NIBL = 32+ = 33 blocks
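
As a cross-check, the conversion from bytes to blocks can be written as a small Python function. The function name is hypothetical; the formula is the NIBL formula given above, including the rounding up to the next integer.

```python
import math


def normal_index_blocks(niby, bl, p):
    """NIBL = NIBY / (BL * (1 - p / 100) - 3), rounded up."""
    usable_bytes_per_block = bl * (1 - p / 100) - 3
    return math.ceil(niby / usable_bytes_per_block)


# Example 5: 60,250 bytes, 2 KB blocks, 10 percent padding factor.
nibl = normal_index_blocks(60250, 2048, 10)
```

This gives 33 blocks for the values of Example 5.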

Upper Index Space Estimation

The Upper Index consists of the Main Index and other upper index levels. Each Normal Index block is represented in the Main Index by an entry consisting of a 9-byte fixed part and the descriptor value. The Main Index space requirement for each descriptor may be calculated using the formula:

MIBY   = NIBL * (L + 9)

where

MIBY

Main Index space requirement (in bytes)

NIBL

Normal Index space requirement (in blocks)

L

The average length of each different value of the descriptor. If the descriptor is not defined with the FI option, L is equal to the average length. If the descriptor is defined with the FI option, L is equal to the standard length of the descriptor. For fields with format A and W, the length of truncated descriptor values must be considered; the descriptor values are truncated at the first byte where they differ from the previous descriptor value.

Example 1: Calculating bytes

NI Block Requirement = 45 blocks
MI   = 45 * (3 + 9)
MI   = 540 bytes

The following formula may be used to convert the Main Index byte requirement to blocks:

MIBL = MIBY / (BL * (1 - p / 100))

where

MIBL

Main Index space requirement (in blocks)

MIBY

Main Index space requirement (in bytes)

BL

Associator block length

p

Associator block padding factor

The result of the division is rounded up to the next integer.

Example 2: Calculating blocks

MI byte requirement = 540 bytes
Device type 2 KB
Associator block padding factor = 5 percent
MIBL = 540 / (2048 * (1 - 5 / 100))
MIBL = 0+ = 1 block
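
The two Main Index formulas can be combined in a short Python sketch that reproduces Examples 1 and 2. The function names are hypothetical.

```python
import math


def main_index_bytes(nibl, l):
    """MIBY = NIBL * (L + 9)."""
    return nibl * (l + 9)


def main_index_blocks(miby, bl, p):
    """MIBL = MIBY / (BL * (1 - p / 100)), rounded up."""
    return math.ceil(miby / (bl * (1 - p / 100)))


miby = main_index_bytes(nibl=45, l=3)             # Example 1
mibl = main_index_blocks(miby, bl=2048, p=5)      # Example 2
```

With 45 Normal Index blocks and an average value length of 3 bytes, the Main Index needs 540 bytes, which fits into a single 2 KB block.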

Overall Space Requirements

The highest upper index levels (level 3 and higher) contain entries for all descriptors of a file. The overall space requirements for the upper index can be obtained using the following formula:

UIBL = M * (1 + C + C**2 + C**3 + ... + C**13)

where

UIBL

Upper index space requirement in blocks

M

Sum of the Main Index space requirements for all descriptors of the file

C is given by the following formula:

C = (L + 13) / (BL * (1 - p / 100))

where

L

Average length of all values of all descriptors of the file

BL

Associator block length

p

Associator block padding factor
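
The following Python sketch evaluates the UIBL formula. The function name and the sample values (M = 100 blocks, L = 10 bytes, 2 KB blocks, 5 percent padding) are assumptions chosen for illustration. The text does not state a rounding rule for UIBL, so the function returns the unrounded value.

```python
def upper_index_blocks(m, l, bl, p):
    """UIBL = M * (1 + C + C**2 + ... + C**13),
    with C = (L + 13) / (BL * (1 - p / 100))."""
    c = (l + 13) / (bl * (1 - p / 100))
    # Sum the series for exponents 0 to 13.
    return m * sum(c ** k for k in range(14))


uibl = upper_index_blocks(m=100, l=10, bl=2048, p=5)
```

Because C is much smaller than 1 for typical block sizes, the series converges quickly, and UIBL is only slightly larger than M (a little over 101 blocks in this case).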

Address Converter Space Estimation

The Address Converter for a file consists of a list of the relative ADABAS block numbers (RABNs), each of which represents the Data Storage block number in which a given record is stored. The block numbers are stored in ISN sequence, with the nth entry containing the Data Storage RABN for ISN n. Three bytes are required for each entry.

The space requirement for the Address Converter can be calculated using the formula:

AC = MAXISN * 3 / BL

where

AC

Address Converter space requirement (in blocks)

MAXISN

MAXISN setting for the file

BL

Associator block size

The result of the division is rounded up to the next integer.

Example:

MAXISN = 2,000,000
Device type 2 KB
AC     = 2,000,000 * 3 / 2048
AC     = 6,000,000 / 2048
AC     = 2929+ = 2930 blocks
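
The same calculation as a minimal Python sketch. The function name is hypothetical, and the entry size defaults to the three bytes stated above.

```python
import math


def address_converter_blocks(maxisn, bl, entry_bytes=3):
    """AC = MAXISN * 3 / BL, rounded up to the next integer."""
    return math.ceil(maxisn * entry_bytes / bl)


# MAXISN of 2,000,000 with 2 KB Associator blocks.
ac = address_converter_blocks(2_000_000, 2048)
```

This reproduces the 2930 blocks of the example.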

Data Storage Space Estimation

The Data Storage space requirement can be estimated using the formula:

DS = N/(BW/L) + 1

where

DS

Data Storage space requirement (in blocks)

N

Number of records to be loaded into the file

B

Data Storage block size

p

Data Storage block padding factor

BW

Real amount of space used (minus padding factor) (B*(1-p/100))

L

Average record length

Example:

Number of records = 1,000,000
Average compressed record length = 50
Device type = 4 KB
Data Storage block padding factor = 5 percent
BW = 4096 * (1 - 5/100) = 3891
BW/L = 3891/50 = 77 records per block (rounded down)
DS = 1,000,000/77 + 1 = 12,988 blocks
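
The Data Storage estimate can be sketched in Python as follows. The function name is hypothetical. The sketch assumes that BW and BW/L are both rounded down to whole numbers (whole bytes and whole records per block), which is the interpretation that reproduces the figure in the example.

```python
def data_storage_blocks(n, b, p, l):
    """DS = N / (BW / L) + 1, with BW = B * (1 - p / 100).

    Assumption: BW and the number of records per block (BW / L)
    are rounded down to integers.
    """
    bw = int(b * (1 - p / 100))          # usable bytes per block
    records_per_block = int(bw // l)     # whole records that fit in a block
    return n // records_per_block + 1


# 1,000,000 records of 50 bytes, 4 KB blocks, 5 percent padding factor.
ds = data_storage_blocks(1_000_000, 4096, 5, 50)
```

With these values, each block holds 77 records and the estimate is 12,988 blocks.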