Container Files

General

Container files are disk files created by Adabas utilities. They are managed by the Adabas nucleus and Adabas utilities. The internal structure of these files is organized and maintained by Adabas, thus permitting the use of very efficient disk usage algorithms.

The data in the container files consists of data blocks with a block size that is defined by the creator of the database. All of the data blocks of each container type are addressed via a so-called relative Adabas block number (RABN), which is a 4-byte unsigned integer > 0. Therefore, an Adabas database can contain up to 2^32 - 1 blocks of each container type. The term RABN is used not only for the block number, but also for the corresponding block.
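
For illustration, the following sketch (Python, not part of Adabas) works out the limit that the RABN size implies; the 4 KB block size is only an assumed example value.

  # Illustration only: upper limit on addressable blocks per container type.
  MAX_RABN = 2**32 - 1              # RABN is a 4-byte unsigned integer > 0

  block_size = 4 * 1024             # assumed example block size of 4 KB
  max_bytes = MAX_RABN * block_size

  print(f"max blocks per container type: {MAX_RABN:,}")
  print(f"max space with 4 KB blocks:    {max_bytes / 2**40:.1f} TB")   # roughly 16 TB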

The required container files of an Adabas database are called ASSO, DATA and WORK. For some utilities, additional container files called SORT and TEMP are required.

Associator (ASSO)

ASSO contains the organizational data of the database and of the files in the database. Examples of the data stored in ASSO are:

  • a summary of the physical and logical layout of the database.

  • a list of the used and unused blocks of the database.

  • a description of the record fields of each file.

  • lists of descriptor (search key) fields, which are used for non-sequential database search operations.

  • protection mechanisms for using the Adabas utilities when the database is offline.

Data Storage (DATA)

Data Storage (also referred to as simply DATA) contains the user data of a database. In order to reduce disk space requirements, Adabas uses a data compression technique. This means that user data is converted into a more compact form before being stored in DATA, thus significantly reducing storage requirements and disk I/O.

WORK

The Adabas nucleus uses WORK as a temporary storage area for the update log information required for transaction backout and autorestart.

The size of the WORK should be chosen so that the following holds at all times: considering all of the update, delete and store operations performed since the start of the oldest transaction that is currently active, the size of the WORK must be equal to or greater than

  • (the size of all old compressed records modified or deleted

  • + the size of all new compressed records after modification or insertion

  • + the size of all old index values modified or deleted

  • + the size of all new index values after modification or insertion)

multiplied by 4.

Note:
Databases with LOB data may imply significantly larger WORK sizes because the size of the LOB data also has to be taken into account (for updated records, only the size of the LOB values which are updated). If a database contains LOB data, a WORK block size of 4KB is recommended.
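
The following sketch expresses this sizing rule as a simple calculation. It is an illustration only; the function and the workload figures are hypothetical, and all sizes refer to compressed records and index values accumulated since the start of the oldest currently active transaction.

  def estimated_min_work_size(old_records, new_records,
                              old_index_values, new_index_values):
      """Lower bound for the WORK size according to the rule above (all values in bytes)."""
      return (old_records + new_records +
              old_index_values + new_index_values) * 4

  # Hypothetical workload figures (bytes):
  size = estimated_min_work_size(
      old_records=50_000_000,
      new_records=55_000_000,
      old_index_values=8_000_000,
      new_index_values=9_000_000,
  )
  print(f"WORK should be at least {size / 2**20:.0f} MB")   # about 465 MB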

SORT, TEMP

These are used by some Adabas utilities as temporary storage areas and work areas. In addition to the predefined SORT and TEMP containers, Adabas also uses temporary files created by the nucleus or utilities as work space, these files being deleted after usage. Refer to the section Temporary Working Space for further information.

Adabas Logical Extents

An Adabas logical extent is a group of consecutive RABNs allocated by the Adabas nucleus or an Adabas utility.

For each file loaded into the database, at least one of each of the following types of Adabas logical extents is allocated to the file:

  • Data Storage logical extent

    (allocated from the Data Storage physical extent);

  • Address Converter logical extent

    (allocated from the Associator physical extent);

  • Normal Index logical extent

    (allocated from the Associator physical extent);

  • Upper Index logical extent

    (allocated from the Associator physical extent).

Additional logical extents are allocated by the Adabas nucleus or an Adabas utility when additional space is needed as a result of file updating.

Adabas Physical Extents

The datasets ASSO, DATA, WORK, SORT and TEMP can consist of several extents, i.e. physically separate areas of storage on disk or another secondary storage medium. When a utility references any of these extents, it does so via environment variables. The environment variables are called ASSO1, ASSO2, etc. for the ASSO dataset, DATA1, DATA2, etc. for the DATA dataset, and so on for WORK, SORT and TEMP. Thus, for example, if a utility needs to access an ASSO dataset that has three extents, the environment variables ASSO1, ASSO2 and ASSO3 must point to these extents.

The search strategy for finding the ASSO, DATA and WORK container extents is as follows:

  1. Check for the environment variables ASSO1, ASSO2 etc. for ASSO, DATA1, DATA2 etc. for DATA and WORK1 for WORK. If such an environment variable exists, it must contain the file name of the corresponding container extent.

  2. Search for the corresponding entries in the DBnnn.INI file. If such an entry exists, it must contain the file name of the container extent. Refer to the Adabas Extended Operation documentation for further information about the DBnnn.INI files.

  3. Search for the file CONTx.nnn in the database directory (UNIX: $ADADATADIR/dbnnn, Windows: %ADADATADIR%\dbnnn), where CONT is ASSO, DATA or WORK, x is the extent number and nnn is the 3 digit database ID.
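
As an illustration of this search order, the following sketch resolves the file name of a container extent. It is a simplified model only: the DBnnn.INI lookup is represented by a hypothetical callable, and error handling is omitted.

  import os

  def locate_container_extent(container, extent, dbid, read_ini_entry):
      """Resolve the file name of a container extent, e.g. ASSO1 of database dbid.

      container: "ASSO", "DATA" or "WORK"; extent: extent number starting at 1;
      read_ini_entry: hypothetical callable that looks up the DBnnn.INI file.
      """
      # 1. Environment variable, e.g. ASSO1, DATA2, WORK1
      value = os.environ.get(f"{container}{extent}")
      if value:
          return value

      # 2. Corresponding entry in the DBnnn.INI file
      value = read_ini_entry(dbid, f"{container}{extent}")
      if value:
          return value

      # 3. Default file name CONTx.nnn in the database directory
      db_dir = os.path.join(os.environ["ADADATADIR"], f"db{dbid:03d}")
      return os.path.join(db_dir, f"{container}{extent}.{dbid:03d}")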

The search strategy for using SORT and TEMP is described in the section Temporary Working Space.

The maximum number of ASSO extents is given by (ASSO1 blocksize - 2) / 12. The maximum number of DATA extents is given by (ASSO1 blocksize * 3 - 2) / 12. These values can, however, be reduced under the circumstances described below.

The total number of ASSO and DATA extents cannot exceed 2721. This maximum number reduces by 1 each time any two adjacent DATA extents have a different block size. So the formula is:

ASSO Extents + DATA Extents + (number of different adjacent DATA block sizes) <= 2721.

Thus, for example, you could have a database where there is only 1 ASSO extent and 1360 DATA extents where no two adjacent DATA extents have the same block size, giving a total of 1 ASSO extent + 1360 DATA extents + 1359 changes of DATA block size = 2720.

The following table gives some examples of the correspondence between the size of the container file ASSO1 and the number of ASSO and DATA extents allowed. The entries in the column "best case" show the maximum number of DATA extents allowed if all of the DATA extents have the same block size. The entries in the column "worst case" show the maximum number of DATA extents allowed if no two adjacent DATA extents have the same block size.

ASSO1 blocksize      max. number of      max. number of DATA extents
                     ASSO extents        best case      worst case
---------------------------------------------------------------------
2 KB (2048 bytes)         170               511             511
3 KB (3072 bytes)         255               767             767
4 KB (4096 bytes)         341              1023            1023
5 KB (5120 bytes)         426              1279            1148
6 KB (6144 bytes)         511              1535            1105
7 KB (7168 bytes)         597              1791            1062
8 KB (8192 bytes)         682              2047            1020
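
The extent limits described above can be expressed as a small calculation. The following sketch is an illustration only (the function names are not part of any Adabas interface); it implements the two formulas and the overall limit of 2721.

  def max_asso_extents(asso1_blocksize):
      # (ASSO1 blocksize - 2) / 12
      return (asso1_blocksize - 2) // 12

  def max_data_extents(asso1_blocksize):
      # (ASSO1 blocksize * 3 - 2) / 12
      return (asso1_blocksize * 3 - 2) // 12

  def extent_layout_is_valid(num_asso_extents, data_blocksizes):
      """Check: ASSO extents + DATA extents + (adjacent DATA block size changes) <= 2721."""
      changes = sum(1 for a, b in zip(data_blocksizes, data_blocksizes[1:]) if a != b)
      return num_asso_extents + len(data_blocksizes) + changes <= 2721

  # Example from the text: 1 ASSO extent and 1360 DATA extents where no two
  # adjacent DATA extents have the same block size (1 + 1360 + 1359 = 2720).
  print(extent_layout_is_valid(1, [4096, 8192] * 680))   # True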

SORT can have up to 50 extents: SORT1, SORT2, ..., SORT50.

WORK can have only 1 extent: WORK1.

TEMP can have up to 10 extents: TEMP1, TEMP2, ..., TEMP10.

Access Methods for Container Files

Adabas offers two methods for creating and accessing database container files:

  • Device type independent access method

  • Device type dependent access method

Device type independent access method

With the device type independent access method, Adabas requests the operating system to create the container file using the "contiguous best try" option. The Adabas blocks are written contiguously, regardless of the physical device characteristics. You select the device type independent access method for a given container file by specifying the size of the container file in megabytes when you create it.

With modern hardware (RAID systems, variable track size disks, storage servers, etc.) the track size returned by the system is an arbitrary number and bears no relation to the physical characteristics of the disk. In this case you should use the device type independent access method.

The device type independent access method is always used to access the SORT and TEMP container files.

Device type dependent access method

You select the device type dependent access method for a given container file by creating the container file as a number of contiguous cylinders that start on a cylinder boundary. The values for sectors/track and tracks/cylinder that the system returns as device information are used, and a cluster size that allows the allocation of a single cylinder is required; in other words, the number of sectors per cylinder must be a multiple of the cluster size.

When a block is written to a container file with this access method, Adabas ensures that the block does not span track boundaries. If the track size is not a multiple of the Adabas block size, the end of the track will not be used. This allows Adabas blocks to be read with a single disk revolution.

Adabas Block Sizes

If you use the device type independent access method, you should select block sizes for the DATA and WORK container files that are a multiple of the ASSO block size. This minimizes the temporary unused space in the Adabas buffer pool when replacing blocks of different container file types.

With this rule, different combinations of block sizes are possible.

Examples:

ASSO      DATA      WORK
3 K       6 K       6 K
2.5 K     7.5 K     7.5 K
2 K       4 K       8 K

We recommend that you also apply this rule if you use the device type dependent access method. When you select the block sizes for use with this method, you should also take into account the number of sectors per track, so that the unused space at the end of the track is not too large.

Example:

If the disk has 62 sectors per track (i.e. the track size is 31K), the following table shows how much unused space there is per track, depending on the block size you choose for the container file.

ASSO/DATA/WORK blocksize    Adabas blocks per track    Unused space at end of track
4 K                                  7                            3 K
8 K                                  3                            7 K
3 K                                 10                            1 K
6 K                                  5                            1 K
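
The entries in this table can be reproduced with a simple calculation. The following sketch is an illustration only; it computes the number of Adabas blocks per track and the unused space at the end of the track for a given track size.

  def blocks_per_track(track_size_kb, block_size_kb):
      """Blocks that fit on one track and the unused space at the end of the track (in KB)."""
      blocks = int(track_size_kb // block_size_kb)
      return blocks, track_size_kb - blocks * block_size_kb

  for block_size in (4, 8, 3, 6):                          # KB, as in the table above
      blocks, unused = blocks_per_track(31, block_size)    # 62 sectors * 0.5 KB = 31 KB
      print(f"{block_size} K: {blocks} blocks per track, {unused:g} K unused")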

Effect of large buffer sizes for PLOG and WORK

If you specify a WORK block size of 8K or less, Adabas will set the PLOG block size to 8K. If you specify a WORK block size larger than 8K, Adabas will set the PLOG block size to 32K.
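
Expressed as a simple rule (illustration only, not an Adabas interface):

  def plog_block_size(work_block_size):
      """PLOG block size chosen by Adabas, depending on the WORK block size (bytes)."""
      return 8 * 1024 if work_block_size <= 8 * 1024 else 32 * 1024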

To ensure that all completed transactions can be re-applied during a database recovery, the PLOG buffer is flushed to the PLOG after each ET command, regardless of whether the PLOG buffer is full or not. Each subsequent ET command causes the current PLOG block to be re-written, as long as the PLOG buffer is not full. A new PLOG block is only started when the PLOG buffer is full. Similarly, to ensure data consistency after an autorestart, the current WORK part 1 block is re-written after each ET command until the WORK part 1 buffer is full.

In general, if you have large PLOG and WORK block sizes, more transactions are required to fill the PLOG buffer and WORK part 1 buffer than with small block sizes. This means that the average size of the I/O transfers is increased, but the total number of I/O transfers due to ET commands is unchanged.

For this reason, we recommend using a WORK block size of 8K or less (and therefore a PLOG block size of 8K) if your compressed data records do not exceed 8K.

Database Auto Expand

If a database becomes full, Adabas is able to auto expand the database containers ASSO and DATA. The prerequisite for this is that the nucleus parameter OPTIONS=AUTO_EXPAND has been specified. The strategy used to allocate new space is as follows:

  1. Try to increase the last extent of the container that requires new space. This is only possible if the extent has the same block size as required for the new space in the container.

  2. Check whether there is an environment variable for the next container extent. If the environment variable exists, it must contain the file name for the next extent, and the specified location must have enough space available for the new container extent.

  3. Check whether the DBnnn.INI files contain entries in the section RESERVED_LOCATIONS. If they do, try to allocate the new container extent in one of the specified locations. Refer to the Adabas Extended Operations documentation for further information about the DBnnn.INI files. The file name for the new container extent will be CONTx.nnn, where CONT is ASSO or DATA, x is the extent number and nnn is the 3 digit database ID.

  4. Try to allocate the new container extent in the database directory (UNIX: $ADADATADIR/dbnnn, Windows: %ADADATADIR%\dbnnn). The file name for the new container extent will be CONTx.nnn, where CONT is ASSO or DATA, x is the extent number and nnn is the 3 digit database ID.
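
For illustration, the following sketch models steps 2 to 4 of this allocation strategy (step 1, extending the last extent, and the free-space checks are omitted). The reserved_locations argument stands for the directories read from the RESERVED_LOCATIONS section of the DBnnn.INI file; the function itself is hypothetical.

  import os

  def new_extent_location(container, extent, dbid, reserved_locations):
      """Choose the location of a new ASSO or DATA extent during auto expand."""
      file_name = f"{container}{extent}.{dbid:03d}"        # CONTx.nnn

      # Step 2: environment variable for the next container extent, e.g. DATA4
      value = os.environ.get(f"{container}{extent}")
      if value:
          return value

      # Step 3: a location from the RESERVED_LOCATIONS section of DBnnn.INI
      if reserved_locations:
          return os.path.join(reserved_locations[0], file_name)

      # Step 4: the database directory
      return os.path.join(os.environ["ADADATADIR"], f"db{dbid:03d}", file_name)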

Notes:

  1. Utilities auto expand the database only in online mode when the nucleus is active. An exception to this is ADABCK, where the database can also be expanded in offline mode.
  2. If the auto expand is to be done in a file system, a file with the same name must not already exist. If the auto expand is to be done into a raw section, the raw section must not already contain a container of this type with the same extent number and database ID. It does not matter whether the container extent has only been allocated with ADADEV or whether the container has actually been included in a database.
  3. If you specify explicit RABNs for space allocations, no auto expand will be performed if the database does not contain all of the requested RABNs.

Index Block Sizes

When Adabas creates index blocks, it allocates blocks with a block size that depends on the descriptor value sizes:

  • Large descriptor values >253 bytes are stored in large index blocks with a block size >= 16 KB.

  • Smaller descriptor values are stored in small index blocks with a block size < 16 KB.

If you want to store large descriptor values, you must therefore define an ASSO container with a large block size for the database.

SORT Data Set Placement

It is recommended that the SORT data set does not reside on the same volume as Associator and DATA. When processing a file which contains more than 100 000 records, the SORT area should be split across two volumes to minimize disk arm movement.

The SORT data set may be omitted when processing only small amounts of data (e.g. when inverting a field in an empty file). The Adabas utility being used then performs an in-core sort.

The SORT data set must be large enough to sort the largest descriptor to be processed. Check the ADACMP or ADAULD log for a list of descriptors, as well as a recommended size of SORT and TEMP for any future data compression or decompression operations.

The ADAINV SUMMARY function also displays the required SORT and LWP size for a memory-resident sort.

Note:
If you want to force ADAINV to do a memory-resident sort, do not specify a SORT data set, since otherwise ADAINV might do a file-based sort for the first descriptor, even if the LWP parameter is large enough for a memory-resident sort. This is because ADAINV does not know in advance the size of the descriptor. The subsequent descriptors will always be processed in memory if possible.

TEMP Data Set Placement

It is recommended that the TEMP data set does not reside on the same volume as DATA and SORT.

The TEMP data set is used while

  • loading Normal and Main Index;

  • updating Upper Index

Although the size of TEMP is closely related to the system performance when loading the Normal/Main Index, successful execution does not depend on a given size. When updating the Upper Index, however, all data required must fit into the TEMP data set.

ADACMP and ADAULD display the recommended TEMP size in the descriptor summary.

The TEMP data set is used for intermediate storage of descriptor values if more than one descriptor is inverted.

Although the size of TEMP is closely related to the performance when loading the Normal/Main Index, successful execution does not depend on a given size or the presence of a TEMP. It is recommended that the TEMP data set should be at least large enough to store the second largest descriptor. If you increase the size of the TEMP data set, the number of passes (i.e. the number of times the DATA area of the processed file is read) can be reduced. The ADAINV/ADAMUP SUMMARY function displays the recommended sizes for the TEMP data set.

Container Files in File System or Raw Device

You can create the Adabas container files either in a file system or on raw devices (UNIX only).

The following points should be considered:

  • In general, it is not possible to say whether containers in a file system or containers on raw devices are better; this very much depends on the way Adabas is used. A file system has the advantage that it can buffer data, which means that a file system I/O does not necessarily result in a disk I/O, and a file system may optimize the I/O operations. On the other hand, the file system also involves an overhead that is avoided with raw devices. Software AG therefore recommends that you try both and use the I/O system that delivers the best performance in the given environment.

  • Raw devices are limited to 2 terabytes.

    Warning:
    Adabas does not check to see whether a raw device is ≤ 2 terabytes, but if you use larger raw devices, unexpected errors can occur.
  • If you want to create containers larger than 2 terabytes, you must create them in a file system.

  • If you use containers in a file system and want to have a behaviour similar to that of containers on raw device, it is recommended that you use the ADANUC parameter UNBUFFERED.

  • Adabas containers can be created on local disks or on remote storage servers.

  • If you use disks on storage servers, the I/O speed may be limited by the speed of the network between your computer where Adabas is running and the storage server; this may decrease the overall performance of Adabas.

  • Some file systems that support snapshots of the file system do not overwrite updated blocks, but write a copy to a different location. If there are a lot of updates to the database, the resulting fragmentation of the data may lead to a very poor I/O performance. Software AG recommends that you ask the storage-system vendor if this can happen with his storage system, and what can be done to avoid these problems.

  • If the buffer pool is large enough (ADANUC parameter LBP), I/O performance is normally not critical; this is because most logical I/Os do not require a physical I/O. However, the performance of the devices that contain WORK and PLOG (Adabas protection log) is important, since WORK and PLOG contain log information that is required to guarantee database integrity. For this reason, an ET (end of transaction) command can only be completed when the log information is safely stored on the WORK and PLOG devices. Software AG therefore recommends the use of very fast storage devices if you have a high update load; we have seen performance improvements of up to 30% in cases where the normal storage device was replaced by a faster one for WORK and PLOG. Of course, the performance improvement depends very much on the mix of Adabas commands issued and the speed difference between the different storage devices.