Version 7.4.4
 —  DBA Tasks  —

Universal Encoding Support (UES)

Note:
UES support requires that you use a version 7 or above Adabas SVC or router.


Overview

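Universal Encoding Support (UES) is a database option that enables Adabas to handle the data conversion, wide-character encoding, and language-specific collation requirements described below.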

Data conversion needs arise when communicating with different systems, i.e., conversion between different code pages for alphanumeric data or conversion of numerical data due to different machine architectures (see also section Multiple Platform Support).
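The two kinds of conversion can be illustrated with a short Python sketch. It is not taken from the Adabas documentation: the codecs cp037 (an EBCDIC code page) and ascii, and the sample values, are assumptions chosen only because they ship with the Python standard library.

```python
# Same text, two code pages: the stored byte values differ.
text = "ADABAS"
ebcdic = text.encode("cp037")   # EBCDIC code page (assumed for illustration)
ascii_ = text.encode("ascii")   # ASCII code page

print(ebcdic.hex())  # c1c4c1c2c1e2
print(ascii_.hex())  # 414441424153

# Same number, two machine architectures: the byte order differs.
number = 258
big_endian = number.to_bytes(4, "big")        # e.g. mainframe byte order
little_endian = number.to_bytes(4, "little")  # e.g. x86 byte order

print(big_endian.hex())     # 00000102
print(little_endian.hex())  # 02010000
```

A receiver that interprets either value without conversion would read different text or a different number, which is why conversion is needed whenever the two sides differ.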

Wide-character encodings are used in Asian language environments. Because these languages need a large number of different characters, character sets that use more than one byte per character have been defined. In addition, Unicode, a "Universal Character Set", is increasingly used (see also section Wide-Character Encodings).

A frequently listed internationalization task is searching and sorting data in a language-specific order rather than in the binary order defined by the encoding (see also section Collation Descriptor Exits in User Exits).
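The difference between binary order and a language-aware order can be seen in a small Python sketch. The word list, the cp037 and ascii codecs, and the case-insensitive key are assumptions made for illustration; they do not describe how an Adabas collation descriptor exit is implemented.

```python
words = ["Zoo", "apple", "1st"]

# Binary order is defined by the byte values of the encoding in use.
by_ascii = sorted(words, key=lambda w: w.encode("ascii"))
by_ebcdic = sorted(words, key=lambda w: w.encode("cp037"))

# A toy language-aware key that ignores case; real collation rules
# are considerably more involved.
by_casefold = sorted(words, key=str.casefold)

print(by_ascii)     # ['1st', 'Zoo', 'apple']
print(by_ebcdic)    # ['apple', 'Zoo', '1st']
print(by_casefold)  # ['1st', 'apple', 'Zoo']
```

The same three values sort in three different ways, so an application that expects dictionary order cannot rely on the binary order of either encoding.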


Wide-Character Encodings

In most cases, an Asian text character cannot be encoded using a single byte. For example, Japanese, with more than 10,000 characters in its set, is encoded using two or more bytes per character. Because of the encoding required, these are called double-byte character sets (DBCS) or multiple-byte character sets (MBCS), as opposed to the single-byte character sets (SBCS) characteristic of most Western languages.
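As a rough illustration of the byte counts involved, the following Python sketch measures a Latin letter, a kana character, and a kanji character. The shift_jis codec is a PC encoding, not a mainframe one; it is assumed here only because it is available in the Python standard library.

```python
def byte_lengths(ch: str) -> tuple:
    """Return (bytes in Shift_JIS, bytes in UTF-16) for one character."""
    return (len(ch.encode("shift_jis")), len(ch.encode("utf-16-be")))

samples = {
    "Latin letter A": "A",
    "hiragana a": "\u3042",
    "kanji (han)": "\u6f22",
}

for label, ch in samples.items():
    sjis_len, utf16_len = byte_lengths(ch)
    print(f"{label}: Shift_JIS={sjis_len} byte(s), UTF-16={utf16_len} byte(s)")
```

The Latin letter needs one byte in the multiple-byte encoding, while the Japanese characters need two; in UTF-16 all three occupy two bytes.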

Previous versions of Adabas have stored DBCS-encoded data in alphanumeric fields. Problems with this solution include the following:

Although version 7 of Adabas continues to support the storage of DBCS-encoded data in alphanumeric fields, it introduces a wide-character (W) field format to store data with a well-defined encoding and character set.

The default encoding for the wide-character format is Unicode, both for storage and for the user. This default can be changed at the user level and at the storage level to the encoding appropriate for the intended usage.

In the figure below, the Japanese kana (first two) and kanji (second two) characters are encoded in mainframe modal (mixed) and non-modal (pure) encodings, and in Unicode, a fixed 2-byte encoding that is more universal than the other encodings and is used as the default encoding in Adabas.

[Figure: Wide-Character Encoding Example (graphics/wide_char_encoding.png)]

Modal encodings shift back and forth between single- and double-byte character encodings. Mixed DBCS strings always start and end in single-byte mode.
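The mode switching can be sketched in Python as a small parser that splits a mixed string into single-byte and double-byte runs. The control codes X'0E' (shift-out) and X'0F' (shift-in) are the values commonly used by IBM EBCDIC mixed encodings and are an assumption here, as are the function name and the sample bytes; vendor-specific exceptions are ignored.

```python
SO = 0x0E  # shift-out: switch to double-byte mode (assumed value)
SI = 0x0F  # shift-in: switch back to single-byte mode (assumed value)

def split_mixed_dbcs(data: bytes) -> list:
    """Split a modal (mixed) DBCS byte string into (mode, bytes) runs."""
    runs = []
    mode = "SBCS"            # a mixed string starts in single-byte mode
    current = bytearray()
    for b in data:
        if mode == "SBCS" and b == SO:
            if current:
                runs.append((mode, bytes(current)))
            mode, current = "DBCS", bytearray()
        elif mode == "DBCS" and b == SI:
            if len(current) % 2:
                raise ValueError("odd number of bytes in double-byte run")
            if current:
                runs.append((mode, bytes(current)))
            mode, current = "SBCS", bytearray()
        else:
            current.append(b)
    if mode != "SBCS":
        raise ValueError("mixed string must end in single-byte mode")
    if current:
        runs.append((mode, bytes(current)))
    return runs

# One single-byte character, two placeholder double-byte characters,
# then one more single-byte character.
sample = bytes([0xC1, SO, 0x44, 0x81, 0x44, 0x83, SI, 0xC2])
print(split_mixed_dbcs(sample))
```

A string that is still in double-byte mode when it ends is rejected, which mirrors the rule that mixed strings start and end in single-byte mode.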

The length of a field that holds only double-byte characters must be an even number of bytes.

For EBCDIC encodings, the padding or blank character is X'40' (single-byte) or X'4040' (double-byte). On Hitachi machines, the wide space is X'A1A1' and the single-byte space is X'40'. Adabas allows a single-byte space to appear in double-byte mode without a mode switch.
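The length and padding rules can be expressed as a minimal Python sketch. The function name, its parameters, and the placeholder character bytes are hypothetical; only the space values X'40', X'4040', and X'A1A1' come from the text above.

```python
SBCS_SPACE = b"\x40"          # EBCDIC single-byte space
DBCS_SPACE = b"\x40\x40"      # EBCDIC double-byte (wide) space
HITACHI_WIDE_SPACE = b"\xa1\xa1"

def pad_dbcs_only(value: bytes, field_length: int,
                  wide_space: bytes = DBCS_SPACE) -> bytes:
    """Pad a DBCS-only value to field_length bytes with wide spaces."""
    if field_length % 2:
        raise ValueError("DBCS-only field length must be an even number of bytes")
    if len(value) % 2:
        raise ValueError("value must consist of whole 2-byte characters")
    if len(value) > field_length:
        raise ValueError("value is longer than the field")
    missing_chars = (field_length - len(value)) // 2
    return value + wide_space * missing_chars

one_char = b"\x44\x81"  # placeholder double-byte character
print(pad_dbcs_only(one_char, 6).hex())                      # 448140404040
print(pad_dbcs_only(one_char, 6, HITACHI_WIDE_SPACE).hex())  # 4481a1a1a1a1
```

The check on odd field lengths reflects the rule that a DBCS-only field must contain whole two-byte characters.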


Wide-Character Data Support

Adabas supports wide-character data with extended alphanumeric fields and with wide-character fields, both described below.

For an existing database or file, the encoding is assigned to alpha or wide fields using the ADADBS utility without an unload/reload. The field-level option NV (pass a field unconverted to/from a caller) is available.

Extended Alphanumeric Fields

Adabas extends alphanumeric fields to support wide-character data by defining encoding keys at both the database and file levels; the file-level encoding takes precedence over the database-level encoding. The encoding specifies the format in which the data is stored. It is also used as the default format in which data is exchanged with a local user.
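The precedence rule can be written as a one-line lookup. This Python sketch is only an illustration: the function name and the key values 100 and 200 are hypothetical and do not correspond to real Adabas encoding keys.

```python
from typing import Optional

def effective_encoding(file_key: Optional[int],
                       database_key: Optional[int]) -> Optional[int]:
    """Return the encoding key in effect for a file.

    A key defined at the file level overrides the database-level key;
    None means that no key is defined at that level.
    """
    return file_key if file_key is not None else database_key

# Hypothetical keys: 100 at the database level, 200 at the file level.
print(effective_encoding(200, 100))   # file-level key wins
print(effective_encoding(None, 100))  # falls back to the database-level key
```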

The encoding must be compatible with EBCDIC; that is, the space character must be X'40'. For internal processing reasons, only one of the following encoding "families" is supported for a given file:

Advantages and Disadvantages

The advantages of using extended alphanumeric fields include

The disadvantage is that DBCS is not a "universal" encoding and, unlike Unicode, does not support all characters used in the world's languages.

Limitations

For an application, all alphanumeric fields have the same encoding. It is not possible to use different encodings for different fields in the same session.

Conversion Considerations

When converting from pure single-byte character encodings, the length of variable-length fields may change, which requires the converted record to be shifted.
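Why the length may change can be seen by re-encoding the same value. The Python sketch below is an assumption-laden illustration: the helper name is hypothetical, and the codecs cp037, cp500, and utf-16-be are stand-ins available in the Python standard library, not the encodings Adabas uses internally.

```python
def converted_length(data: bytes, source: str, target: str) -> int:
    """Return the byte length of data after re-encoding it from source to target."""
    return len(data.decode(source).encode(target))

stored = "ADABAS".encode("cp037")  # single-byte EBCDIC value

print(len(stored))                                      # 6
print(converted_length(stored, "cp037", "cp500"))       # 6: another single-byte code page
print(converted_length(stored, "cp037", "utf-16-be"))   # 12: two bytes per character
```

When the target encoding uses a different number of bytes per character, every field that follows the converted one in the record starts at a new offset.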

Wide-Character Fields

Adabas defines a wide-character (W) format for fields. W format fields are similar to alphanumeric (A) format fields in that encoding keys are defined at both the database and file levels; the file encoding takes precedence over the database encoding. W field encoding differs from A field encoding in that

A descriptor is stored (and sorted) with internal encoding.

Advantages and Disadvantages

The advantages of using wide-character (W) fields include the following:

The disadvantages are that

Limitations

Special DBCS Format Conversion Rules

To ensure a smooth transition from existing applications that use mixed-DBCS and DBCS-only data, special format conversion rules have been defined:

  1. A modal DBCS encoding comprising the superset of single-byte and double-byte characters is treated as "mixed-DBCS" encoding for alphanumeric fields and as "DBCS-only" encoding for wide-character fields.

  2. When converting from wide-character "DBCS-only" to the user's alphanumeric "mixed-DBCS" encoding, the encoding difference is ignored.

For example, suppose that the user encoding for both alpha and wide formats is defined as "DBCS" and that, in the FDT, field AA is defined as alpha and field WW as wide. The format buffer then determines the encoding of the value in the user buffer as follows:

Format Buffer    Value in User Buffer
AA[,A]           mixed-DBCS
AA,W             DBCS-only
WW,A             DBCS-only
WW[,W]           DBCS-only
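The four cases above can be condensed into a small Python function. This is a sketch of the rules as stated in this section, not Adabas code; the function name and its parameters are hypothetical.

```python
def user_buffer_encoding(field_format: str, buffer_format: str = "") -> str:
    """Return the encoding of the value in the user buffer.

    field_format:  'A' (alpha) or 'W' (wide), as defined in the FDT.
    buffer_format: 'A', 'W', or '' when the format buffer gives no format,
                   in which case the field's own format applies.
    """
    if field_format not in ("A", "W"):
        raise ValueError("field_format must be 'A' or 'W'")
    if buffer_format not in ("", "A", "W"):
        raise ValueError("buffer_format must be '', 'A', or 'W'")
    requested = buffer_format or field_format
    # Only an alpha field read as alpha is treated as mixed-DBCS;
    # every case that involves the wide format is DBCS-only.
    if field_format == "A" and requested == "A":
        return "mixed-DBCS"
    return "DBCS-only"

for field, fmt in [("A", ""), ("A", "W"), ("W", "A"), ("W", "")]:
    print(field, fmt or "(default)", user_buffer_encoding(field, fmt))
```

The third case, a wide field requested in alpha format, returns DBCS-only because rule 2 says the encoding difference is ignored in that direction.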
