Configuration and Administration of the Unicode and Code Page Environment

Notation vr:

When used in this document, the notation vr represents the 2-digit ICU version number.


Profile Parameters and Macros

This section lists the profile parameters and macros which are used in conjunction with Unicode and code page support.

Unless otherwise noted, the profile parameters and macros mentioned in this section are explained in detail in the Parameter Reference.

Parameter or Macro Description
CFICU or NTCFICU macro

Enables Unicode support for various Unicode settings.

See also CFICU Parameter and CFICU and CP: Session Modes.

CMPO or CPAGE keyword subparameter of NTCMPO macro

Generates code page-sensitive Natural programs.

See also CPAGE Compiler Option.

CP

Defines the default code page for Natural. This code page is used for the runtime and development environment unless it is overridden by a code page defined for a single object (for example, for a Natural source).

Only code pages suitable for the platform can be used. This means, for example, that no ASCII code page can be defined for a mainframe platform. An initialization error message is issued if an unsuitable code page is specified.

See also CFICU and CP: Session Modes.

CPCVERR

Specifies whether a conversion error results in a Natural error. This applies to conversions from Unicode to a code page, from a code page to Unicode, and from one code page to another.

This parameter is ignored when Natural sources are converted while being loaded into the source area or cataloged.

It is also ignored when a Unicode field is converted into the code page before an I/O on a terminal emulation. In this case, the substitution character defined by ICU is replaced by the place holder character which is defined in NATCONFG.

CPOBJIN

Specifies the code page in which the batch input file for data is encoded. This file is defined in the data set CMOBJIN.

CPPRINT

Specifies the code page in which the batch output file shall be encoded. This file is defined in the data set CMPRINT.

CPSYNIN

Specifies the code page in which the batch input file for commands is encoded. This file is defined in the data set CMSYNIN.

NTCPAGE macro

In the NATCONFG module, this macro defines a code page and all related information, such as place holder character, locale ID and collation tables.

See also NTCPAGE Macro.

NTCPAGE and NATCONFG are explained in detail in the Operations documentation.

OPRB or NTOPRB macro

Sets the ACODE and/or WCODE option to define the user encoding if the Adabas database in use is enabled for UES (universal encoding support).

PRINT or CP keyword subparameter of NTPRINT macro

Defines the code page for a report.

SRETAIN

Specifies that all existing sources have to be saved in their original encoding format. See also Customizing Your Environment.

CFICU Parameter

The parameter CFICU and its subparameters are explained in detail in the Parameter Reference. Some of the subparameters affect performance.

If collation services are used to compare Unicode strings, both strings are checked to determine whether they are normalized. The check itself consumes a lot of CPU time. If you are sure that the strings are already normalized, you can switch off the check (COLNORM=OFF).

In Unicode, it is possible to represent the same character as one code point or as a combination of two or more code points. For example, the German character "ä" can be represented by "U+00E4" or by the combination of the code points "U+0061" and "U+0308". The conversion from Unicode to, for example, IBM01140 treats each code point of a combined character separately and produces an "a" followed by a substitution character, since code point "U+0308" is not represented in the target code page.

With CNVNORM=ON, a normalization is performed right before the actual conversion. The normalization consumes additional CPU time and temporary storage. If you are sure that no combining characters are involved in MOVE statements (except MOVE NORMALIZED), you should set CNVNORM to OFF to increase performance. Note that not all possible combinations are represented by a single coded Unicode code point.
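The effect of normalizing before conversion can be reproduced outside Natural. The following Python sketch is an illustration only, not Natural code. It assumes that Python's cp1140 codec is an adequate stand-in for IBM01140, that the normalization in question composes characters (NFC), and it uses the standard unicodedata module in place of ICU. Python's "replace" error handler substitutes a question mark, which is not the ICU substitution character.

```python
import unicodedata

decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS (U+0308)

# Without normalization, U+0308 has no mapping in the target code page,
# so it is replaced (Python substitutes "?"; ICU uses its own character).
raw = decomposed.encode("cp1140", errors="replace")
without_norm = raw.decode("cp1140")

# With normalization (assumed here to be NFC), the pair is composed
# to U+00E4 before the conversion takes place.
composed = unicodedata.normalize("NFC", decomposed)
with_norm = composed.encode("cp1140").decode("cp1140")

# Not every combination has a precomposed code point:
# "q" + diaeresis remains two code points even after NFC.
still_two = unicodedata.normalize("NFC", "q\u0308")

print(repr(without_norm), repr(with_norm), len(still_two))
```

The last case shows why normalization cannot guarantee a loss-free conversion for every combining sequence.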

Conversion from Unicode to a code page and vice versa is comparatively slow. The reason is that the ICU implementation is written in C++ and that it covers nearly all Unicode, code page and language aspects in the world. However, some code pages can be mapped to Unicode (and vice versa) via translation tables to accelerate conversion.

Accelerator tables are activated with the CPOPT subparameter. If it is set to ON, Natural automatically creates two accelerator tables during session initialization by using ICU conversion functions. The first table (with a size of 512 bytes) is used for conversion from code page to Unicode and the second table (with a size of 65535 bytes) is used for conversion from Unicode to code page. During a Natural session, all conversions are then executed via the accelerator tables instead of ICU calls.

Accelerator tables are only provided for the default code page (*CODEPAGE). Temporary code pages (for example, in MOVE ENCODED statements) use accelerator tables only if the module NATCPTAB is linked. If it is linked, up to 30 accelerator tables based on the ICU database are used to speed up conversion.
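The idea behind accelerator tables can be sketched in Python. This is not Natural's implementation: the function names, the table layout and the substitution byte are assumptions made for illustration, and Python's cp1140 codec plays the role of the slow converter. The sketch builds a 256-entry table of 2-byte values (512 bytes) for code page to Unicode, and a table with one byte for each of the 65536 code points U+0000 to U+FFFF for the opposite direction.

```python
import array

SUBSTITUTE = 0x3F  # assumed substitution byte for unmappable characters

def build_tables(codepage):
    """Build both lookup tables once (single-byte code pages only)."""
    # Code page -> Unicode: 256 entries of 2 bytes each.
    to_unicode = array.array("H", [0xFFFD] * 256)
    # Unicode -> code page: one byte per code point up to U+FFFF.
    from_unicode = bytearray([SUBSTITUTE]) * 0x10000
    for byte in range(256):
        try:
            char = bytes([byte]).decode(codepage)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        to_unicode[byte] = ord(char)
        from_unicode[ord(char)] = byte
    return to_unicode, from_unicode

def to_text(data, to_unicode):
    """Convert code page bytes to text by table lookup only."""
    return "".join(chr(to_unicode[b]) for b in data)

def to_bytes(text, from_unicode):
    """Convert text to code page bytes by table lookup only."""
    return bytes(
        from_unicode[ord(c)] if ord(c) < 0x10000 else SUBSTITUTE
        for c in text
    )

to_u, from_u = build_tables("cp1140")
encoded = to_bytes("ÄÖÜ", from_u)
print(encoded == "ÄÖÜ".encode("cp1140"), to_text(encoded, to_u))
```

After the one-time setup, each conversion is a plain index operation, which is the trade-off described above: extra memory per code page in exchange for avoiding converter calls.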

CFICU and CP: Session Modes

The parameters CFICU and CP can be used to adjust Natural to specific purposes:

Settings Description
CFICU=OFF, CP=OFF

Compatibility mode. For running existing applications without Unicode and without code page support. Legacy translation tables are used for I/O translation. Compared with former versions, there is no significant increase in resource consumption (CPU time and buffer usage). This mode does not need the ICS module SAGICU (or an alternative ICS module) to be linked to the Natural nucleus.

CFICU=ON, CP=OFF

For new applications that are using Unicode and code page conversion (MOVE ENCODED) but not default code page support. Therefore, the system variable *CODEPAGE is empty. It is possible to use U format variables, but it is not possible to use, for example, MOVE A TO U, since this requires the default code page information. The error NAT3411 will be issued indicating that no default code page is available.

CFICU=ON, CP=value *

For new applications that are using full Unicode as well as code page support.

CFICU=OFF, CP=value *

This combination does not make sense, because code page support needs ICU services for conversion. Therefore, CFICU=ON is enforced in this case and a session initialization message is issued.

* where value is any value other than OFF.
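The four combinations can be summarized as a small decision function. The Python sketch below is an illustration of the rules stated above; the function name and the returned tuple are hypothetical and do not correspond to any Natural interface.

```python
def resolve_session(cficu, cp):
    """Return (effective CFICU, default code page, note).

    cficu is "ON" or "OFF"; cp is "OFF" or a code page name.
    """
    if cficu == "OFF" and cp != "OFF":
        # Code page support needs ICU services, so CFICU=ON is enforced.
        return "ON", cp, "CFICU=ON enforced, initialization message issued"
    if cficu == "OFF":
        return "OFF", None, "compatibility mode"
    if cp == "OFF":
        # Unicode is available, but *CODEPAGE is empty.
        return "ON", None, "Unicode without default code page"
    return "ON", cp, "full Unicode and code page support"

for settings in [("OFF", "OFF"), ("ON", "OFF"),
                 ("ON", "IBM01140"), ("OFF", "IBM01140")]:
    print(settings, "->", resolve_session(*settings))
```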

CPAGE Compiler Option

The compiler option CPAGE creates objects that can be executed with a code page which is different from the code page used at creation time. This means that all alphanumeric constants of the object which are coded with the code page at creation time have to be converted to the code page which is active at execution time.

To make it possible for the Natural object loader to find and convert alphanumeric constants, an additional table is created by the compiler. This increases the size of the generated object, depending on the number of alphanumeric constants used. The conversion at runtime consumes additional CPU time.

If the default code page (value of the system variable *CODEPAGE) is the same as the code page at creation time or if the session has no default code page (CP=OFF), no conversion is done. Conversion errors are ignored, regardless of the setting of the parameter CPCVERR. If the compiler option CPAGE is set to OFF, no conversion is performed at runtime and the alphanumeric constants are used as they are.

The following sample program is cataloged with code page IBM01141 (German) and is executed with default code page IBM01140 (us). The characters "Ä", "Ö" and "Ü" are defined in both code pages, but at different code points.

Example 1 - CPAGE=OFF:

OPTIONS CPAGE=OFF
WRITE *CODEPAGE  'ÄÖÜ'
END

Output with code page IBM01140 (us):

Page      1                                                 
                                                                              
IBM01140                                                         ¢\!

Example 2 - CPAGE=ON:

OPTIONS CPAGE=ON
WRITE *CODEPAGE  'ÄÖÜ'
END

Output with code page IBM01140 (us):

Page      1                                                 

IBM01140                                                         ÄÖÜ
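The two outputs can be reproduced with a short Python sketch. It is an illustration only and makes one assumption: Python has no codec for IBM01141, so cp273 is used in its place, on the understanding that both encode "Ä", "Ö" and "Ü" at the same code points. cp1140 stands for IBM01140.

```python
source_text = "ÄÖÜ"

# Constant as stored in the object cataloged under the German code page
# (cp273 is a stand-in for IBM01141, see the assumption above).
stored = source_text.encode("cp273")

# CPAGE=OFF: the bytes are used unchanged and are therefore
# interpreted with the code page that is active at execution time.
cpage_off = stored.decode("cp1140")

# CPAGE=ON: the constant is converted from the creation code page
# to the execution code page before it is used.
converted = stored.decode("cp273").encode("cp1140")
cpage_on = converted.decode("cp1140")

print(cpage_off)  # ¢\!
print(cpage_on)   # ÄÖÜ
```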

NTCPAGE Macro

The most common standard for code page names is the IANA name. Therefore, the system variable *CODEPAGE contains the IANA name of the default code page. A code page is qualified by its Coded Character Set ID (CCSID). Currently, Adabas uses the Entire Conversion Service definition (ADAECS). The macro NTCPAGE can be used to assign these different names to the unambiguous IANA name. NTCPAGE is part of the Natural configuration module (NATCONFG).

It does not matter whether the IANA name, the CCSID/CCSN or the alias name is entered with the CP parameter. The alias name can be a user-defined name which is used to assign a more meaningful name to the code page. In any case, *CODEPAGE contains the IANA name of the selected code page.

In addition, a place holder character can be defined for a code page. It replaces the default substitution character of that code page, which is normally a non-displayable character (for example, H'3F' in an EBCDIC code page). The place holder character can be used to prevent non-displayable characters from being sent to terminals.

Example:

NTCPAGE IANA=IBM01140,CCSID=1140,ECS=1140,ALIAS='US',PHC=003F

The values IBM01140, 1140 or US can be entered with the CP parameter to activate the code page. In each case, *CODEPAGE contains the name IBM01140. The substitution character of the code page is replaced by "U+003F", which is a question mark (?).
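The name resolution described here amounts to a lookup from any of the accepted names to one entry. The following Python sketch models the example entry; the class, the list and the function are hypothetical names chosen for illustration, and exact, case-sensitive matching is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodePageEntry:
    iana: str   # unambiguous IANA name, reported in *CODEPAGE
    ccsid: str  # Coded Character Set ID
    ecs: str    # Entire Conversion Service number
    alias: str  # user-defined alias name
    phc: str    # place holder character as a Unicode code point

# Mirrors the NTCPAGE example entry.
ENTRIES = [CodePageEntry("IBM01140", "1140", "1140", "US", "003F")]

def resolve(cp_value):
    """Return the entry selected by an IANA name, CCSID or alias."""
    for entry in ENTRIES:
        if cp_value in (entry.iana, entry.ccsid, entry.alias):
            return entry
    raise LookupError(f"unknown code page: {cp_value}")

for value in ("IBM01140", "1140", "US"):
    entry = resolve(value)
    print(value, "->", entry.iana, repr(chr(int(entry.phc, 16))))
```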

The number of available code pages depends on the ICU data library in use.

All code pages defined in the currently used data package can be used by Natural. An NTCPAGE entry is only necessary if an alternative alias name or place holder character is desired.

Natural Development Server

The following configuration parameter is available with Natural Development Server (NDV):

Settings Description
TERMINAL_EMULATION=WEBIO

Specifies that the Natural Web I/O Interface client (which supports Unicode) is used for input and output.

Encoding Information

The code page information of the object is part of the object directory displayed with the LIST system command. For details, see Displaying Directory Information in the System Commands documentation.

The encoding of code page data can be specified on different levels.

Level 1 - Default Code Page

The default code page can be defined with the CP parameter.

Level 2 - Code Page for a Single Object

A code page can be defined for Natural sources, batch input files (CPOBJIN, CPSYNIN) and batch output files (CPPRINT).

If a code page is defined at object level, it overrides the default code page.