Enabling Unicode and Code Page Support

This document covers the following topics:

International Components for Unicode for Software AG (ICS)
ICS Module SAGICU
Alternative ICS Modules for Support of Architecture Levels
Load Modules with Minimum Collation Data
ICU Data Libraries
ICU Data Items
Unicode and Code Page Support for Adabas
Translation Tables
Support of Multi-Byte Code Pages

International Components for Unicode for Software AG (ICS)

Code page conversion and Unicode support make use of functionality provided by International Components for Unicode for Software AG (ICS). If you want to enable Natural for Unicode and code page support, you have to install the components provided with ICS: the ICS module SAGICU or an alternative ICS module (z/VSE and z/OS only) and ICU data libraries.

Notes:

You do not need to install any ICS component to execute applications without Unicode and code page support, that is, when the profile parameters CFICU and CP are set to OFF.
Information on the currently used ICU version and Unicode specification is provided in the main menu of the SYSCP utility. See Invoking and Terminating SYSCP in the Utilities documentation of the Natural for Mainframes documentation.

ICS Module SAGICU

If you want to enable Natural for Unicode and code page support, you need to link and load an ICU data library during the installation of Natural as described in Installing International Components for Unicode for Software AG for z/OS (see ICS Transition Version 222 and ICS 311), z/VSE and BS2000.

The ICS module SAGICU is designed to be used independently of localization data. It contains no statically linked code pages or locales. A dataset containing the entire ICU localization data, modularized into data items, is part of the ICS 311 delivery. The name of this dataset can be specified with the CFICU STEPLIB parameter or statically in the JCL as a Natural steplib.

Statically-linked collation data (set of code pages and locale IDs) is still supported and is part of the ICS Transition Version 222.

Collation Services
Code Pages and Locales

Collation Services

Another feature of this module is collation services. Collation services are used to compare Unicode strings and take into account that alphabetical order varies from language to language. Accommodating the world's languages and writing systems, and the different orders they use, is a considerable challenge. The ICU collation service addresses it by comparing strings in a locale-sensitive fashion. For example, in the German locale, the character "Ä" is sorted between "A" and "B"; in the Swedish locale, it is sorted after "Z". In Lithuanian, the character "y" is sorted between "i" and "k". The ICU implementation of collation services is compliant with the Unicode Collation Algorithm and conforms to ISO 14651.
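To illustrate why the same strings can sort differently depending on the locale, the following minimal Python sketch builds sort keys from hand-written alphabets. It is not the ICU collation service and does not implement the Unicode Collation Algorithm. The helper make_key and both ordering tables are hypothetical and cover only the German and Swedish examples mentioned above.

```python
import string

# Hypothetical, highly simplified ordering tables for illustration only.
# German: "Ä" shares the primary weight of "A" and differs on a second level.
GERMAN_ALPHABET = string.ascii_uppercase
GERMAN_FOLDS = {"Ä": "A"}
# Swedish: "Å", "Ä" and "Ö" are separate letters that follow "Z".
SWEDISH_ALPHABET = string.ascii_uppercase + "ÅÄÖ"


def make_key(alphabet, folds=None):
    """Return a sort-key function for the given alphabet order."""
    folds = folds or {}
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(text):
        primary, secondary = [], []
        for ch in text.upper():
            base = folds.get(ch, ch)
            # Characters outside the alphabet sort after it, by code point.
            primary.append(rank.get(base, len(alphabet) + ord(base)))
            # A folded character sorts after its unmodified base letter.
            secondary.append(0 if base == ch else 1)
        return (primary, secondary)

    return key


german_key = make_key(GERMAN_ALPHABET, GERMAN_FOLDS)
swedish_key = make_key(SWEDISH_ALPHABET)

words = ["Z", "Ä", "B", "A"]
german_sorted = sorted(words, key=german_key)    # ['A', 'Ä', 'B', 'Z']
swedish_sorted = sorted(words, key=swedish_key)  # ['A', 'B', 'Z', 'Ä']
```

The two-level key mirrors, in a very reduced form, the idea that a collator compares base letters first and accents only as a tie-breaker.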

Code Pages and Locales

Statically-linked collation data (set of code pages and locale IDs) is not supported with ICS 311. It is still supported and is part of the ICS Transition Version 222 (see ICS Transition Version 222).

ICS 311 uses all of the ICU localization data.

The ICS module SAGICU provides the following code pages and locales:

Code Pages:

IBM037
IBM273
IBM1025
IBM1026
IBM1047
IBM1097
IBM01140
IBM01141
IBM01145
IBM01146
IBM01147
US (alias for IBM01140)
DE (alias for IBM01141)
ES (alias for IBM01145)
EN (alias for IBM01146)
FR (alias for IBM01147)
IBM-37_P100-1995,SWAPLFNL
IBM-1047_P100-1995,SWAPLFNL
IBM-1140_P100-1997,SWAPLFNL
EBCDIC-XML-US
EDF03DRV (BS2000 code page)
EDF03IRV (BS2000 code page)
EDF04DRV (BS2000 code page)
EDF04IRV (BS2000 code page)
EDF041 (BS2000 code page)
EDF04F (BS2000 code page)
IBM-290 (Japanese code page SBCS)
IBM-930 (Japanese code page SBCS/DBCS)
IBM-939 (Japanese code page SBCS/DBCS)
IBM-1390 (Japanese code page SBCS/DBCS)
IBM-1399 (Japanese code page SBCS/DBCS)
IBM-932 (Japanese code page ASCII MBCS)
IBM-942 (Japanese code page ASCII MBCS)
IBM-943 (Japanese code page ASCII MBCS)
EUC-JP (Japanese code page ASCII MBCS)
IBM-420 (RTL code page)
IBM-424 (RTL code page)
IBM-916 (RTL code page)

Locales:

de_DE
en_US
es_ES
fr_FR
sv_SE
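
The alias entries in the code page list (US, DE, ES, EN, FR) can be read as a simple lookup. The following Python sketch only restates that mapping; the function resolve_code_page is a hypothetical helper and does not reflect how ICS resolves names internally.

```python
# Alias table copied from the code page list above.
ALIASES = {
    "US": "IBM01140",
    "DE": "IBM01141",
    "ES": "IBM01145",
    "EN": "IBM01146",
    "FR": "IBM01147",
}


def resolve_code_page(name: str) -> str:
    """Return the code page an alias stands for, or the name unchanged."""
    stripped = name.strip()
    return ALIASES.get(stripped.upper(), stripped)


resolved_alias = resolve_code_page("DE")        # 'IBM01141'
resolved_plain = resolve_code_page("IBM1047")   # 'IBM1047'
```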

Alternative ICS Modules for Support of Architecture Levels

This section does not apply to BS2000.

If your Natural system runs on z/OS or z/VSE on an IBM processor with architecture level 9 or higher, you can replace the ICS module SAGICU with SAGICUA9. SAGICUA9 is built to use advanced machine instructions introduced with IBM's ESA/390 and z/Architecture. You can use the system command TECH (see the System Commands documentation) to find out the architecture level supported on your current machine.

SAGICUA9 improves the execution performance, especially for Natural statements that use Unicode variables or code-page encoding instructions (for example, MOVE ENCODED). For more information on architecture levels, refer to the related documentation from IBM (z/Architecture, Principles of Operation).

Warning:
An operation exception error (abend code S0C1) can occur if the ICS module SAGICUA9 is used, but the underlying machine architecture level is lower than 9.
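
The selection rule described in this section can be summarized as a small decision function. The Python sketch below is illustrative only: select_ics_module is a hypothetical helper, and the architecture level is assumed to have been read manually from the output of the TECH system command.

```python
def select_ics_module(platform: str, architecture_level: int) -> str:
    """Pick the ICS module name according to the rule in this section.

    SAGICUA9 is only an option on z/OS and z/VSE, and only if the machine
    supports architecture level 9 or higher. Using it on a lower level can
    end in an operation exception (abend code S0C1).
    """
    if platform in ("z/OS", "z/VSE") and architecture_level >= 9:
        return "SAGICUA9"
    return "SAGICU"


module_zos_new = select_ics_module("z/OS", 10)    # 'SAGICUA9'
module_zos_old = select_ics_module("z/OS", 8)     # 'SAGICU'
module_bs2000 = select_ics_module("BS2000", 10)   # 'SAGICU'
```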

Load Modules with Minimum Collation Data

These modules are not delivered with ICS 311, as the load modules of ICS 311 (SAGICU and SAGICUA9) are already minimal in size and contain no statically-linked ICU localization data.

If ICS Version 222 is installed at your site, you can use the load modules SAGICUM and SAGICUM9, which include only the bare minimum of collation data in the module build. This enables a lightweight configuration and better performance for particular use cases.

ICU Data Libraries

Data libraries provided by Software AG are not supported with ICS 311. They are still supported and are part of the ICS Transition Version 222 (see ICS Transition Version 222).

ICS 311 uses all of the ICU localization data.

ICU data libraries are supplied with the following ICS data modules where nn denotes the current version of the module as announced in the current Natural Release Notes for Mainframes.

Data Module	Description
`ICSDTnnE`	Contains the most popular code pages and locales. The code pages are already declared in `NATCONFG`.
`ICSDTnnJ`	Same as `ICSDTnnE`, but enhanced by Japanese code pages. `ICSDTnnJ` is already linked to the ICS module `SAGICU` (or an alternative ICS module on z/VSE or z/OS). It contains the above mentioned code pages and locales.
`ICSDTnnX`	Contains all converters and locales offered by the currently supported ICU version (see http://demo.icu-project.org/icu-bin/convexp): about 230 code pages (predominantly EBCDIC code pages) and 238 locales. As a result, the module is very large. Note: Due to technical restrictions, `ICSDTnnX` is not delivered for z/VSE and BS2000.

It is possible to create your own ICU data library that exactly matches your requirements (see Customizing the ICU Data Library).

ICU Data Items

The ICU data items supported by Natural include converters and collators. For example: a converter is used when a MOVE ENCODED statement executes, and a collator when strings are compared in an IF statement.
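
As a rough analogy for what a converter does, the following Python sketch converts a byte string from one EBCDIC code page to another by going through Unicode. It uses Python's built-in codecs cp037 and cp273 as stand-ins for IBM037 and IBM273; this is an assumption for illustration and not the ICU converter that Natural loads. The function convert_code_page is a hypothetical name.

```python
def convert_code_page(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Convert bytes between two code pages, using Unicode as the pivot."""
    text = data.decode(source_cp)   # code page -> Unicode
    return text.encode(target_cp)   # Unicode -> code page


# The same text has different byte values in the two EBCDIC code pages.
us_bytes = "ÄRGER".encode("cp037")
de_bytes = convert_code_page(us_bytes, "cp037", "cp273")
round_trip = de_bytes.decode("cp273")
```

After the conversion, round_trip holds the original text again, while us_bytes and de_bytes differ because the two code pages place "Ä" at different code points.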

An ICU data item is either statically linked to an ICU data library or it is dynamically loaded on request during the Natural session.

ICU data items are supplied as loadable modules on the ICS data set supplied for installation of Natural, and must be accessible through the Natural steplib chain.

When a data item is used for the first time, ICS attempts to open it from the linked or loaded ICU data library. If no data item is associated with a library, ICS attempts to dynamically load the data item from the ICS data set.
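
The lookup order described above can be sketched as follows. This Python model is an assumption-laden illustration, not ICS code: the class DataItemResolver, its attributes, and the sample module name I58S0001 are hypothetical, and plain dictionaries stand in for the ICU data library and the ICS data set.

```python
class DataItemResolver:
    """Toy model of the data item lookup order: library first, then data set."""

    def __init__(self, library: dict, data_set: dict):
        self.library = library      # items in the linked or loaded ICU data library
        self.data_set = data_set    # items loadable from the ICS data set
        self.opened = {}            # items already opened in this session
        self.dynamic_loads = 0      # how often the data set had to be accessed

    def open(self, name: str):
        if name in self.opened:
            return self.opened[name]          # already available, no reload
        if name in self.library:
            item = self.library[name]         # found in the ICU data library
        elif name in self.data_set:
            item = self.data_set[name]        # dynamically loaded on request
            self.dynamic_loads += 1
        else:
            raise LookupError(f"data item {name} not found")
        self.opened[name] = item
        return item


resolver = DataItemResolver(
    library={"I58S0001": "collator data"},                       # hypothetical name
    data_set={"I58C0074": "converter ibm-1148_P100-1997"},
)
resolver.open("I58S0001")   # served from the library
resolver.open("I58C0074")   # loaded from the data set
resolver.open("I58C0074")   # reused, not loaded again
```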

This section covers the following topics:

Naming Conventions for Data Item Modules
ICU Dynamically Loaded Single Data Items

Naming Conventions for Data Item Modules

The name of a data item module in the ICS data set is restricted to eight characters. As indicated in the table below, it consists of the following:

A prefix (I),
A two-digit ICU version (xx),
A logical group identifier (C, B, S, L, M or D), and
A four-digit sequence number (nnnn).

Module Name	Contents
`IxxCnnnn`	Charset mapping tables (converter modules)
`IxxBnnnn`	Break iterators
`IxxSnnnn`	Collators (collation services)
`IxxLnnnn`	Localization (formatting, display names and other localized data)
`IxxMnnnn`	Miscellaneous data (rule-based number formats and transliterators)
`IxxDnnnn`	Base data

Example:

I58C0074 is the name of a converter for ICU Version 58.2 and code page ibm-1148_P100-1997.

However, in a MOVE ENCODED statement, Natural expects the long name of the code page that corresponds to the data item module. Any valid alias name of the code page can be used. The name of the code page is automatically mapped to the eight-character short name when the data item module is loaded.

For further information, see the appropriate ICU web site.
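
The naming convention for data item modules is regular enough to be checked mechanically. The following Python sketch splits a module name into its parts according to the table above; parse_data_item_name and DataItemName are hypothetical helpers written for this illustration.

```python
import re
from typing import NamedTuple

# Logical group identifiers as listed in the module name table.
GROUPS = {
    "C": "Charset mapping tables (converter modules)",
    "B": "Break iterators",
    "S": "Collators (collation services)",
    "L": "Localization",
    "M": "Miscellaneous data",
    "D": "Base data",
}


class DataItemName(NamedTuple):
    icu_version: str   # two-digit ICU version (xx)
    group: str         # logical group identifier
    sequence: int      # four-digit sequence number (nnnn)


# Prefix "I", two digits, one group letter, four digits: eight characters.
_PATTERN = re.compile(r"I(\d{2})([CBSLMD])(\d{4})")


def parse_data_item_name(name: str) -> DataItemName:
    """Split an eight-character data item module name into its parts."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid data item module name: {name!r}")
    version, group, sequence = match.groups()
    return DataItemName(version, group, int(sequence))


parsed = parse_data_item_name("I58C0074")
parsed_group = GROUPS[parsed.group]
```

For the example name I58C0074, the parsed group is C, which the table identifies as a converter module.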

ICU Dynamically Loaded Single Data Items

Using dynamically loaded single data item modules allows for extensive flexibility. Data is loaded on demand, and all code pages are supported. A dataset containing all of the ICU localization data, modularized into single data items, is part of the ICS 311 delivery.

A single data item module is loaded when it is first accessed (for example, by a MOVE ENCODED statement) and is then immediately available for later use without being reloaded. Only the code pages that have actually been used are kept in memory; unlike previous ICS versions, no statically linked data or separate data library is required.

Single data item modules are especially useful for z/VSE and BS2000, as these platforms do not support the extended data library functionality (which was available with previous ICS versions).

Unicode and Code Page Support for Adabas

If a Natural session is enabled for code page or Unicode support, you should ensure that Natural's Adabas user session also uses the appropriate user encoding for accessing Adabas data.

Because Adabas uses Entire Conversion Services (ECS) for conversion, the ECS name must be specified in the related NTCPAGE entry in module NATCONFG. To ensure that Natural's Adabas user session uses the correct code page, specify the ACODE and/or WCODE option in the OPRB parameter for the databases used.

For more information on Adabas Unicode and code page support conversion, see the Adabas documentation for mainframes.

Translation Tables

Natural uses various tables for character translation and character property definition. The contents of the tables can be modified via profile parameters (TAB, UTAB1, UTAB2 and SCTAB) during the start of a Natural session.

If Natural is running with code page support (that is, the CP profile parameter is set to a value other than OFF), the tables cannot be modified by the user. In this case, the following Natural startup message is issued to notify the user that the above-mentioned profile parameters are ignored:

Character translation parameter table-name ignored due to CFICU=ON.

Natural adjusts the tables automatically, according to the code page used for the Natural session (value of the system variable *CODEPAGE). See also Translation Tables in the Operations documentation.

Support of Multi-Byte Code Pages

Natural supports multi-byte code pages (MBCS) such as IBM-939, a Japanese code page based on EBCDIC and DBCS. To select a multi-byte code page, set the CP parameter to AUTO (if supported) or to the name of the code page. When Natural runs with a multi-byte code page, it uses internal I/O buffers that are based on Unicode, so all data written into these buffers by an I/O statement is converted to Unicode. The I/O buffers are therefore larger than with traditional I/O: Unicode characters need twice as much space as EBCDIC characters, and enhanced attributes are needed to describe a field.

In the case of single-byte code pages (SBCS) such as IBM-1140, the traditional EBCDIC-based I/O is still used to preserve resources.
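
The difference in buffer size can be made concrete with a short calculation. The Python sketch below compares the byte length of the same text in a single-byte EBCDIC code page and in a two-byte Unicode encoding. It assumes that Python's cp1140 codec corresponds to IBM-1140 and that UTF-16 is used as the Unicode representation; it ignores the additional field attributes mentioned above, and buffer_bytes is a hypothetical helper.

```python
def buffer_bytes(text: str, encoding: str) -> int:
    """Return the number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


field = "NATURAL"
sbcs_size = buffer_bytes(field, "cp1140")        # one byte per character: 7
unicode_size = buffer_bytes(field, "utf-16-be")  # two bytes per character: 14
```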