Version 9.5 SP1
 —  Internationalization with EntireX  —

Introduction to Internationalization

This document provides an introduction to the topic of internationalization with EntireX and describes the various approaches offered. It covers the following topics:

See also What is the Best Internationalization Approach to use?


Overview

The translation and conversion of codepages is a symmetric process. Everything that is valid for the request (client to server) also applies to the reply (server to client), with the roles reversed. Therefore the terms sender and receiver are used instead of client and server in this section.

Internationalization with EntireX provides the following:

The following sections discuss all of the internationalization approaches offered by EntireX.


ICU Conversion

Introduction

ICU conversion is based on IBM's project International Components for Unicode (ICU). ICU is a mature, widely used set of C/C++ and Java libraries for Unicode support, software internationalization and globalization.

ICU comes with a set of ICU converters (codepages) based on codepages from ISO and software vendors such as Microsoft and IBM. It is a standardized approach, and it is possible to extend the set with ICU custom converters.

Using ICU Conversion

You can use ICU conversion in the following situations:

If you require special codepages that are not delivered, you can install user-written ICU Custom Converters.

Requirements for ICU Conversion

For ICU conversion to function correctly, the following requirements must be met:

ICU's Conversion Technique

ICU uses algorithmic conversion, non-algorithmic conversion and combinations of both. With non-algorithmic conversion, tables are provided that contain a mapping of codepage characters to Unicode as a definition of a codepage. This format is also called UCM Format.

ICU conversion is a two-step process:

  1. The conversion table designated by the sender is used to convert from characters of the source codepage to Unicode.

  2. The conversion table designated by the receiver is used in the reverse direction to convert from Unicode to characters of the target codepage.
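
The two steps above can be sketched in a few lines of Python. This is a conceptual illustration only: Python's built-in codecs stand in for ICU converters, and the codepage names, the function name convert and the sample text are assumptions made for the example, not part of EntireX. The cp273 codec requires Python 3.4 or later.

```python
# Conceptual sketch only: Python's built-in codecs stand in for ICU converters.
SENDER_CODEPAGE = "cp1252"    # assumed sender codepage (Windows-1252)
RECEIVER_CODEPAGE = "cp273"   # assumed receiver codepage (EBCDIC, Germany)


def convert(payload: bytes, source: str, target: str) -> bytes:
    """Convert payload from the source codepage to the target codepage."""
    # Step 1: source codepage -> Unicode, using the sender's table.
    text = payload.decode(source)
    # Step 2: Unicode -> target codepage, using the receiver's table.
    return text.encode(target)


request = "Grüße".encode(SENDER_CODEPAGE)
on_receiver = convert(request, SENDER_CODEPAGE, RECEIVER_CODEPAGE)

# The reply travels the same path with the roles of sender and receiver swapped.
back_on_sender = convert(on_receiver, RECEIVER_CODEPAGE, SENDER_CODEPAGE)
```

Because Unicode is the pivot in both directions, each side only needs to name its own codepage; neither side needs a direct table for the other side's codepage.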

ICU uses line-oriented text files to define non-algorithmic converters. For complex codepages, partially or fully algorithmic converters may be used, which cannot be defined as simple text files.


ICU Resources

Please refer to "License Texts, Copyright Notices and Disclaimers of Third Party Products". This document is part of the product documentation, located at http://documentation.softwareag.com/legal/.

This section covers the following topics:

ICU Homepage

The ICU home page (http://www.icu-project.org/) is the main point of entry for information on International Components for Unicode (ICU).

ICU Converter Explorer

The ICU Converter Explorer, available at http://demo.icu-project.org/icu-bin/convexp, shows aliases and further information on ICU converters. An ICU converter is the codepage definition used by ICU and is defined in UCM format. If the location has changed since this documentation was published, perform an internet search for the ICU home page and follow the links to the ICU Converter Explorer.

The mapping of aliases to ICU converters is also provided as a text source within an EntireX installation. The location depends on the operating system:

ICU Converter Resources

EntireX includes a standard set of the most commonly used ICU converters (codepages) in binary format packed into shared libraries.

ICU Custom Converters

If the provided standard ICU converters (codepages) do not match your requirements, the ICU codepages can be extended by user-written ICU custom converters. This is done with the ICU tool makeconv delivered with EntireX. With makeconv, ICU converter files in UCM Format are compiled into a binary format with extension cnv. The binary format cnv depends on the endianness (big/little endian) and charset family (ASCII/EBCDIC) where makeconv is executed. See Building and Installing ICU Custom Converters under z/OS | UNIX | Windows | BS2000/OSD.

UCM Format

The codepage definition text files for ICU are described in UCM format (extension ".ucm"). You can edit them with any text editor. The most important section is the mapping table between the CHARMAP and END CHARMAP lines. Each line contains a Unicode code point and the related codepage character byte sequence followed by an optional precision indicator. Four kinds of definitions are supported by the precision indicator:

This brief explanation is not intended to describe the UCM file format fully. For further explanation of the UCM file format, see the ICU home page under ICU Resources above.
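
As an illustration of the mapping-table layout described above, the following Python sketch reads the lines between CHARMAP and END CHARMAP. The sample mappings are invented for the example and do not come from any delivered converter. The meanings attached to the indicators |0 to |3 follow the ICU documentation as far as we are aware, and treating a missing indicator as |0 is an assumption of this sketch. Only simple one-code-point lines are handled.

```python
import re

# Precision indicators as described in the ICU documentation (to our knowledge).
PRECISION = {
    0: "roundtrip mapping",
    1: "fallback mapping, Unicode to codepage only",
    2: "subchar1 mapping for an unmappable code point",
    3: "reverse fallback mapping, codepage to Unicode only",
}

# <UXXXX> \xNN[\xNN...] [|n]  with an optional trailing comment
MAPPING_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|([0-3]))?\s*(?:#.*)?$"
)

# Invented sample data, for illustration only.
SAMPLE_UCM = r"""
CHARMAP
<U0041> \x41 |0
<U00E4> \xE4 |0
<U201A> \x2C |1
<U0085> \x1A |2
<U00A6> \x7C |3
END CHARMAP
"""


def parse_charmap(ucm_text):
    """Return (code_point, byte_sequence, precision) tuples from a CHARMAP section."""
    entries = []
    inside = False
    for line in ucm_text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            inside = True
        elif line == "END CHARMAP":
            inside = False
        elif inside:
            match = MAPPING_LINE.match(line)
            if match:
                code_point = int(match.group(1), 16)
                raw = bytes.fromhex(match.group(2).replace("\\x", ""))
                precision = int(match.group(3) or 0)  # assumption: missing means |0
                entries.append((code_point, raw, precision))
    return entries


entries = parse_charmap(SAMPLE_UCM)
for code_point, raw, precision in entries:
    print(f"U+{code_point:04X} <-> {raw.hex()} : {PRECISION[precision]}")
```

A real UCM file also contains a header section and may use features this sketch ignores. The makeconv tool remains the authoritative parser.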


Translation

Introduction

Translation is the quick-start approach and requires little configuration: only the service-specific or topic-specific broker attribute TRANSLATION in the broker attribute file has to be set to the value SAGTCHA. Nothing needs to be configured or considered for the EntireX component (sender or receiver). Translation does not need locale strings. If translation is specified and an EntireX component sends a locale string, the locale string is ignored.

Translation has limitations on the number of environments supported and the number of different codepages for the environment in which your EntireX components (sender or receiver) are running:

Translation Codepages

Translation has further limitations on the code points used within the codepages provided. The translation routine SAGTCHA is loosely based on the following platform-dependent codepages:

Environment | Indicator sent from EntireX Component to Broker | Based on Codepage | Description
All ASCII environments (UNIX, Windows etc.) | x'80' | Microsoft Windows codepage 1252 | Translation of characters for ASCII environments is loosely based on Windows codepage 1252. Not all of the characters of Windows codepage 1252 are supported by translation. All of the characters supported have the same code point in codepage ISO 8859-1, thus this is also suitable for UNIX.
IBM mainframe | x'22' | IBM codepage 273 | Translation of characters for the IBM mainframe platform is loosely based on IBM codepage 273. Not all of the characters of the IBM codepage 273 are supported by translation.
Fujitsu mainframe | x'42' | EDF 03 national version for Germany | Translation of characters is loosely based on the EDF03 codepage for Germany.
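
The relationship between Windows codepage 1252 and ISO 8859-1 mentioned in the table can be explored with a short Python sketch. It uses Python's own cp1252 and latin-1 codecs and lists the byte values where the two codepages disagree. It does not show which characters SAGTCHA actually supports; the function name same_in_both is an assumption made for the example.

```python
def same_in_both(byte_value: int) -> bool:
    """True if the byte decodes to the same character in cp1252 and ISO 8859-1."""
    raw = bytes([byte_value])
    try:
        return raw.decode("cp1252") == raw.decode("latin-1")
    except UnicodeDecodeError:
        # The byte is unassigned in Python's cp1252 table.
        return False


differing = [value for value in range(256) if not same_in_both(value)]
print(", ".join(f"x'{value:02X}'" for value in differing))
```

With Python's codec tables, the differences are confined to the range x'80' to x'9F'. Characters outside that range have the same code point in both codepages.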

Characters (code points) supported by SAGTCHA are the same as in the Translation user exit example. See Writing Translation User Exits under z/OS | UNIX | Windows | BS2000/OSD.

Using Translation

You can use translation in the following situations:

See Configuring Translation under z/OS | UNIX | Windows | BS2000/OSD.


Translation User Exit

Introduction

With translation user exits, the code points of the codepage used are under your control. You can adapt them to meet your requirements. This requires programming a user-specific translation routine. See Writing Translation User Exits under z/OS | UNIX | Windows | BS2000/OSD. The delivered model for the translation user exit supports single-byte codepages only, but in principle any type of codepage can be implemented.

With translation user exits, you can make any structure of the data (mixture of text and binary data) within your payload known to the user exit by means of the ACI field ENVIRONMENT, which can be shared between your application and the translation user exit. For more information, see Using the ENVIRONMENT Field with the Translation User Exit for client and server | publish and subscribe.

Configuration is easy: only the service-specific or topic-specific broker attribute TRANSLATION in the broker attribute file has to be set to the name of your user exit. Nothing needs to be configured or considered for the EntireX component (sender or receiver). Translation does not need locale strings. If a translation user exit is specified and an EntireX component sends a locale string, the locale string is ignored.

The limitations on the number of environments and different codepages per environment remain the same as for translation.
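
To illustrate what a single-byte translation routine does conceptually, the following Python sketch builds a 256-entry table and applies it to a payload. It is not the EntireX user exit interface, which is described in Writing Translation User Exits. The codepages, the substitution character and the names build_table and ASCII_TO_EBCDIC are assumptions made for the example.

```python
SUBSTITUTE = "?"  # assumed substitution character for unmappable code points


def build_table(source: str, target: str) -> bytes:
    """Build a 256-entry table mapping each source byte to one target byte."""
    table = bytearray(256)
    for value in range(256):
        try:
            char = bytes([value]).decode(source)
            table[value] = char.encode(target)[0]
        except UnicodeError:
            # Source byte is unassigned, or the character is missing in the target.
            table[value] = SUBSTITUTE.encode(target)[0]
    return bytes(table)


# Table for the direction ASCII sender -> EBCDIC receiver.
ASCII_TO_EBCDIC = build_table("cp1252", "cp273")

payload = "Straße 5".encode("cp1252")
translated = payload.translate(ASCII_TO_EBCDIC)
```

Because the table works byte by byte, it applies the same mapping to every byte of the payload. A routine that needs to skip binary data within the payload requires additional knowledge about the payload's structure.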

Using Translation User Exit

You can use a translation user exit in the following situations:


Translation User Exit Replacement with ICU Conversion

If a translation user exit is used only to adapt code points, that is, to implement a standard ASCII/EBCDIC codepage, the same functionality can be achieved with ICU conversion. To do this, configure Broker's Locale String Defaults appropriately and set OPTION=SUBSTITUTE in the service-specific or topic-specific broker attribute CONVERSION to obtain the same error behavior as translation. See OPTION Values for Conversion.

Example

For an environment running in Spain using clients with the Windows 1252 codepage and servers on IBM mainframe with codepage 1145, set the following Codepage-specific Attributes:

DEFAULTS=CODEPAGE
            /* Broker Locale String defaults */
            DEFAULT_ASCII=windows-1252
            DEFAULT_EBCDIC_IBM=ibm-1145

For ACI-based Programming, set the service-specific or topic-specific broker attribute CONVERSION:

DEFAULTS=SERVICE
            . . . 
            CONVERSION=(SAGTCHA,OPTION=SUBSTITUTE)
            . . .

For RPC-based Components and Reliable RPC, set the service-specific or topic-specific broker attribute CONVERSION:

DEFAULTS=SERVICE
            . . . 
            CONVERSION=(SAGTRPC,OPTION=SUBSTITUTE)
            . . .

For more examples see Configuring Broker's Locale String Defaults.
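
The effect of substituting unconvertible characters, as opposed to failing on them, can be sketched in Python. This is only a loose analogy to OPTION=SUBSTITUTE and not a description of the broker's behavior. Python has no codec for ibm-1145, so cp273 is used as a stand-in for the server codepage; the sample text and the function name convert are likewise assumptions.

```python
CLIENT_CODEPAGE = "cp1252"  # stands in for DEFAULT_ASCII=windows-1252
SERVER_CODEPAGE = "cp273"   # assumption: stand-in, Python has no ibm-1145 codec


def convert(payload: bytes, substitute: bool) -> bytes:
    """Convert client bytes to the server codepage via Unicode."""
    text = payload.decode(CLIENT_CODEPAGE)
    # "replace" swaps unmappable characters for "?", "strict" raises an error.
    errors = "replace" if substitute else "strict"
    return text.encode(SERVER_CODEPAGE, errors=errors)


# The trademark sign exists in Windows-1252 but not in the EBCDIC codepage.
payload = "Marca™".encode(CLIENT_CODEPAGE)

try:
    convert(payload, substitute=False)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

substituted = convert(payload, substitute=True).decode(SERVER_CODEPAGE)
print(strict_failed, substituted)
```

With substitution the message is still delivered, at the cost of losing the characters that the target codepage cannot represent.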


SAGTRPC User Exit

Introduction

With the SAGTRPC user exit you can implement your own conversion method for RPC-based Components and Reliable RPC if, for any reason, a codepage is not supported by ICU Conversion and SAGTRPC conversion. The SAGTRPC user exit cannot be used for ACI-based Programming.

SAGTRPC user exit allows you to adapt codepages and their characters (code points) to meet your requirements. This requires some effort in programming a SAGTRPC user exit. See Writing SAGTRPC User Exits under z/OS | UNIX | Windows | BS2000/OSD. The delivered model for the SAGTRPC user exit supports single-byte codepages only, but in principle any type of codepage can be implemented.

Using SAGTRPC User Exit

You can use SAGTRPC user exit in the following situations:

Requirements for SAGTRPC User Exit

For SAGTRPC user exit to function correctly, the following requirements must be met:


Arabic Shaping

Arabic shaping is part of ICU Conversion and is available between UTF-8, the Arabic ASCII codepage windows-1256 and the Arabic EBCDIC codepage IBM-420 for all of the communication models EntireX Broker offers, for example:

Shaping is performed only on the codepages listed above. See also Conversion with Multibyte, Double-Byte and other Complex Codepages.
