Version 8.0
 —  Internationalization with EntireX  —

Introduction to Internationalization

This document provides an introduction to internationalization with EntireX and describes the various approaches offered. It covers the following topics: Overview, ICU Conversion, ICU Resources, Translation, Translation User Exit and SAGTRPC User Exit.

See also: Broker Attribute File


Overview

The translation and conversion of codepages is a symmetric process. Everything that applies to the request (client to server) also applies to the reply (server to client), with the roles reversed. This section therefore uses the terms sender and receiver instead of client and server.

Internationalization with EntireX provides the following approaches: ICU Conversion, Translation, Translation User Exit and SAGTRPC User Exit.

The following sections discuss all of the internationalization approaches offered by EntireX.

ICU Conversion

ICU Conversion is based on IBM's project International Components for Unicode. It is a mature, widely used set of C/C++ and Java libraries for Unicode support, software internationalization and globalization.

ICU ships with a set of codepages based on those from ISO and from software vendors such as Microsoft and IBM, and is a standardized approach. For a list of codepages delivered with EntireX, see column Installed with EntireX in the Table of ICU to ECS Compatibility.

You can use ICU Conversion

For ICU Conversion to function correctly,

ICU's technique of conversion

ICU uses algorithmic conversion, non-algorithmic conversion and combinations of both. With non-algorithmic conversion, tables are provided that contain a mapping of codepage characters to Unicode as a definition of a codepage. This format is also called ucm format.

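ICU conversion is a 2-step process: the data is first converted from the sender's codepage to Unicode, and then from Unicode to the receiver's codepage.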

ICU uses line-oriented text files to define non-algorithmic converters. For complex codepages, partially or fully algorithmic converters may be used; these cannot be defined as simple text files.
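
The idea of converting through Unicode can be illustrated with a minimal sketch. It uses Python's built-in codecs as a stand-in for ICU converters, which is an assumption made for illustration only: EntireX performs this conversion inside the Broker with ICU, not with Python. The function name convert and the choice of the codecs cp273 and cp1252 are hypothetical.

```python
def convert(payload: bytes, sender_codepage: str, receiver_codepage: str) -> bytes:
    """Convert payload from the sender's codepage to the receiver's codepage."""
    # Step 1: sender codepage -> Unicode
    text = payload.decode(sender_codepage)
    # Step 2: Unicode -> receiver codepage
    return text.encode(receiver_codepage)


if __name__ == "__main__":
    # A request sent from an EBCDIC environment (Python codec cp273)
    # to an ASCII environment (Python codec cp1252).
    request = "Gr\u00fc\u00dfe".encode("cp273")
    converted = convert(request, "cp273", "cp1252")
    print(converted.decode("cp1252"))
```

Because the process is symmetric, the reply would be converted by calling the same function with the two codepage arguments swapped.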

ucm format

The codepage definition text files for ICU are also called ucm files because of their extension ucm. The most important section is the mapping table between the CHARMAP and END CHARMAP lines. Each line contains a Unicode code point and the related codepage character byte sequence, followed by an optional precision indicator. Four kinds of definitions are supported by the precision indicator: |0 marks a roundtrip mapping, |1 a fallback mapping (Unicode to codepage only), |2 a subchar1 mapping, and |3 a reverse fallback mapping (codepage to Unicode only).

This brief explanation is not intended to describe the ucm file format fully. For further details, refer to the ICU home page (see ICU Resources, below).
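
To make the layout of the mapping table concrete, the following sketch reads the lines between CHARMAP and END CHARMAP. It is a simplified reader written for illustration, not ICU's own parser: it handles only lines with a single code point of the form <UXXXX>, and it treats a missing precision indicator as 0, which is an assumption of this sketch. The sample excerpt and the names Mapping and parse_charmap are hypothetical.

```python
import re
from typing import List, NamedTuple


class Mapping(NamedTuple):
    code_point: int        # Unicode code point
    codepage_bytes: bytes  # byte sequence in the codepage
    precision: int         # precision indicator (0 assumed when absent)


# <UXXXX>, one or more \xNN bytes, optional |N indicator, optional comment
_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|(\d))?\s*(?:#.*)?$"
)


def parse_charmap(ucm_text: str) -> List[Mapping]:
    """Return the mappings found between CHARMAP and END CHARMAP."""
    mappings = []
    in_charmap = False
    for raw in ucm_text.splitlines():
        line = raw.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        if not in_charmap or not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # line shapes not covered by this sketch are skipped
        code_point = int(match.group(1), 16)
        codepage_bytes = bytes.fromhex(match.group(2).replace("\\x", ""))
        precision = int(match.group(3)) if match.group(3) else 0
        mappings.append(Mapping(code_point, codepage_bytes, precision))
    return mappings


# Hypothetical excerpt, not taken from a delivered converter.
SAMPLE = r"""
<code_set_name> "hypothetical-sample"
CHARMAP
<U0041> \x41 |0
<U00E4> \xE4 |0
<U2019> \x27 |1
END CHARMAP
"""

if __name__ == "__main__":
    for mapping in parse_charmap(SAMPLE):
        print(f"U+{mapping.code_point:04X}", mapping.codepage_bytes.hex(), mapping.precision)
```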

ICU Resources

Please read the notice in Copyrights and Disclaimers of Included Third Party Products.

ICU Home Page

The ICU home page (http://www.icu-project.org/) is the main point of entry for information on International Components for Unicode (ICU).

ICU Converter Explorer

The ICU Converter Explorer, available at http://demo.icu-project.org/icu-bin/convexp, shows aliases and further information on ICU converters. An ICU converter is the codepage definition used by ICU; it is defined in the so-called ucm format. If the location has changed since this documentation was published, search the internet for the ICU home page and follow the links to the ICU Converter Explorer.

The mapping of aliases to ICU converters is also provided as a text source within an EntireX installation. The location depends on the operating system.

ICU Converter Resources

EntireX delivers a standard set of the most commonly used ICU converters. For a list of codepages delivered with EntireX, see column "Installed with EntireX" in the Table of ICU to ECS Compatibility.

Translation

Translation is the quick-start approach and requires little configuration: only the service-specific or topic-specific parameter TRANSLATION in the Broker attribute file has to be set to the value SAGTCHA. Nothing needs to be configured or considered for the EntireX component (sender or receiver). Translation does not need locale strings. If translation is specified and an EntireX component sends a locale string, the locale string is ignored.

Translation has limitations on the number of environments supported and the number of different codepages for the environment in which your EntireX components (sender or receiver) are running:

Translation Codepages

Translation has further limitations on the code points used within the codepages provided. The translation routine SAGTCHA is loosely based on the following platform-dependent codepages:

Environment | Indicator sent from EntireX Component to Broker | Based on Codepage | Description
All ASCII environments, i.e. Windows, UNIX etc. | x'80' | Microsoft Windows codepage 1252 | Translation of characters for ASCII environments is loosely based on Windows codepage 1252. Not all of the characters of Windows codepage 1252 are supported by translation. All of the characters supported have the same code point in codepage ISO 8859-1, thus this is also suitable for UNIX.
IBM mainframe | x'22' | IBM codepage 273 | Translation of characters for the IBM mainframe platform is loosely based on IBM codepage 273. Not all of the characters of the IBM codepage 273 are supported by translation.
Siemens mainframe | x'42' | EDF 03 national version for Germany | Translation of characters is loosely based on the EDF03 codepage for Germany.

Characters (code points) supported by SAGTCHA are the same as in the Translation User Exit example (under z/OS | UNIX | Windows). Refer to this example for the code points used.
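
As an illustration of table-driven, single-byte translation selected by environment indicator, the sketch below builds a 256-entry table from the codepages named in the table above. It makes several assumptions: Python's codecs cp1252 and cp273 stand in for the codepages, so the table covers more characters than SAGTCHA actually supports; the Siemens indicator x'42' is left out because Python ships no EDF03 codec; and unmapped bytes are replaced by a question mark, which is a choice of this sketch and not documented SAGTCHA behavior. All names are hypothetical.

```python
# Environment indicator -> Python codec standing in for the base codepage.
CODEC_FOR_INDICATOR = {
    0x80: "cp1252",  # all ASCII environments
    0x22: "cp273",   # IBM mainframe
}


def build_table(sender: int, receiver: int) -> bytes:
    """Build a 256-byte table mapping sender bytes to receiver bytes."""
    src = CODEC_FOR_INDICATOR[sender]
    dst = CODEC_FOR_INDICATOR[receiver]
    substitute = "?".encode(dst)[0]  # used where no mapping exists
    table = bytearray(256)
    for byte in range(256):
        try:
            table[byte] = bytes([byte]).decode(src).encode(dst)[0]
        except UnicodeError:
            table[byte] = substitute
    return bytes(table)


def translate(payload: bytes, sender: int, receiver: int) -> bytes:
    """Translate every byte of payload with the table for this pair."""
    return payload.translate(build_table(sender, receiver))


if __name__ == "__main__":
    request = "Hello".encode("cp273")             # sent from an IBM mainframe
    print(translate(request, 0x22, 0x80))         # received in an ASCII environment
```

The same function serves the reply by swapping the two indicators, which mirrors the sender and receiver symmetry described in the Overview.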

You can use Translation

For information on how to configure the broker for translation, see the documentation for the platform under which the broker is running: z/OS | UNIX | Windows.

Translation User Exit

With Translation User Exits the code points of the codepage used are under your control. You can adapt them to meet your requirements. This requires programming a user-specific translation routine (z/OS | UNIX | Windows). The delivered model for the Translation User Exit supports single-byte codepages only, but in principle any type of codepage can be implemented.

With Translation User Exit you can make any structure of the data (mixture of text and binary data) within your payload known to the Translation User Exit. For this purpose the EntireX Broker ACI provides the field ENVIRONMENT which can be shared between your application and the Translation User Exit. See Using the Environment Field with the Translation User Exit.
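
The following sketch shows how a translation routine might use such a shared description to translate text fields and leave binary fields untouched. It is not the EntireX user exit interface. The layout notation used here (T or B followed by a byte count, for example T5B4T3) is purely hypothetical, as are the function names; how your application encodes the payload structure in the ENVIRONMENT field is up to you. Python's codecs cp273 and cp1252 stand in for the codepages.

```python
import re


def single_byte_table(src: str, dst: str) -> bytes:
    """256-byte translation table between two single-byte Python codecs."""
    substitute = "?".encode(dst)[0]
    table = bytearray(256)
    for byte in range(256):
        try:
            table[byte] = bytes([byte]).decode(src).encode(dst)[0]
        except UnicodeError:
            table[byte] = substitute
    return bytes(table)


def translate_fields(payload: bytes, environment: str, table: bytes) -> bytes:
    """Translate only the text fields named in the hypothetical layout string."""
    out = bytearray()
    offset = 0
    for kind, length in re.findall(r"([TB])(\d+)", environment):
        chunk = payload[offset:offset + int(length)]
        # T = text: translate; B = binary: copy unchanged
        out += chunk.translate(table) if kind == "T" else chunk
        offset += int(length)
    out += payload[offset:]  # bytes not described by the layout pass through
    return bytes(out)


if __name__ == "__main__":
    table = single_byte_table("cp273", "cp1252")
    payload = "Hello".encode("cp273") + b"\x00\x01\xc1\xff" + "abc".encode("cp273")
    print(translate_fields(payload, "T5B4T3", table))
```

Note that the byte x'C1' inside the binary field is preserved, whereas the same byte inside a text field would be translated to the letter A.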

Configuration effort is low: only the service-specific or topic-specific parameter TRANSLATION in the Broker attribute file has to be set to the name of your User Exit. Nothing needs to be configured or considered for the EntireX Component (sender or receiver). Translation does not need locale strings. If a Translation User Exit is specified and an EntireX Component sends a locale string, the locale string is ignored.

The limitations on the number of environments and different codepages per environment remain the same as for Translation.

You can use Translation User Exits

SAGTRPC User Exit

With the SAGTRPC User Exit you can implement your own conversion method in EntireX and do without ICU for RPC-based components. The SAGTRPC User Exit cannot be used for ACI-based programming.

The SAGTRPC User Exit allows you to adapt codepages and their characters (code points) to meet your requirements. This requires some programming effort to write a SAGTRPC user exit (see the section Writing SAGTRPC User Exits under z/OS | UNIX | Windows). The delivered model for the SAGTRPC User Exit supports single-byte codepages only, but in principle any type of codepage can be implemented.

You can use SAGTRPC User Exit

For SAGTRPC User Exit to function correctly,
