This document introduces internationalization with EntireX and describes the various approaches offered.
See also: Broker Attribute File
The translation and conversion of codepages is a symmetric process. Everything that is valid for the request (client to server) relates also to the reply (server to client), with opposite roles. Therefore the terms sender and receiver are used instead of client and server in this section.
Internationalization with EntireX provides the following:
Codepage conversion is available for senders and receivers, so any participant is able to work with the desired codepage. Each participant tells the Broker which codepage it uses to send and receive messages, so the Broker can convert between the characters (code points) of the codepages involved.
Codepage conversion applies to the user's payload data in the broker's send and receive buffers. The Broker ACI control block is handled differently and does not require special attention concerning internationalization.
For the simpler approaches Translation and Translation User Exits, participants give the codepage to the broker implicitly - nothing needs to be configured for EntireX Components (senders and receivers).
For the more accurate approaches of ICU Conversion and SAGTRPC User Exit, the codepage that an EntireX Component (sender and receiver) uses is described by so-called locale strings (alias name of a codepage) sent along with the request to the broker. The locale string always requires some attention. Depending on the platform your EntireX component is running on, the locale string is sent automatically by default or has to be provided.
As long as you use your platform's default codepage and the locale string is provided automatically, nothing else needs to be considered.
If the locale string is not provided automatically, providing one can be a programming issue or a configuration issue, depending on the EntireX component used.
For information on how to provide locale strings, or whether locale strings are sent automatically, see table Preparing EntireX Components for Internationalization under z/OS | UNIX | Windows.
Codepage conversion is available for all of the communication models the broker offers, such as
ACI-based programming in its various language bindings (Java, C, Assembler, Natural, etc.), see Internationalization for ACI-based Programming.
RPC-based components such as DCOM Wrapper, Java Wrapper, COBOL Wrapper, .NET Wrapper, etc. See Internationalization for RPC-based Components.
Publish and Subscribe
etc.
The following sections discuss all of the internationalization approaches offered by EntireX.
ICU Conversion is based on IBM's project International Components for Unicode. It is a mature, widely used set of C/C++ and Java libraries for Unicode support, software internationalization and globalization.
ICU comes along with a set of codepages based on codepages from ISO and software vendors such as Microsoft and IBM, and is a standardized approach. For a list of codepages delivered with EntireX, see column Installed with EntireX in the Table of ICU to ECS Compatibility.
if you need multiple codepages per environment, e.g. more than one unique ASCII, IBM mainframe or Siemens mainframe codepage.
if you need standardized codepages and the ICU converters provided meet your requirements.
if you need double-byte or multibyte conversions.
The broker must be configured for the platform the broker is running on. See Configure ICU Conversion under z/OS | UNIX | Windows.
ICU Conversion requires configuration of the Broker. The service-specific or topic-specific parameter CONVERSION in the Broker attribute file must be set:
for ACI-based programming use SAGTCHA for any type of codepage i.e. single-byte, double-byte and multibyte encoding schemes. See Internationalization for ACI-based Programming.
for RPC-based components use SAGTCHA for single-byte codepages also. SAGTRPC must only be used for double-byte and multibyte encoding schemes. See Internationalization for RPC-based Components.
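The selection rule in the two items above can be written down as a small decision function. This is only an illustrative sketch: the names `Model`, `Encoding` and `conversion_routine` are hypothetical and not part of EntireX, and the returned string is simply the value that the text above says to set for the CONVERSION parameter.

```python
from enum import Enum


class Model(Enum):
    """Communication model of the EntireX component (hypothetical helper type)."""
    ACI = "ACI-based programming"
    RPC = "RPC-based components"


class Encoding(Enum):
    """Encoding scheme of the codepages in use (hypothetical helper type)."""
    SINGLE_BYTE = 1
    DOUBLE_BYTE = 2
    MULTIBYTE = 3


def conversion_routine(model: Model, encoding: Encoding) -> str:
    """Return the CONVERSION value described in the text above."""
    if model is Model.ACI:
        # ACI-based programming: SAGTCHA for any type of codepage.
        return "SAGTCHA"
    if encoding is Encoding.SINGLE_BYTE:
        # RPC-based components with single-byte codepages: SAGTCHA as well.
        return "SAGTCHA"
    # RPC-based components with double-byte or multibyte encoding schemes.
    return "SAGTRPC"
```

For example, `conversion_routine(Model.RPC, Encoding.MULTIBYTE)` yields `"SAGTRPC"`, while every ACI combination yields `"SAGTCHA"`.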
EntireX Components (sender and receiver) must send a locale string to the broker. Depending on the platform your EntireX Component is running on, this is done automatically by default, and nothing else needs to be configured as long as you use your platform's default codepage. If the locale string is not provided automatically, it must be supplied either programmatically or through configuration, depending on the EntireX component used. See Preparing EntireX Components for Internationalization under z/OS | UNIX | Windows.
ICU uses algorithmic conversion, non-algorithmic conversion and combinations of both. With non-algorithmic conversion, tables are provided that contain a mapping of codepage characters to Unicode as a definition of a codepage. This format is also called ucm format.
ICU conversion is a 2-step process:
The conversion table designated by the sender is used in the first step to convert from characters of the source codepage to Unicode.
The conversion table designated by the receiver in the reverse direction is used in the second step to convert from Unicode to characters of the target codepage.
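The two steps can be illustrated with a minimal sketch that pivots through Unicode. It uses Python's built-in codecs rather than ICU, so the codec names `cp1252` and `cp273` are Python's names and stand in for the locale strings a sender and receiver would pass to the broker; the function name `convert` is hypothetical.

```python
def convert(payload: bytes, sender_codepage: str, receiver_codepage: str) -> bytes:
    """Convert payload bytes from the sender's codepage to the receiver's."""
    # Step 1: characters of the source codepage -> Unicode.
    text = payload.decode(sender_codepage)
    # Step 2: Unicode -> characters of the target codepage.
    return text.encode(receiver_codepage)


# Request: a Windows sender (codepage 1252) talks to an EBCDIC receiver (codepage 273).
request = "Grüße".encode("cp1252")
on_mainframe = convert(request, "cp1252", "cp273")

# Reply: the same process with the roles of sender and receiver swapped.
reply = convert(on_mainframe, "cp273", "cp1252")
```

Because both directions go through Unicode, adding a further codepage only requires one more table to and from Unicode, not one table per pair of codepages.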
ICU uses line-oriented text files to define non-algorithmic converters. For complex codepages partially and fully algorithmic converters may be used which cannot be defined as simple text files.
The codepage definition text files for ICU are also called ucm files because of the extension ucm. The most important section is the mapping table between the CHARMAP and END CHARMAP lines. Basically each line contains a Unicode code point and the related codepage character byte sequence followed by an optional precision indicator. Four kinds of definitions are supported by the precision indicator:
0 - normal roundtrip mapping from a Unicode code point and back.
1 - fallback mappings are used during conversion from Unicode to the codepage, but not back again. This definition may be present if a character exists in Unicode but not in the codepage. This feature is useful for human-readable output where the missing character is mapped to a similar looking one.
2 - substitution mappings resulting in assignment of the alternative substitution sequence (subchar1 in ucm format) when a nonconvertible character occurs, instead of assigning the default substitution sequence (subchar in ucm format).
3 - reverse fallback mappings are used during conversion from the codepage to Unicode, but not back again. This definition results in assigning the same Unicode code point for different codepage character byte sequences.
This brief explanation does not intend to describe the ucm file format fully. For further explanation of the ucm file format, refer to the ICU home page (see ICU Resources, below).
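As an illustration of how the precision indicator affects the two conversion directions, the following sketch parses the CHARMAP section of a ucm-style text into lookup tables. It is a simplified reading of the format as described above, not ICU's own parser: the function `parse_charmap` and the sample mappings are hypothetical, and state-table or multi-section files are not handled.

```python
import re

# One mapping line: <UXXXX> \xNN[\xNN...] |P   (the precision indicator is optional)
MAPPING_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|([0-3]))?"
)


def parse_charmap(ucm_text):
    """Return (to_unicode, from_unicode, subchar1_chars) for a ucm-style text."""
    to_unicode = {}         # codepage bytes -> Unicode character
    from_unicode = {}       # Unicode character -> codepage bytes
    subchar1_chars = set()  # characters that get the alternative substitution
    in_charmap = False
    for raw_line in ucm_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        match = MAPPING_LINE.match(line) if in_charmap else None
        if not match:
            continue
        char = chr(int(match.group(1), 16))
        encoded = bytes.fromhex(match.group(2).replace("\\x", ""))
        precision = int(match.group(3) or 0)
        if precision in (0, 1):   # roundtrip or fallback: Unicode -> codepage
            from_unicode[char] = encoded
        if precision in (0, 3):   # roundtrip or reverse fallback: codepage -> Unicode
            to_unicode[encoded] = char
        if precision == 2:        # substitution mapping (subchar1)
            subchar1_chars.add(char)
    return to_unicode, from_unicode, subchar1_chars


# Hypothetical sample data, invented for this illustration only.
SAMPLE_UCM = r"""
<code_set_name> "hypothetical-sample"
<subchar> \x3F
CHARMAP
<U0027> \x27 |0
<U2019> \x27 |1
<U007C> \x7C |0
<U007C> \xA6 |3
<U0080> \x1A |2
END CHARMAP
"""

to_unicode, from_unicode, subchar1_chars = parse_charmap(SAMPLE_UCM)
```

In this sample, U+2019 is written as byte x'27' but x'27' is read back as U+0027 (fallback), and both x'7C' and x'A6' are read as U+007C while only x'7C' is ever written (reverse fallback).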
Please read the notice in Copyrights and Disclaimers of Included Third Party Products.
The ICU home page (http://www.icu-project.org/) is the main point of entry for information on International Components for Unicode (ICU).
The ICU Converter Explorer available at http://demo.icu-project.org/icu-bin/convexp shows aliases and further information on ICU converters. An ICU converter is the codepage definition used by ICU; it is defined in the so-called ucm format. If the location has changed since this documentation was published, perform an internet search for the ICU home page and follow the links to the ICU Converter Explorer.
The mapping of aliases to ICU converters is also provided as a text source within an EntireX installation. The location depends on the operating system.
UNIX: exxdir/exxvers/etc/convrtrs.txt
Windows: ..\Program Files\SoftwareAG\EntireX\Etc\convrtrs.txt
EntireX delivers a standard set of the most commonly used ICU converters. For a list of codepages delivered with EntireX, see column "Installed with EntireX" in the Table of ICU to ECS Compatibility.
Translation is the quick-start approach and requires little configuration: only the service-specific or topic-specific parameter TRANSLATION in the Broker attribute file has to be set to the value SAGTCHA. Nothing needs to be configured or considered for the EntireX component (sender or receiver). Translation does not need locale strings. If translation is specified and an EntireX component sends a locale string, the locale string will be ignored.
Translation has limitations on the number of environments supported and the number of different codepages for the environment in which your EntireX components (sender or receiver) are running:
All ASCII environments (Windows, UNIX, etc.) must use the same ASCII codepage.
All IBM mainframes must use the same EBCDIC codepage.
All Siemens mainframes must use the same EBCDIC codepage.
Translation has further limitations on the code points used within the codepages provided. The translation routine SAGTCHA is loosely based on the following platform-dependent codepages:
| Environment | Indicator sent from EntireX Component to Broker | Based on Codepage | Description |
|---|---|---|---|
| All ASCII environments, i.e. Windows, UNIX etc. | x'80' | Microsoft Windows codepage 1252 | Translation of characters for ASCII environments is loosely based on Windows codepage 1252. Not all of the characters of Windows codepage 1252 are supported by translation. All of the characters supported have the same code point in codepage ISO 8859-1, thus this is also suitable for UNIX. |
| IBM mainframe | x'22' | IBM codepage 273 | Translation of characters for the IBM mainframe platform is loosely based on IBM codepage 273. Not all of the characters of the IBM codepage 273 are supported by translation. |
| Siemens mainframe | x'42' | EDF 03 national version for Germany | Translation of characters is loosely based on the EDF 03 codepage for Germany. |
Characters (code points) supported by SAGTCHA are the same as in the Translation User Exit example (under z/OS | UNIX | Windows). Refer to this example for the code points used.
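The idea of one fixed codepage per environment, selected by the indicator in the table above, can be sketched as a byte-for-byte lookup table. This is not the SAGTCHA table: it is built from Python's `cp1252` and `cp273` codecs as stand-ins, the Siemens entry is left out because Python has no built-in EDF 03 codec, and the names `ENVIRONMENT_CODEPAGE`, `build_table` and `translate` are hypothetical.

```python
# Indicator sent by the EntireX component -> codepage assumed for that environment.
ENVIRONMENT_CODEPAGE = {
    0x80: "cp1252",  # all ASCII environments (Windows, UNIX, etc.)
    0x22: "cp273",   # IBM mainframe
    # 0x42 (Siemens mainframe, EDF 03) omitted: no built-in Python codec.
}


def build_table(source: str, target: str) -> bytes:
    """Build a 256-entry single-byte translation table from source to target."""
    fallback = "?".encode(target)[0]  # used where no mapping exists
    table = bytearray(256)
    for value in range(256):
        try:
            table[value] = bytes([value]).decode(source).encode(target)[0]
        except UnicodeError:
            # Byte unassigned in the source or character missing in the target.
            table[value] = fallback
    return bytes(table)


def translate(payload: bytes, sender_indicator: int, receiver_indicator: int) -> bytes:
    """Translate payload between the environments named by the two indicators."""
    source = ENVIRONMENT_CODEPAGE[sender_indicator]
    target = ENVIRONMENT_CODEPAGE[receiver_indicator]
    if source == target:
        return payload  # same environment, nothing to translate
    return payload.translate(build_table(source, target))


ebcdic = translate(b"A1", 0x80, 0x22)  # ASCII sender, IBM mainframe receiver
```

A single table per pair of environments is what keeps this approach simple, and it is also why only one codepage per environment and only single-byte codepages can be supported.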
if you have a mixed environment consisting of ASCII, IBM mainframes and/or Siemens mainframe platforms but
all of your ASCII environments use the same codepage.
all of your IBM mainframes use the same codepage.
all of your Siemens mainframes use the same codepage.
if single-byte codepages meet your requirements.
if the code points within the delivered codepages meet your requirements. Note that not all code points implemented by SAGTCHA are roundtrip compatible, even if your environment uses Microsoft Windows codepage 1252, IBM codepage 273 and the EDF 03 national version for Germany. Roundtrip incompatibility means that if you transfer a character from an ASCII platform to IBM EBCDIC or Siemens EBCDIC and back again, you get a different character. Important code points (characters) such as uppercase letters A - Z, lowercase letters a - z, digits 0 - 9, the code points required for RPC data stream conversion, and others are roundtrip compatible.
for RPC-based components as well as for ACI-based programming and other communication models.
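Roundtrip compatibility, as described in the list above, can be checked mechanically. The sketch below does this for Python's `cp1252` and `cp273` codecs, which are stand-ins and not the tables SAGTCHA actually uses, so the resulting list of lossy code points is illustrative only. The function name `round_trip` is hypothetical.

```python
import string


def round_trip(byte_value: int, local: str = "cp1252", remote: str = "cp273"):
    """Send one byte from local to remote and back; return the byte that arrives."""
    original = bytes([byte_value])
    try:
        text = original.decode(local)
    except UnicodeDecodeError:
        return None  # byte is not assigned in the local codepage
    # Characters missing in the other codepage are replaced by "?".
    there = text.encode(remote, errors="replace")
    back = there.decode(remote).encode(local, errors="replace")
    return back


# Code points that come back as a different character.
lossy = [b for b in range(256) if round_trip(b) not in (None, bytes([b]))]

# Letters and digits are expected to survive the round trip.
safe = (string.ascii_letters + string.digits).encode("ascii")
```

Running such a check against the code points of your own environment before choosing Translation shows whether the characters your applications exchange are affected.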
For information on how to configure the broker for translation, see the documentation for the platform under which the broker is running: z/OS | UNIX | Windows.
With Translation User Exits the code points of the codepage used are under your control. You can adapt them to meet your requirements. This requires programming a user-specific translation routine (z/OS | UNIX | Windows). The delivered model for the Translation User Exit supports single-byte codepages only - but in principle any type of codepage can be implemented.
With Translation User Exit you can make any structure of the data (mixture of text and binary data) within your payload known to the Translation User Exit. For this purpose the EntireX Broker ACI provides the field ENVIRONMENT which can be shared between your application and the Translation User Exit. See Using the Environment Field with the Translation User Exit.
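The following sketch shows the principle of translating only the text parts of a mixed payload. The layout description is a hypothetical convention invented for this illustration; what an application actually places in the ENVIRONMENT field, and how a user exit interprets it, is entirely application-defined. Python codecs `cp1252` and `cp273` are used as stand-ins for the codepages, and `translate_fields` is a hypothetical name.

```python
from typing import List, Tuple

# Hypothetical layout: a sequence of (kind, length) pairs, kind is "text" or "binary".
Layout = List[Tuple[str, int]]


def translate_fields(payload: bytes, layout: Layout, source: str, target: str) -> bytes:
    """Translate text fields between single-byte codepages, pass binary through."""
    out = bytearray()
    offset = 0
    for kind, length in layout:
        field = payload[offset:offset + length]
        if kind == "text":
            # Single-byte codepages: the field length is preserved.
            field = field.decode(source).encode(target)
        out += field
        offset += length
    out += payload[offset:]  # anything not described is passed through unchanged
    return bytes(out)


payload = "HELLO".encode("cp1252") + b"\x00\x01\x02\x03" + "abc".encode("cp1252")
layout = [("text", 5), ("binary", 4), ("text", 3)]
translated = translate_fields(payload, layout, "cp1252", "cp273")
```

Here the four binary bytes in the middle arrive unchanged, while the two text fields are translated to the receiver's codepage.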
Configuration is simple: only the service-specific or topic-specific parameter TRANSLATION in the Broker attribute file has to be set to the name of your User Exit. Nothing needs to be configured or considered for the EntireX Component (sender or receiver). Translation does not need locale strings. If a Translation User Exit is specified and an EntireX Component sends a locale string, the locale string will be ignored.
The limitations on the number of environments and different codepages per environment remain the same as for Translation.
if you want to adapt code points to meet your requirements,
if you have to consider any payload data structure for translation,
if you have a mixed environment consisting of ASCII, IBM mainframes and/or Siemens mainframe platforms but
all of your ASCII environments use the same codepage,
all of your IBM mainframes use the same codepage,
all of your Siemens mainframes use the same codepage.
if single-byte codepages meet your requirements. Otherwise you will have to invent a model for other types of codepages - this can become very complicated and involve considerable effort.
for RPC-based components as well as for ACI-based programming and other communication models. For RPC-based components the codepage you implement must take into consideration the Requirements for RPC Data Stream Conversions for Codepage and Codepoint.
With the SAGTRPC User Exit you can implement your own conversion method in EntireX and dispense with ICU for RPC-based components. SAGTRPC User Exit cannot be used for ACI-based programming.
SAGTRPC User Exit allows you to adapt codepages and their characters (code points) to meet your requirements. This requires some effort in programming a SAGTRPC user exit (see the section Writing SAGTRPC User Exits under z/OS | UNIX | Windows). The delivered model for the SAGTRPC User Exit supports single-byte codepages only - but in principle any type of codepage can be implemented.
if you want to adapt code points to meet your requirements.
if you need multiple codepages per environment, e.g. more than one unique ASCII, IBM mainframe or Siemens mainframe codepage. The delivered model supports only one codepage each for ASCII, IBM mainframe and Siemens mainframe; extending it to multiple codepages is not especially complicated, but it does require some routine work.
if single-byte codepages meet your requirements. Otherwise you will have to invent a model for other types of codepages - this can become very complicated and involve considerable effort.
for RPC-based components only. It cannot be used for ACI-based programming.
if the codepages you implement take into consideration the codepoint requirements for RPC data stream conversions.
The broker must be configured for the platform it is running on. See Configure SAGTRPC User Exits under z/OS | UNIX | Windows.
SAGTRPC User Exit requires configuration of the Broker: the service-specific or topic-specific parameter CONVERSION in the Broker attribute file must be set to the name of your routine.
Locale strings may be provided. It depends on your implementation of the SAGTRPC User Exit whether the components (sender and receiver) must send a locale string to the broker or not. See Preparing EntireX Components for Internationalization under z/OS | UNIX | Windows for information on how to provide locale strings.