It is assumed that you have read the Introduction to Internationalization.
A locale provides a means of identifying a specific region for the purposes of internationalization and localization. EntireX sends properties of the operating system locale to the Broker, using the locale string.
This document describes how the broker maps the locale string to codepages for character conversion with the ICU conversion and SAGTRPC user exit approaches. It does not apply to other approaches.
See also Broker Attributes.
Depending on the character conversion approach in effect for a service, the result of the broker's locale string processing is either an ICU converter alias or the codepage number given to the SAGTRPC user exit.
Two EntireX components are always involved (client and server, or sender and receiver), so for genuine conversion this process is run once for the sender to determine its codepage and once for the receiver to determine its codepage. You need to know both codepages to predict conversion behavior accurately.
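A minimal Python sketch of this two-sided conversion follows. It is an illustration only: Python's built-in codecs stand in for the ICU converters the broker actually uses, and the helper name `convert_for_receiver` is hypothetical. The sender is assumed to use Windows codepage 1252 (`cp1252`) and the receiver EBCDIC codepage 1140 (`cp1140`, which is codepage 037 with the euro sign added).

```python
def convert_for_receiver(data: bytes, sender_codepage: str, receiver_codepage: str) -> bytes:
    """Re-encode bytes from the sender's codepage into the receiver's codepage.

    Hypothetical helper: Python codec names stand in for the ICU converters
    the broker would select for each side.
    """
    # Decode with the sender's codepage to obtain Unicode text ...
    text = data.decode(sender_codepage)
    # ... then encode that text with the receiver's codepage.
    return text.encode(receiver_codepage)


if __name__ == "__main__":
    # A Windows client (codepage 1252) sends text containing the euro sign.
    sent = "Precio: 10 €".encode("cp1252")
    received = convert_for_receiver(sent, "cp1252", "cp1140")
    print(received.hex(" "))
```

Because the text passes through Unicode in this sketch, a character missing from the receiver's codepage makes `encode` raise `UnicodeEncodeError`; how the broker itself treats unmappable characters is not modeled here.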
1. If no locale string is provided by an EntireX component (sender or receiver), the Broker's Locale String Defaults apply. These can be customized in the broker attribute file; see Configuring Broker's Locale String Defaults.
2. If a locale string is provided by an EntireX component (sender or receiver), the broker first searches the codepage section of the attribute file for a keyword entry that matches the locale string sent. You can use these entries to bypass the built-in mapping to suit your needs; see Bypassing Broker's Built-in Locale String Mapping.
3. If an entry is found in step 2, its codepage is used directly and no further mapping occurs.
4. If no entry is found in step 2, the Broker's Built-in Locale String Mapping is applied.
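The four steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the broker's implementation: the function and parameter names are hypothetical, the defaults, the attribute-file entries and the built-in mapping are passed in as plain values, and the step 2 lookup ignores case, as described under Bypassing Broker's Built-in Locale String Mapping.

```python
from typing import Callable, Mapping, Optional


def resolve_codepage(
    locale_string: Optional[str],
    platform_default: str,
    codepage_section: Mapping[str, str],
    builtin_mapping: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hypothetical model of the broker's locale string processing (steps 1-4).

    platform_default stands in for the Broker's Locale String Defaults,
    codepage_section for the keyword entries in the attribute file, and
    builtin_mapping for the Broker's Built-in Locale String Mapping.
    """
    # Step 1: no locale string sent -> the locale string default applies.
    if not locale_string:
        return platform_default
    # Step 2: search the codepage section for a matching keyword entry
    # (case-insensitive in this sketch).
    entries = {key.upper(): value for key, value in codepage_section.items()}
    assigned = entries.get(locale_string.upper())
    # Step 3: entry found -> use it directly, no further mapping.
    if assigned is not None:
        return assigned
    # Step 4: no entry found -> fall back to the built-in mapping.
    return builtin_mapping(locale_string)
```

In this model the broker would call `resolve_codepage` twice per conversion, once with the sender's locale string and once with the receiver's.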
A prerequisite for the broker for correct character conversion is a suitably configured environment. This includes the broker's platform and may include the platforms that other EntireX components (sender and receiver) run on.
The data that EntireX components send to the broker must be in the encoding described by the locale string.
The data that EntireX components send to the broker must be in the encoding of the codepage the broker uses for conversion or translation.
After the broker's locale string processing (steps 1-4 above), the resulting ICU converter, or the code points implemented with the Translation user exit or the SAGTRPC user exit, must match the defined code points of the original codepage of your application's environment. (Matching code points is not trivial. Almost every hardware and software vendor provides its own codepages, based on standards from organizations such as ISO. Finding matching codepages is even harder because there is no common scheme of codepage identifiers across all organizations.)
If one of the prerequisites above is not met, results will be unpredictable.
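To see why matching code points matters, the short Python sketch below decodes the same byte with two closely related EBCDIC codepages from Python's standard codecs: 037 and 1140, where 1140 is 037 with the euro sign added. The helper name `decode_all` is hypothetical, and Python's codecs are used here only as a convenient stand-in for ICU converters.

```python
def decode_all(data: bytes, codepages: list[str]) -> dict[str, str]:
    """Decode the same bytes with each codepage so the results can be compared."""
    return {codepage: data.decode(codepage) for codepage in codepages}


if __name__ == "__main__":
    # Byte 0x9F is the generic currency sign in codepage 037,
    # but the euro sign in its euro-enabled variant 1140.
    for codepage, text in decode_all(b"\x9f", ["cp037", "cp1140"]).items():
        print(codepage, f"U+{ord(text):04X}")
```

If data were actually in 1140 but processed as 037 (or the other way round), the euro sign would silently become a different character instead of causing an error, which is one way the unpredictable results mentioned above can arise.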
The table in this section describes the built-in mechanism the broker uses to map a locale string to a codepage.
The last two forms in the table below, CP <number> and <codepage-name>, are the forms administrators and programmers use to configure or provide a codepage manually. This is necessary if a locale string is not sent to the broker by default.
Depending on the format of the locale string sent by the EntireX component, various rules for mapping apply:
Locale String Format Sent to Broker | ICU Converter Alias | SAGTRPC User Exit Codepage Number |
---|---|---|
<language>_<country>.<number> or <language>-<country>.<number>, where any abbreviation is accepted for <language> and <country>, and <number> is the decimal number of a Windows codepage. | The part <number> is extracted from the locale string and prefixed with windows-. The result is used as the ICU converter alias. The parts <language> and <country> are not used. Example: german_Germany.1252 results in the ICU converter alias windows-1252. | The part <number> is extracted and used together with the table Mapping Codepage Numbers to evaluate the number given to the SAGTRPC user exit. The parts <language> and <country> are not used. |
<ll>_<cc>, where <ll> is the 2-letter language abbreviation according to the ISO 639 standard and <cc> is the 2- or 3-letter country code. | The part <ll> is extracted and used together with the table Mapping Two-character Language Codes to map to an ICU converter alias. The part <cc> is not used. | The part <ll> is extracted and used together with the table Mapping Two-character Language Codes to evaluate the number given to the SAGTRPC user exit. The part <cc> is not used. |
<ll>_<cc>.<Linux-codepage-name>, where <ll> is the 2-letter language abbreviation according to the ISO 639 standard, <cc> is the 2- or 3-letter country code, and <Linux-codepage-name> is any alias name for a codepage. | The part <Linux-codepage-name> is extracted from the locale string and used as the ICU converter alias. The parts <ll> and <cc> are not used. | The part <Linux-codepage-name> is extracted and used together with the table Mapping UNIX Codepage Names to evaluate the number given to the SAGTRPC user exit. The parts <ll> and <cc> are not used. |
CP <number>, where <number> is the decimal number of a codepage. | The locale string CP <number> is used as is for the ICU converter alias. | The part <number> is extracted and used together with the table Mapping Codepage Numbers to evaluate the number given to the SAGTRPC user exit. |
<codepage-name>, where <codepage-name> is any alias name for a codepage. | The locale string <codepage-name> is used as is for the ICU converter alias. | The locale string <codepage-name> is used together with the table Mapping Java Codepage Names to evaluate the number given to the SAGTRPC user exit. |
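The ICU column of the table can be approximated with regular expressions, as in the Python sketch below. It is a simplification under several assumptions: the patterns are an interpretation of the formats in the table rather than the broker's actual parser, `icu_alias_for` is a hypothetical name, and the contents of the Mapping Two-character Language Codes table are not reproduced here, so that table is passed in as a parameter.

```python
import re
from typing import Mapping, Optional

# <language>_<country>.<number> or <language>-<country>.<number>
# (any abbreviation for language and country; number = Windows codepage)
WINDOWS_FORM = re.compile(r"^[^_.\-]+[_-][^.]+\.(\d+)$")
# <ll>_<cc>.<Linux-codepage-name>
LINUX_FORM = re.compile(r"^[A-Za-z]{2}_[A-Za-z]{2,3}\.(.+)$")
# <ll>_<cc>
LANGUAGE_FORM = re.compile(r"^([A-Za-z]{2})_[A-Za-z]{2,3}$")


def icu_alias_for(locale_string: str, language_table: Mapping[str, str]) -> Optional[str]:
    """Return the ICU converter alias for a locale string, or None if unmapped."""
    match = WINDOWS_FORM.match(locale_string)
    if match:
        # Only <number> is used, prefixed with "windows-".
        return "windows-" + match.group(1)
    match = LINUX_FORM.match(locale_string)
    if match:
        # Only <Linux-codepage-name> is used.
        return match.group(1)
    match = LANGUAGE_FORM.match(locale_string)
    if match:
        # Only <ll> is used, looked up in the (caller-supplied) language table.
        return language_table.get(match.group(1).lower())
    # CP <number> and <codepage-name> are both used as is.
    return locale_string
```

For example, `icu_alias_for("german_Germany.1252", {})` returns `"windows-1252"`, matching the example in the table. Because the Windows form is tested first in this sketch, a numeric suffix such as `de_DE.1252` is treated as a Windows codepage number rather than a Linux codepage name.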
Once you have determined the ICU converter alias, use the ICU Converter Explorer under ICU Resources to determine the real ICU converter's canonical name and get more information on the ICU converter.
Once you have determined the number given to the SAGTRPC user exit, you are responsible for implementing the codepage.
If a locale string is not sent by an EntireX component (sender or receiver), the broker makes a rough assumption and assigns either an ICU converter or, with the SAGTRPC user exit, a numeric codepage. To do so, it distinguishes between ASCII environments, IBM mainframe and Fujitsu mainframe operating systems, as shown in the following table:
Platform of the EntireX Component | ICU Converter | SAGTRPC User Exit |
---|---|---|
Linux, Windows | ibm-5349_P100-1998 | 1252 |
IBM Mainframe | ibm-37_P100-1995 | 0037 |
Fujitsu Mainframe | bs2000-edf04drv | 3587 |
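The default table can be expressed as a simple lookup, as in the Python sketch below. The values are copied from the table; the platform keys, the `LocaleDefault` type and `default_for` are hypothetical names chosen for illustration. The SAGTRPC codepage is kept as a string so that the leading zero of 0037 is preserved.

```python
from typing import NamedTuple


class LocaleDefault(NamedTuple):
    icu_converter: str
    sagtrpc_codepage: str  # kept as a string so "0037" keeps its leading zero


# Built-in defaults from the table, keyed by a hypothetical platform name.
BUILT_IN_DEFAULTS = {
    "linux": LocaleDefault("ibm-5349_P100-1998", "1252"),
    "windows": LocaleDefault("ibm-5349_P100-1998", "1252"),
    "ibm mainframe": LocaleDefault("ibm-37_P100-1995", "0037"),
    "fujitsu mainframe": LocaleDefault("bs2000-edf04drv", "3587"),
}


def default_for(platform: str) -> LocaleDefault:
    """Return the built-in default used when no locale string is sent."""
    return BUILT_IN_DEFAULTS[platform.strip().lower()]
```

In this sketch an unknown platform name raises `KeyError`; the broker itself only distinguishes the three platform groups listed in the table.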
Note that the broker's built-in defaults above may be suitable in the following cases:
Many Western countries using Windows, including Java environments, configured with the ICU converter ibm-5349_P100-1998 (an ICU-supported alias name is, for example, CP1252).
The United States using OS/390 or z/OS IBM mainframe configured with the ICU converter ibm-37_P100-1995 (ICU-supported alias names are CP37 and ibm-37).
You can customize the defaults to your own requirements. See the respective attribute in Codepage-specific Attributes for how to customize the broker's locale string defaults. For examples of how to configure the broker's locale strings, see the next section, Configuring Broker's Locale String Defaults.
The broker's built-in defaults for locale strings can be overridden by assigning the ICU converter name directly or an alias of the ICU converter. See Codepage-specific Attributes.
This procedure is useful if the built-in default does not meet your requirements, or if EntireX components (sender or receiver) do not send locale strings.
For this example it is assumed that the character conversion approach is ICU conversion.
An environment running in Spanish-speaking countries using clients with Windows 1252 codepages and servers on IBM mainframe with codepage 1145. Because 1252 is the broker's default for ASCII environments, only the default for IBM mainframe is changed, to codepage 1145, using the ICU converter alias ibm-1145. The previously used codepage 284 is also possible but does not contain the euro sign.
DEFAULTS=CODEPAGE
* Broker Locale String defaults
DEFAULT_EBCDIC_IBM=ibm-1145
As a result, the related ICU converter used as the default for IBM mainframe is ibm-1145_P100-1997. See ICU Converter Explorer under ICU Resources.
For this example it is assumed that the character conversion approach is ICU conversion.
An environment running in German-speaking countries using clients with Windows 1252 codepages and servers on IBM mainframe with codepage 1141. Because 1252 is the broker's default for ASCII environments, only the default for IBM mainframe is changed, to codepage 1141, using the ICU converter alias ibm-1141. The previously used codepage 273 is still possible but does not contain the euro sign.
DEFAULTS=CODEPAGE
* Broker Locale String defaults
DEFAULT_EBCDIC_IBM=ibm-1141
As a result, the related ICU converter used as the default for IBM mainframe is ibm-1141_P100-1997. See ICU Converter Explorer under ICU Resources.
For this example it is assumed that the character conversion approach is ICU conversion.
An environment running in Hong Kong using clients with the Windows 950 (Big5) codepage and servers on IBM mainframe with codepage 937. To provide suitable default values, the broker's defaults for both ASCII environments and IBM mainframes are adapted by assigning ICU converter alias names.
DEFAULTS=CODEPAGE
* Broker Locale String defaults
DEFAULT_ASCII=windows-950
DEFAULT_EBCDIC_IBM=ibm-937
As a result, the related ICU converter used as the default is
for ASCII environments: ibm-1373_P100-2002
for IBM mainframe: ibm-937_P110-1999
See ICU Converter Explorer under ICU Resources.
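The effect of these attribute settings can be sketched in Python: read the KEY=VALUE entries of the DEFAULTS=CODEPAGE section and let DEFAULT_ASCII and DEFAULT_EBCDIC_IBM replace the built-in defaults. The parsing rules are assumptions, not the broker's documented parser: lines starting with * are treated as comments, and a section is taken to run until the next DEFAULTS= line. The function names are hypothetical, and resolving an alias such as windows-950 to its canonical ICU converter is left to ICU, so it is not modeled here.

```python
from typing import Dict

# Built-in defaults (canonical ICU converter names from the defaults table).
BUILT_IN = {
    "DEFAULT_ASCII": "ibm-5349_P100-1998",
    "DEFAULT_EBCDIC_IBM": "ibm-37_P100-1995",
}


def parse_codepage_defaults(text: str) -> Dict[str, str]:
    """Collect KEY=VALUE entries from the DEFAULTS=CODEPAGE section (assumed syntax)."""
    entries: Dict[str, str] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '*' are ignored.
        if not line or line.startswith("*"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip().upper(), value.strip()
        if key == "DEFAULTS":
            # A DEFAULTS= line starts a new section.
            in_section = value.upper() == "CODEPAGE"
        elif in_section:
            entries[key] = value
    return entries


def effective_defaults(attribute_text: str) -> Dict[str, str]:
    """Apply DEFAULT_* overrides from the attribute text to the built-in defaults."""
    overrides = parse_codepage_defaults(attribute_text)
    return {key: overrides.get(key, builtin) for key, builtin in BUILT_IN.items()}
```

With the Hong Kong configuration above, both defaults are replaced by the aliases windows-950 and ibm-937; with the Spanish configuration, only DEFAULT_EBCDIC_IBM changes and the ASCII default stays as built in.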
For this example it is assumed that the character conversion approach is ICU conversion.
An environment running in Turkey using clients with the Windows 1254 codepage and servers on IBM mainframe with codepage 1026. To provide suitable default values, the broker's defaults for both ASCII environments and IBM mainframes are adapted by assigning ICU converter alias names.
DEFAULTS=CODEPAGE
* Broker Locale String defaults
DEFAULT_ASCII=windows-1254
DEFAULT_EBCDIC_IBM=ibm-1026
As a result, the related ICU converter used as the default is
for ASCII environments: ibm-1254_P100-1995
for IBM mainframe: ibm-1026_P100-1995
See ICU Converter Explorer under ICU Resources.
The broker's built-in mechanism of mapping locale strings to codepages can be bypassed by assigning the ICU converter name directly or an alias of the ICU converter. See Codepage-specific Attributes. Locale string matching is case-insensitive when bypassing the broker's built-in mechanism, that is, when the broker examines the DEFAULTS=CODEPAGE section in the attribute file.
If an EntireX component (sender or receiver) sends a locale string for which the broker's built-in mechanism fails and no codepage is found, you can assign a codepage.
If an EntireX component sends a locale string for which the broker's built-in mechanism selects the wrong codepage, you can assign the correct codepage.
If an EntireX component sends an incorrect locale string and you cannot adapt it, you can assign the correct codepage.
For this example it is assumed that the character conversion approach is ICU conversion.
An EntireX Java component sends ASCII as the locale string. ICU maps this to US-ASCII, but ISO 8859-1 should be used instead. This is achieved with the following configuration:
DEFAULTS=CODEPAGE
* Broker Locale String Codepage Assignments
ASCII=ISO8859_1