When used in this document, the notation vr represents the 2-digit ICU version number.
ICU makes use of a wide variety of data tables to provide many of its services. Examples include converter mapping tables, collation rules, transliteration rules, break iterator rules and dictionaries, and other locale data. The ICU data library for Natural is provided as a package that contains the desired data items. Using a package instead of single data item files improves performance, since only one file access is required during initialization to load the package. However, a package is less flexible, since it has to be rebuilt whenever data items need to be added.
The ICU data library may be customized in order to add existing or new converter mapping tables or to add other data items such as collation rules, break iterator rules and other locale data.
The customization tool for the ICU data library is available from the Download Components area in Empower (https://empower.softwareag.com/). Use the supplied icudtvr.zip file (vr = version) to customize the data libraries for the ICU version required in your Natural environment: ICS Version 1.4 requires icudt54.zip. The files described in this section are contained in the icudtvr.zip file.
Several steps are required to create a new ICU data library package. Some steps are performed on a PC and other steps are performed on the appropriate mainframe platform.
There are different sources for new data items:
The ICU Data Library Customizer at http://apps.icu-project.org/datacustom/.
ICU converter data archive at https://github.com/unicode-org/icu.
User-defined converter mapping tables.
The ICU Data Library Customizer is a web-based tool provided by IBM. The ICU Data Library Customizer is version-dependent. Therefore, an ICU Data Library Customizer is provided for each supported version. The ICU version can be retrieved using the SYSCP utility (see the function ICU Information).
The Data Library Customizer displays the data items in a tree view. Initially, all data items are selected. It is possible to deselect all items by expanding the advanced options and choosing the corresponding button. Expanding a tree shows all available items of that type. It is possible to reduce the number of displayed items by using the Filter Items button of the advanced options. For example, to reduce the tree view to show only Japanese items, enter the string "japanese" in the text box and choose the Filter Items button. Individual items can then be selected or deselected. All selected items are later added to the delivered base ICU data library and will be available for Natural.
The second possibility is to obtain or create a .ucm (source) mapping data file for the desired converter. A large archive of converter data is maintained by the ICU team. This archive is version-independent. If the desired conversion table is not available in the archive, it is possible to create a new one. For the documentation of the layout of converter mapping tables (.ucm files), refer to the chapter Conversion Data of the ICU User Guide (http://userguide.icu-project.org/conversion/data). It is recommended to take a similar mapping table from the archive, rename it and adjust it to the new requirements.
Note:
Do not change the character mapping of an existing converter. A changed mapping is in fact a new converter and requires a new converter name to avoid confusion.
Detailed information on how to customize the ICU data library is provided in the readme.txt file which is part of the downloaded zip file.
Converter source files are compiled into binary converter files (.cnv files) by using the makeconv.exe tool. It is possible to specify more than one converter.
Example:
| Command | Description |
|---|---|
| makeconv ibm-1142_P100-1997.ucm | Compile the Danish character mapping table of code page IBM-1142 into the binary format. With a subsequent step, the output file ibm-1142_P100-1997.cnv can be added to the new data library package. |
Converters obtained from ICU are already registered in the alias name table. No additional step is necessary. If the converter source file is a user-defined file, there will be no appropriate entry in the alias name table. In this case, it is necessary to register the new code page in the alias name table. Open the text file convrtrs.txt and append an appropriate entry at the end of that file in the section "User defined code pages". The name of the code page is required and the IANA name is optional. The string { IANA* } declares iana-name as an IANA name. Each user-defined code page requires an entry in the alias name table.
The entry has the following format:
name-of-code-page  iana-name { IANA* }
If the code page "my_cp-100" with the IANA name "MYCP" is to be added, the following line in convrtrs.txt is required:
my_cp-100            MYCP { IANA* }
For more information, refer to the header of convrtrs.txt or to the ICU User Guide.
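As a rough illustration of the entry format, the following Python sketch builds the line that would be appended to convrtrs.txt. The helper name `alias_entry` is hypothetical and not part of the ICU tooling; it only mirrors the layout described above.

```python
def alias_entry(code_page, iana_name=None):
    """Build one convrtrs.txt line for a user-defined code page.

    Hypothetical helper: it only reproduces the
    'name-of-code-page  iana-name { IANA* }' layout described above.
    """
    if not code_page or any(ch.isspace() for ch in code_page):
        raise ValueError("code page name must be non-empty and contain no whitespace")
    if iana_name is None:
        # The IANA name is optional; the code page name alone forms the entry.
        return code_page
    # Doubled braces produce literal "{" and "}" in an f-string.
    return f"{code_page}  {iana_name} {{ IANA* }}"


print(alias_entry("my_cp-100", "MYCP"))
```

The printed line can be appended to the "User defined code pages" section before the alias name table is recompiled.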
The modified alias name table has to be compiled with gencnval.exe into a binary file (cnvalias.icu) to be linked to the new data library package.
A new package is created with the tool makepkg.bat. It uses the delivered package icudtvr.dat (vr = version) and merges new, user-defined items. A user-defined item can be an additional package that contains new data items, a single data item such as a new converter (.cnv file), or a text file that contains a list of new items.
Examples:
| Command | Description |
|---|---|
| makepkg icudtvrl.dat | Add the data items contained in icudtvrl.dat to the base package icudtvrb.dat. |
| makepkg ibm-1142_P100-1997.cnv | Add the Danish code page IBM-1142 to the base data library package icudtvrb.dat. |
| makepkg newitems.txt | Add all data items (converters) that are listed in the text file newitems.txt to the base data library package icudtvrb.dat. |
makepkg.bat produces two files, a big-endian EBCDIC-based binary file and an HL assembler source. The assembler source contains the binary image of the first file packed into DC X'...' statements. The name of the binary file is icudtvre.dat and the name of the assembler source is icudtvre_dat.s. The file icudtvre is a copy of icudtvre_dat.s. The files must never be renamed since the package name "icudtvre" is used as a part of internal references of data items. It is used by the ICU runtime to access data items such as converters and to validate the data file. "icudt" identifies an ICU data file, "vr" is the version and "e" identifies the file as big-endian EBCDIC-encoded.
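The naming scheme can be illustrated with a small Python sketch that splits a package name into its parts. The function name `parse_package_name` is hypothetical. Only the suffix "e" is explained in the text above; the meanings given for "l" and "b" follow the usual ICU data naming convention and are an assumption for this product.

```python
import re

# Suffix letters per the usual ICU data naming convention.
# Assumption: only "e" is confirmed by the text above.
PLATFORM_SUFFIX = {
    "l": "little-endian, ASCII-based",
    "b": "big-endian, ASCII-based",
    "e": "big-endian, EBCDIC-based",
}


def parse_package_name(name):
    """Split a package name such as 'icudt54e' into version and platform."""
    match = re.fullmatch(r"icudt(\d{2})([lbe])", name)
    if match is None:
        raise ValueError(f"not an ICU data package name: {name!r}")
    version, suffix = match.groups()
    return version, PLATFORM_SUFFIX[suffix]


print(parse_package_name("icudt54e"))
```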
               
If more than one item is to be added or if the alias name table has been changed, the items have to be declared as a list in the newitems.txt file.
Examples:
Add code pages ibm-939_P120-1999 and ibm-942_P12A-1999. Entries of newitems.txt:
ibm-939_P120-1999.cnv
ibm-942_P12A-1999.cnv
Add user-defined code page my_cp-100. Entries of newitems.txt:
cnvalias.icu
my_cp-100.cnv
For more information, refer to the ICU User Guide.
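The rule for the contents of newitems.txt can be sketched in Python as follows. The function `newitems_lines` is a hypothetical helper, not part of the customization tool; it assumes that each converter contributes one .cnv entry and that cnvalias.icu is listed whenever the alias name table has been changed.

```python
def newitems_lines(converters, alias_table_changed=False):
    """Return the entries for newitems.txt, one file name per entry."""
    items = []
    if alias_table_changed:
        # A recompiled alias name table has to be packaged as well.
        items.append("cnvalias.icu")
    items.extend(f"{name}.cnv" for name in converters)
    return items


# User-defined code page: the alias table was changed, so it is listed too.
print("\n".join(newitems_lines(["my_cp-100"], alias_table_changed=True)))
```

Writing the returned entries to newitems.txt, one per line, yields a file of the same shape as the examples above.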
The result of the previous step is an assembler source module. The assembler module with the new data library package icudtvre has to be transferred to the target platform. The File Transfer Protocol (FTP) is available on each PC and can be used for this task. Ask the system administrator for the required information (such as host name, port number, user name and password) for accessing the target machine via FTP. Since icudtvre is a text file, the transfer mode must be set to ASCII to ensure the correct translation of the file on the target platform. The name of the file on the target platform is arbitrary. However, it is recommended to use the name icudtvre. If it is desired to rename icudtvre, the renamepkg.bat tool has to be used.
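If the transfer is scripted rather than done with an interactive FTP client, a minimal Python sketch could look like this. It relies on `ftplib.FTP.storlines`, which switches the session to ASCII mode (TYPE A) before sending. The host name, credentials and the file name icudt54e are placeholders, and the target name format on the mainframe is an assumption.

```python
from ftplib import FTP


def upload_ascii(ftp, source, remote_name):
    """Send an open text file (binary mode) in ASCII transfer mode."""
    # storlines() issues TYPE A, so the server translates the text
    # into the character set of the target platform.
    return ftp.storlines(f"STOR {remote_name}", source)


def main():
    # Placeholder values: ask the system administrator for the real ones.
    with FTP("mainframe.example.com") as ftp:
        ftp.login("user", "password")
        with open("icudt54e", "rb") as source:
            print(upload_ascii(ftp, source, "icudt54e"))
```

`main()` is not called automatically because it needs a reachable FTP server.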
The assembler source module must be assembled and linked on the target platform. It can either be linked to the nucleus or it can be loaded dynamically with RCA=name and CFICU=(DATFILE=name).
This section lists the profile parameters and macros which are used in conjunction with Unicode and code page support.
Unless otherwise noted, the profile parameters and macros mentioned in this section are explained in detail in the Parameter Reference.
| Parameter or Macro | Description |
|---|---|
| CFICU or NTCFICU macro | Enables Unicode support and controls various Unicode settings. |
| CMPO or CPAGE keyword subparameter of NTCMPO macro | Generates code page-sensitive Natural programs. |
| CP | Defines the default code page for Natural. This code page is used for the runtime and development environment unless it is overridden by a code page defined for a single object (for example, for a Natural source). Only platform-suitable code pages can be used. This means, for example, that no ASCII code page can be defined for a mainframe platform. An initialization error message is issued if an unsuitable code page is specified. |
| CPCVERR | Specifies whether a conversion error that occurs when converting from Unicode to code page, from code page to Unicode, or from one code page to another code page results in a Natural error. This parameter is not honored for the conversion of Natural sources when loading them into the source area or when cataloging them. It is also not honored when a Unicode field is converted into the code page before an I/O on a terminal emulation. In this case, the substitution character defined by ICU is replaced by the placeholder character defined for the code page. |
| CPOBJIN | Specifies the code page in which the batch input file for data is encoded. This file is defined in the data set CMOBJIN. |
| CPPRINT | Specifies the code page in which the batch output file is to be encoded. This file is defined in the data set CMPRINT. |
| CPSYNIN | Specifies the code page in which the batch input file for commands is encoded. This file is defined in the data set CMSYNIN. |
| NTCPAGE macro | In the NATCONFG module, this macro defines a code page and all related information, such as placeholder character, locale ID and collation tables. |
| OPRB or NTOPRB macro | Sets the ACODE and/or WCODE option to define the user encoding if the Adabas database in use is enabled for UES (universal encoding support). |
| PRINT or CP keyword subparameter of NTPRINT macro | Defines the code page for a report. |
| SRETAIN | Specifies that all existing sources have to be saved in their original encoding format. See also Customizing Your Environment. |
See also:
Natural in Batch Mode in the Operations documentation.
For valid code pages, see http://www.iana.org/assignments/character-sets.
The parameter CFICU and its subparameters are explained in detail in the Parameter Reference. Some of the subparameters have an impact on the performance.
If collation services are used to compare Unicode strings, both strings are checked to determine whether they are normalized. The check itself consumes a lot of CPU time. If you are sure that the strings are already normalized, you can switch off the check (COLNORM=OFF).
In Unicode, it is possible to represent the same character as one code point or as a combination of two or more code points. For example, the German character "ä" can be represented by "U+00E4" or by the combination of the code points "U+0061" and "U+0308". The conversion from Unicode to, for example, IBM01140 treats each code point of a combined character separately and produces an "a" followed by a substitution character since code point "U+0308" is not represented in the target code page. With CNVNORM=ON, a normalization is performed right before the actual conversion. The normalization consumes additional CPU time and temporary storage. If you are sure that no combining characters are involved in MOVE statements (except MOVE NORMALIZED), you should set CNVNORM to OFF to increase performance. Note that not all possible combinations are represented by a single coded Unicode code point.
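The effect of normalizing before conversion can be reproduced outside Natural with a short Python sketch. It uses Python's `cp1140` codec as a stand-in for IBM01140 and `unicodedata.normalize` in place of the ICU normalizer. Python's "replace" error handler emits "?" rather than the ICU substitution character, so only the shape of the result is comparable.

```python
import unicodedata

decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E4

# Without normalization: "a" converts, U+0308 has no mapping and is replaced.
without_norm = decomposed.encode("cp1140", errors="replace")
# With normalization: the precomposed character converts to one byte.
with_norm = composed.encode("cp1140", errors="replace")

print(without_norm.decode("cp1140"), with_norm.decode("cp1140"))

# Some combinations have no precomposed form and stay two code points,
# for example "q" followed by COMBINING DIAERESIS.
print(len(unicodedata.normalize("NFC", "q\u0308")))
```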
               
Conversion from Unicode to code page and vice versa is comparatively slow. The reason is that the ICU implementation is written in C++ and that it covers nearly all Unicode, code page and language aspects in the world. However, some code pages can be mapped to Unicode (and vice versa) via translation tables to accelerate conversion. Accelerator tables are activated with the CPOPT subparameter. If it is set to ON, Natural automatically creates two accelerator tables during session initialization by using ICU conversion functions. The first table (with a size of 512 bytes) is used for conversion from code page to Unicode and the other table (with a size of 65535 bytes) is used for conversion from Unicode to code page. During a Natural session, all conversions are then executed via the accelerator tables instead of ICU calls. Accelerator tables are only provided for the default code page (*CODEPAGE). Temporary code pages (for example, in MOVE ENCODED statements) do not use accelerator tables if the module NATCPTAB is not linked. If it is linked, up to 30 accelerator tables based on the ICU database are used to speed up performance.
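The idea behind such accelerator tables can be sketched in Python: both directions are precomputed once with the regular converter and every later conversion is a plain table lookup. This is an illustration of the technique only, not Natural's implementation. Python's `cp1140` codec stands in for IBM01140, and 0x3F is assumed as the substitution byte.

```python
CODE_PAGE = "cp1140"  # Python codec used as a stand-in for IBM01140
SUBSTITUTE = 0x3F     # assumed EBCDIC substitution byte

# Table 1: code page byte -> Unicode character (256 entries).
to_unicode = bytes(range(256)).decode(CODE_PAGE)

# Table 2: UTF-16 code unit -> code page byte (65536 entries),
# pre-filled with the substitution byte for unmapped code points.
from_unicode = bytearray([SUBSTITUTE]) * 0x10000
for byte_value, char in enumerate(to_unicode):
    from_unicode[ord(char)] = byte_value


def encode_fast(text):
    """Convert Unicode text to the code page with one lookup per character."""
    return bytes(
        from_unicode[ord(ch)] if ord(ch) < 0x10000 else SUBSTITUTE
        for ch in text
    )


def decode_fast(data):
    """Convert code page bytes to Unicode with one lookup per byte."""
    return "".join(to_unicode[b] for b in data)


print(encode_fast("Natural").hex(), decode_fast(encode_fast("Natural")))
```

Building the tables costs one pass over the code page at start-up; afterwards no converter call is needed for the default code page.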
               
The parameters CFICU and CP can be used to adjust Natural to specific purposes:
| Settings | Description |
|---|---|
| CFICU=OFF, CP=OFF | Compatibility mode. For running existing applications without Unicode and without code page support. Legacy translation tables are used for I/O translation. Compared with former versions, there is no significant increase in resource consumption (CPU time and buffer usage). This mode does not need the ICS module SAGICU (or an alternative ICS module on z/VSE and z/OS) to be linked to the Natural nucleus. |
| CFICU=ON, CP=OFF | For new applications that are using Unicode and code page conversion (MOVE ENCODED) but not default code page support. Therefore, the system variable *CODEPAGE is empty. It is possible to use U format variables, but it is not possible to use, for example, MOVE A TO U, since this requires the default code page information. The error NAT3411 will be issued indicating that no default code page is available. |
| CFICU=ON, CP=value * | For new applications that are using full Unicode as well as code page support. |
| CFICU=OFF, CP=value * | This combination does not make sense, because code page support needs ICU services for conversion. Therefore, CFICU=ON is enforced in this case and a session initialization message is issued. |

\* where value is any value other than OFF.
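The four combinations can be summarized as a small decision function. The following Python sketch is only a restatement of the table; the function name and the returned descriptions are hypothetical.

```python
def effective_settings(cficu, cp):
    """Return (CFICU, CP, mode) as they take effect for the requested values."""
    cficu_on = cficu == "ON"
    cp_set = cp != "OFF"
    if cp_set and not cficu_on:
        # Code page support needs ICU services, so CFICU=ON is enforced.
        return "ON", cp, "Unicode and code page support (CFICU=ON enforced)"
    if cficu_on and cp_set:
        return "ON", cp, "Unicode and code page support"
    if cficu_on:
        return "ON", "OFF", "Unicode without default code page"
    return "OFF", "OFF", "compatibility mode"


print(effective_settings("OFF", "IBM01140"))
```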
               
The compiler option CPAGE creates objects that can be executed with a code page which is different from the code page used at creation time. This means that all alphanumeric constants of the object which are coded with the code page at creation time have to be converted to the code page which is active at execution time. To make it possible for the Natural object loader to find and convert alphanumeric constants, an additional table is created by the compiler. This increases the size of the generated object, depending on the number of alphanumeric constants used. The conversion at runtime consumes additional CPU time. If the default code page (value of the system variable *CODEPAGE) is the same as the code page at creation time or if the session has no default code page (CP=OFF), no conversion is done. Conversion errors are ignored, regardless of the setting of the parameter CPCVERR. If the compiler option CPAGE is set to OFF, no conversion is performed at runtime and the alphanumeric constants are used unchanged.
The following sample program is cataloged with code page IBM01141 (German) and is executed with default code page IBM01140 (US). The characters "Ä", "Ö" and "Ü" are defined in both code pages, but at different code points.
Example 1 - CPAGE=OFF:
OPTIONS CPAGE=OFF
WRITE *CODEPAGE 'ÄÖÜ'
END
Output with code page IBM01140 (US):
Page      1
IBM01140                                                         ¢\!
Example 2 - CPAGE=ON:
OPTIONS CPAGE=ON
WRITE *CODEPAGE 'ÄÖÜ'
END
Output with code page IBM01140 (US):
Page      1
IBM01140                                                         ÄÖÜ
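The two outputs can be reproduced with Python's codecs as a sanity check. This sketch is not Natural code. It assumes that Python's `cp273` codec is an adequate stand-in for IBM01141 (the two differ only at the euro sign position) and uses `cp1140` for IBM01140.

```python
CREATION_CP = "cp273"   # stand-in for IBM01141 (German)
RUNTIME_CP = "cp1140"   # IBM01140 (US)

constant = "ÄÖÜ"
stored = constant.encode(CREATION_CP)  # bytes as stored at creation time

# CPAGE=OFF: the stored bytes are interpreted with the runtime code page as is.
print(stored.decode(RUNTIME_CP))       # prints ¢\!

# CPAGE=ON: the constant is converted from the creation code page
# to the runtime code page before it is used.
converted = stored.decode(CREATION_CP).encode(RUNTIME_CP)
print(converted.decode(RUNTIME_CP))    # prints ÄÖÜ
```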
The most common standard for code page names is the IANA name. Therefore, the system variable *CODEPAGE contains the IANA name of the default code page. On z/VSE and z/OS, a code page is qualified by its Coded Character Set ID (CCSID). On BS2000, the Coded Character Set Name (CCSN) is most popular. Currently, Adabas uses the Entire Conversion Service definition (ADAECS). The macro NTCPAGE can be used to assign these different names to the unambiguous IANA name. NTCPAGE is part of the Natural configuration module (NATCONFG).
It does not matter whether the IANA name, the CCSID/CCSN or the alias name is entered with the CP parameter. The alias name can be a user-defined name which is used to assign a more meaningful name to the code page. In any case, *CODEPAGE contains the IANA name of the selected code page.
In addition, a placeholder character can be defined for a code page. It overrides the default substitution character of that code page, which is normally a non-displayable character (for example, H'3F' in an EBCDIC code page). The placeholder character can be used to prevent non-displayable characters from being sent to terminals.
Example:
NTCPAGE IANA=IBM01140,CCSID=1140,ECS=1140,ALIAS='US',PHC=003F
The values IBM01140, 1140 or US can be entered with the CP parameter to activate the code page. *CODEPAGE contains the name IBM01140. The substitution character of the code page will be replaced by "U+003F", which is a question mark (?).
               
The number of available code pages depends on the ICU data library in use.
All code pages defined in the currently used data package can be used by Natural. An NTCPAGE entry is only necessary if an alternative alias name or placeholder character is desired.
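The name resolution described for NTCPAGE can be pictured as a lookup from any of the accepted names to the IANA name. The following Python sketch is a hypothetical model of that behavior, built from the NTCPAGE example above; the class and function names are not part of Natural, and exact-match comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePageEntry:
    iana: str
    ccsid: str
    alias: str
    placeholder: str  # replaces the substitution character of the code page


# Mirrors: NTCPAGE IANA=IBM01140,CCSID=1140,ECS=1140,ALIAS='US',PHC=003F
ENTRIES = [CodePageEntry("IBM01140", "1140", "US", "\u003f")]


def resolve(cp_value):
    """Map a CP value (IANA name, CCSID or alias) to the IANA name."""
    for entry in ENTRIES:
        if cp_value in (entry.iana, entry.ccsid, entry.alias):
            return entry.iana
    raise LookupError(f"no code page entry for {cp_value!r}")


print([resolve(value) for value in ("IBM01140", "1140", "US")])
```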
               
The following configuration parameter is available with Natural Development Server (NDV):
| Settings | Description |
|---|---|
| TERMINAL_EMULATION=WEBIO | Specifies that the Natural Web I/O Interface client (which supports Unicode) is used for input and output. |
The code page information of the object is part of the object directory displayed with the LIST system command. For details, see Displaying Directory Information in the System Commands documentation.
The encoding of code page data can be specified on different levels.
The default code page can be defined with the CP parameter.
A code page can be defined for Natural sources, batch input (CPOBJIN, CPSYNIN) and output files (CPPRINT).
If a code page is defined at object level, this overwrites the default code page.