In most East Asian languages, language-specific characters in code page strings (that is, Natural format A) are represented by 2 bytes (the so-called double-byte characters), whereas ASCII characters are represented by 1 byte. Thus, a code page string consists of characters of different lengths: some occupy 1 byte and others occupy 2 bytes.
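The mixed character lengths can be made visible outside Natural. The following is a minimal Python sketch, not Natural code; it assumes Python's built-in `shift_jis` codec, and the helper name `byte_lengths` is hypothetical.

```python
# Illustration only: shows how many bytes each character of a mixed
# string occupies in the Shift_JIS code page.
text = "ｂaaｂ"  # fullwidth b, two ASCII a's, fullwidth b


def byte_lengths(s, codec="shift_jis"):
    """Return (character, encoded byte count) pairs for the string s."""
    return [(ch, len(ch.encode(codec))) for ch in s]


lengths = byte_lengths(text)
total_bytes = len(text.encode("shift_jis"))

print(lengths)      # per-character byte counts
print(total_bytes)  # total length of the code page string in bytes
```

The string has four characters but occupies six bytes, which is why byte-oriented operations can cut a character in half.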
Natural provides basic support for double-byte characters. On Windows, this support is activated when both the Natural default code page and the Windows system code page are defined as double-byte code pages. If no specific code page is defined in Natural, it is sufficient that a double-byte Windows system code page has been defined. On Linux, the support for double-byte characters is activated when the Natural default code page is a double-byte code page.
When double-byte character support is enabled, Natural ensures that every string manipulation treats a double-byte character as a unit. This is essential for preserving the meaning of a string.
If a single leading or trailing byte of a double-byte character is left over after the manipulation of a variable of format A (for example, after extracting a substring with the SUBSTRING option), this byte is replaced with a blank character.
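The blank-replacement rule can be sketched as follows. This is a Python illustration of the described behavior, not Natural's implementation. The function names are hypothetical, the byte offsets are 0-based, and the lead-byte ranges are an assumption based on the ranges commonly used for Shift_JIS.

```python
def is_lead_byte(b):
    """Assumed Shift_JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def byte_kinds(data):
    """Classify each byte as 'S' (single), 'L' (lead) or 'T' (trail)."""
    kinds = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            kinds += ["L", "T"]
            i += 2
        else:
            kinds.append("S")
            i += 1
    return kinds


def substring_bytes(data, start, length):
    """Byte-wise substring; orphaned halves of characters become blanks."""
    kinds = byte_kinds(data)
    out = bytearray(data[start:start + length])
    if out and kinds[start] == "T":
        out[0] = 0x20   # leftover trailing byte at the front
    if out and kinds[start + len(out) - 1] == "L":
        out[-1] = 0x20  # leftover leading byte at the end
    return bytes(out)


value = "ｂaaｂ".encode("shift_jis")   # H'828261618282'
cut = substring_bytes(value, 1, 4)    # cuts through both double-byte characters
whole = substring_bytes(value, 0, 3)  # ends on a character boundary
print(cut.hex(), whole.hex())
```

Scanning from the start of the string is needed because a byte such as H'82' can be either a leading or a trailing byte; its role depends on what precedes it.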
For the example below, the code page Shift_JIS is selected. Variable #A contains a string of four characters. The first and last characters are the double-byte character "FULLWIDTH LATIN SMALL LETTER B", which is represented in code page Shift_JIS by the byte sequence H'8282'. The second and third characters are the single-byte character "LATIN SMALL LETTER A", which is represented by the single byte H'61'. Thus, the hexadecimal representation of the full string is H'828261618282'.
DEFINE DATA LOCAL
1 #A (A10)
END-DEFINE
#A := 'ｂaaｂ'
WRITE #A #A (EM=H(6))
EXAMINE #A FOR PATTERN 'Ｂ' REPLACE 'a'
WRITE #A #A (EM=H(6))
END
Without double-byte character support, the output of the above program is as follows:
Page 1                                   07-02-07 17:22:09

ｂaaｂ 828261618282
Ｂaｂ 826161828220
This is the result of not treating the character "ｂ" (H'8282' in code page Shift_JIS) as one unit. The trailing byte of this character and the following character "a" (H'61') are wrongly interpreted together as the double-byte character "Ｂ" (H'8261' in code page Shift_JIS).
With double-byte character support, the output of the program is as expected:
Page 1                                   07-02-07 17:22:09

ｂaaｂ 828261618282
ｂaaｂ 828261618282
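The two hexadecimal results can be reproduced outside Natural. The following Python sketch is an illustration only and does not model the EXAMINE statement itself: a byte-oriented replace stands in for processing without double-byte support, and a replace on the decoded string stands in for processing with it. The pattern is given by its byte sequence H'8261', and the padding to 10 bytes mirrors the A10 definition of #A.

```python
CODEC = "shift_jis"

value = "ｂaaｂ".encode(CODEC)      # H'828261618282'
pattern = bytes.fromhex("8261")    # byte sequence of the pattern character
replacement = b"a"                 # H'61'

# Byte-oriented replace: matches 82 61 even though the 82 is the trailing
# byte of the first character and the 61 is a separate character.
naive = value.replace(pattern, replacement).ljust(10, b" ")

# Character-oriented replace: decode first, so the pattern can only match
# a whole character; the string does not contain it, so nothing changes.
aware_text = value.decode(CODEC).replace(pattern.decode(CODEC), "a")
aware = aware_text.encode(CODEC).ljust(10, b" ")

# Show the first 6 bytes in hex, comparable to the edit mask EM=H(6).
print(naive[:6].hex())
print(aware[:6].hex())
```

The byte-oriented result is one byte shorter before padding, which is where the trailing H'20' in the first output comes from.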
Note:
On Windows, the Natural output window is Unicode-enabled, which means that all fields now have Unicode format. As a result, the behavior of the output window has changed slightly for A format fields that contain double-byte characters. In an A format input field, it is now possible to enter as many characters as the Unicode string length of the field allows. When the user leaves the field and the default code page is a double-byte code page, all characters that do not fit into the target A format field are removed. For example, an A10 field can hold 5 double-byte characters. In the output window, this field is represented by a Unicode field of length 10 with display length 5, so the user can enter 10 double-byte characters in the input field. When the user moves the cursor to another field on the page or leaves the page by pressing ENTER, the content of the field is converted to code page format, so that only the first 5 double-byte characters remain.
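The A10 arithmetic in this note can be checked with a short Python sketch. It is an illustration under stated assumptions, not a description of how the output window is implemented: the helper `fit_to_field` is hypothetical, and it simply keeps the leading whole characters whose Shift_JIS encoding fits into the field and pads the rest with blanks.

```python
def fit_to_field(text, size, codec="shift_jis"):
    """Keep leading whole characters that fit in `size` bytes; pad with blanks."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode(codec)
        if len(out) + len(encoded) > size:
            break  # the next character would not fit completely
        out += encoded
    return bytes(out).ljust(size, b" ")


typed = "ｂ" * 10                 # 10 double-byte characters entered by the user
stored = fit_to_field(typed, 10)  # content after conversion to an A10 field

print(len(stored))
print(stored.decode("shift_jis"))
```

Ten double-byte characters would need 20 bytes, so only the first five survive the conversion to the 10-byte field.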