Some languages, for example Arabic and Hebrew, are written from right to left (RTL), whereas most languages, for example English and German, are written from left to right (LTR). Text that contains both left-to-right and right-to-left characters is called bidirectional text.
Support for bidirectional languages is not activated automatically; the user always has to specify all required parameters (for example, PM=I) as described below.
The output of Natural programs can be controlled using the profile parameter PM, the terminal command %V, and the session parameter PM.
The profile parameter DO (Display Order) is additionally used to support applications that were originally written for terminals which support inverse (right-to-left) print mode but not bidirectional data. These applications create the display order of bidirectional data in the application code. With the DO parameter, such applications can also run compatibly on I/O devices that support bidirectional data. This is the case, for instance, if an application runs in a browser with the Natural Web I/O Interface.
The profile parameter PM defines the default screen direction. When PM is set to R (reset), the default screen direction is left-to-right. When PM is set to I (inverse), the default screen direction is right-to-left. All non-alphanumeric fields, system variables and PF key lines are automatically inverted by Natural so that they are displayed correctly from right to left if the screen direction is right-to-left.
The terminal command %V can be used to change the screen direction. If the screen direction is right-to-left, the layout of the current window is mirrored, which means that the origin of all window components or fields is the upper right corner. The screen direction is changed to right-to-left using %VON and is reverted to left-to-right using %VOFF.
The session parameter PM reverses the direction of a field. The effect of "reversing the direction of a field" depends on the statement in which the PM parameter is used and on the platform. If the PM parameter is used in a MOVE statement, the content of the field is simply reversed (that is, the first character becomes the last character, and so on); the result does not depend on the characters of the field. Trailing blanks are removed before the field is reversed.
For example, the following program
DEFINE DATA LOCAL
1 TEST1 (A10)
1 TEST2 (A10)
END-DEFINE
TEST1 := 'program'
MOVE TEST1 (PM=I) TO TEST2
INPUT TEST1 (AD=O) TEST2 (AD=O)
END
produces the following output:
TEST1 program TEST2 margorp
where "margorp" is the reversed version of "program".
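The MOVE behavior described above can be modeled outside Natural. The following Python sketch is an illustration only, not Natural code: the function name move_inverse is hypothetical, and the blank padding and truncation to the target length are assumptions about how a fixed-length alphanumeric field such as (A10) holds the result.

```python
def move_inverse(source: str, target_length: int) -> str:
    """Model of MOVE source (PM=I) TO target for alphanumeric fields."""
    # Trailing blanks are removed before the field is reversed.
    trimmed = source.rstrip(" ")
    # The content is simply reversed, regardless of which characters it holds.
    reversed_value = trimmed[::-1]
    # Assumption: the target is a fixed-length field, padded with blanks.
    return reversed_value[:target_length].ljust(target_length)


test1 = "program".ljust(10)      # TEST1 (A10) holding 'program'
test2 = move_inverse(test1, 10)  # TEST2 (A10)
print(repr(test2))               # 'margorp   '
```

Because the blanks are stripped first, the reversed text starts at the first position of the target field instead of being preceded by three blanks.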
When the PM parameter is used in I/O statements such as INPUT or DISPLAY, its effect is more complex. In this case, the field direction is based on the screen direction:
If the screen direction is left-to-right and PM=I is applied to a field, the field direction changes to right-to-left.
If the screen direction is right-to-left and PM=I is applied to a field, the field direction changes to left-to-right.
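The two rules above amount to toggling the screen direction for the field. As a minimal Python sketch (the function name and the "LTR"/"RTL" labels are illustrative assumptions, not Natural identifiers):

```python
def field_direction(screen_direction: str, pm_inverse: bool) -> str:
    """Return the direction of a field in an I/O statement.

    screen_direction: "LTR" or "RTL" (as set by profile parameter PM or %V).
    pm_inverse: True if PM=I is applied to the field.
    """
    if screen_direction not in ("LTR", "RTL"):
        raise ValueError("screen_direction must be 'LTR' or 'RTL'")
    if not pm_inverse:
        # Without PM=I, the field follows the screen direction.
        return screen_direction
    # PM=I reverses the field relative to the screen direction.
    return "RTL" if screen_direction == "LTR" else "LTR"


for screen in ("LTR", "RTL"):
    print(screen, "->", field_direction(screen, pm_inverse=True))
```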
On browser terminals (Natural Web I/O Interface), "reversing the field direction" does not mean that the characters of the field are simply reversed. Instead, the more complex bidirectional algorithm is applied. On character-oriented terminals, however, the characters of a field are not reordered by that algorithm; they are simply reversed.
In the following example, the characters assigned to the variable TEST have been entered in the following sequence:
If the characters are entered in the sequence as described above, the program is displayed in the following way, because the characters are simply displayed in the keying sequence.
DEFINE DATA LOCAL
1 TEST (A20)
END-DEFINE
TEST := 'abc 123'
SET CONTROL 'voff'
INPUT TEST (AD=O) / TEST (AD=O PM=I)
SET CONTROL 'von'
INPUT TEST (AD=O) / TEST (AD=O PM=I)
END
This program produces two identical screens because the statements SET CONTROL 'voff' and SET CONTROL 'von' do not apply to alphanumeric fields. Both screens look as follows:
TEST abc 123
TEST 321 cba
In Arabic text, all characters of a string are normally connected with each other. For this reason, Arabic characters have up to four presentation forms: the isolated, the final, the initial and the medial form. The form that is used depends on the position of the character in the string. For example, the Arabic character "MEEM" has the following forms in Unicode:
U+0645 | ARABIC LETTER MEEM
U+FEE1 | ARABIC LETTER MEEM ISOLATED FORM
U+FEE2 | ARABIC LETTER MEEM FINAL FORM
U+FEE3 | ARABIC LETTER MEEM INITIAL FORM
U+FEE4 | ARABIC LETTER MEEM MEDIAL FORM
Moreover, some characters are combined into a new form if they appear consecutively in a string. This is called a "ligature". For example, the characters
U+0644 | ARABIC LETTER LAM
U+0627 | ARABIC LETTER ALEF
have the following combined form:
U+FEFB | ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
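The relationship between the base characters and the presentation forms listed above is recorded in the Unicode Character Database. The following Python sketch uses the standard unicodedata module to print the name and compatibility decomposition of each code point; the helper name describe is an assumption made for this illustration.

```python
import unicodedata


def describe(code_point: int) -> tuple:
    """Return (label, name, decomposition) for one code point."""
    ch = chr(code_point)
    # The decomposition is empty for base characters such as U+0645.
    return (f"U+{code_point:04X}",
            unicodedata.name(ch),
            unicodedata.decomposition(ch))


for cp in (0x0645, 0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4, 0xFEFB):
    label, name, decomposition = describe(cp)
    print(label, "|", name, "|", decomposition or "(base character)")
```

Each presentation form decomposes to its base character with a tag naming the form (for example "<final> 0645"), and the ligature decomposes to the two characters it combines.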
Unicode strings should include only the Arabic characters in the Arabic block (U+0600 through U+06FF) or the Arabic Supplement block (U+0750 through U+077F); it is not recommended to use the presentation forms in regular Arabic text. It is up to the user interface to display the correct shapes of the characters.
"Shaped" means that every Arabic base character is converted to the appropriate Arabic presentation form. The string may contain each of the four presentation forms of a character. For example, if U+0645 (ARABIC LETTER MEEM) is used as the last character of a string, it is converted to U+FEE2 (ARABIC LETTER MEEM FINAL FORM).
"Unshaped" means that each character is represented only by its basic form. For example, instead of U+FEE2 (ARABIC LETTER MEEM FINAL FORM), U+0645 (ARABIC LETTER MEEM) is used. The conversion to the correct presentation form is performed by the rendering engine of the output device.
Natural strings are internally represented as unshaped alpha or Unicode strings. If strings are displayed with a browser using the Natural Web I/O Interface client or the PROCESS PAGE statement, no transformation is required since the rendering engine of the browser takes care of the correct presentation. Incoming strings from such devices are already unshaped and can be directly passed to Natural. If a string is displayed on a terminal such as a 3279 or a terminal emulator such as IBM Personal Communications, it must be converted into the shaped form since the terminal itself does not take care of the correct presentation. Accordingly, incoming strings are in the shaped form and must be transformed into the unshaped form to be processed correctly by Natural.
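To illustrate what transforming a shaped string into the unshaped form means at the Unicode level, the following Python sketch replaces every presentation form by its base character(s). It is a stand-in that relies on Unicode compatibility normalization (NFKC), not a description of the conversion Natural performs; the function names are assumptions made for this illustration.

```python
import unicodedata


def is_presentation_form(ch: str) -> bool:
    """True for Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF)."""
    return "\uFB50" <= ch <= "\uFDFF" or "\uFE70" <= ch <= "\uFEFF"


def unshape(text: str) -> str:
    """Replace each Arabic presentation form by its base character(s)."""
    # Only presentation forms are normalized; all other characters are kept as is.
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in text
    )


shaped = "\uFEE3\uFEE4\uFEE2"   # MEEM initial, medial and final form
unshaped = unshape(shaped)      # three times U+0645
print([f"U+{ord(c):04X}" for c in unshaped])
print([f"U+{ord(c):04X}" for c in unshape("\uFEFB")])  # ligature LAM WITH ALEF
```

The opposite direction (shaping) is not shown because it requires joining rules for every character, which is the task of the rendering engine or the terminal conversion.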
The most popular code page for Arabic terminals on the mainframe is IBM420. Compared to Unicode, the number of characters is reduced, and not every form of a character is included. The conversion of strings into IBM420 substitutes unavailable forms of a character with a similar presentation form. For example, the medial form of the Arabic letter MEEM (U+FEE4) is substituted by the initial form (U+FEE3) of the character.
In the Arabic EBCDIC code page IBM420, the Arabic character "MEEM" is represented by the following presentation forms:
H'BA' | ARABIC LETTER MEEM
H'BB' | ARABIC LETTER MEEM INITIAL FORM
The Arabic characters SEEN (U+0633), SHEEN (U+0634), SAD (U+0635) and DAD (U+0636) (Seen Family) are displayed on terminals as two bytes if they appear in the final form. Code page IBM420 contains a so-called "Arabic tail fragment" that completes the final form of a Seen Family character on terminals or terminal emulators. Of course, the Arabic tail fragment needs an additional position on the screen. The Arabic tail fragment is not required by the browsers. If a string with the final form of a Seen Family character is entered in a browser (Natural Web I/O Interface client or PROCESS PAGE statement) and subsequently displayed on a terminal, the Arabic tail fragment is appended to the string with the consequence that the length of the string increases. If a string with the final form of a Seen Family character is entered via a terminal or terminal emulator and subsequently displayed in a browser, the Arabic tail fragment is removed from the string.
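The effect of the tail fragment on the string length can be sketched in Python. This is a simplified model under stated assumptions, not the conversion Natural performs: the Unicode character U+FE73 (ARABIC TAIL FRAGMENT) stands in for the IBM420 tail fragment, a Seen Family character is treated as being in its final form whenever it is not followed by another Arabic letter, and all function names are hypothetical.

```python
SEEN_FAMILY = {"\u0633", "\u0634", "\u0635", "\u0636"}  # SEEN, SHEEN, SAD, DAD
TAIL_FRAGMENT = "\uFE73"  # Unicode ARABIC TAIL FRAGMENT (stand-in for IBM420)


def is_arabic_letter(ch: str) -> bool:
    """Rough check for a letter in the basic Arabic range (assumption)."""
    return len(ch) == 1 and "\u0621" <= ch <= "\u064A"


def add_tail_fragments(text: str) -> str:
    """Browser to terminal: append a tail fragment after a word-final Seen Family letter."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        following = text[i + 1] if i + 1 < len(text) else ""
        # Simplification: "final form" is approximated as "end of word".
        if ch in SEEN_FAMILY and not is_arabic_letter(following):
            out.append(TAIL_FRAGMENT)
    return "".join(out)


def remove_tail_fragments(text: str) -> str:
    """Terminal to browser: drop all tail fragments."""
    return text.replace(TAIL_FRAGMENT, "")


word = "\u0634\u0645\u0633"              # SHEEN, MEEM, SEEN
on_terminal = add_tail_fragments(word)   # SEEN at the end receives a tail
print(len(word), len(on_terminal))       # 3 4
```

The length increase shown here is the reason a field that exactly fits a string in the browser may need one more position when the same string is displayed on a terminal.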
Note:
For more information about control of character shaping, see SHAPED - Control of Character Shaping in the Parameter Reference documentation.