Writing Partial Character Encodings
Partial characters present a similar problem when writing to a fixed length file. The parser considers all fixed position fields to be “self contained”. This means that all encoding information for a fixed position field is contained in the byte range specified for the field. Keep this in mind when writing multi-byte encodings to fixed length fields because it is possible to specify a field or record that does not end on a character boundary.
Consider a fixed length field that is 10 bytes long, where the string for that field encodes to more than 10 bytes. In this case, the parser must truncate the encoded bytes to fit into 10 bytes. Cutting the byte array at an arbitrary byte offset could create invalid characters. To avoid this, the parser always truncates a string on a character boundary; only complete characters are written to the output file.
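The difference between cutting at a byte offset and cutting at a character boundary can be sketched in a few lines of Python. This is only an illustration of the idea, not the parser's implementation: the function name `truncate_on_char_boundary` is hypothetical, and the sketch assumes a stateless encoding without a byte order mark, such as UTF-8.

```python
def truncate_on_char_boundary(text: str, max_bytes: int, encoding: str = "utf-8") -> bytes:
    """Encode text, keeping only complete characters that fit in max_bytes.

    Hypothetical helper for illustration; assumes a stateless encoding
    in which each character can be encoded independently (e.g. UTF-8).
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        # Stop before the first character that would overflow the field.
        if len(out) + len(encoded) > max_bytes:
            break
        out += encoded
    return bytes(out)


text = "日本語テキスト"                 # each character is 3 bytes in UTF-8
naive = text.encode("utf-8")[:10]      # cut at a byte offset: splits the 4th character
safe = truncate_on_char_boundary(text, 10)

print(len(naive))  # 10
print(len(safe))   # 9

try:
    naive.decode("utf-8")
except UnicodeDecodeError:
    print("naive truncation produced an invalid character")
```

The boundary-aware version gives up one byte of the field so that every character it writes is complete.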
The following table describes how the parser writes partial characters:

| Character Boundary Condition | Action |
|---|---|
| Writing a string to a fixed position field where the string is longer than the fixed position field (where it breaks at a character boundary). | Truncates the string to fit the field. |
| Writing to a fixed position field that ends in the middle of a multi-byte character. | The field ends on the previous complete character. The partial character is not included in the field, and is replaced by one or more pad characters. For an example, see below. |
| Writing to a fixed position field in the middle of a delimited field that contains a stateful encoding. | Does not generate an error during creation of the file. Parsing the created file will likely result in an encoding error. |

To illustrate a case where the parser writes to a fixed position field that ends in the middle of a multi-byte character, consider the following multi-byte encoding:

| Field | Number of Bytes in Field | Character 1 | Character 2 |
|---|---|---|---|
| Field1 | 4 | 12 | 12 |
| Field2 | 4 | 12 | 345 |

The parser encodes this multi-byte encoding as follows:

| Field | Value |
|---|---|
| Field1 | 1212 |
| Field2 | 12PP |

The parser encodes Field1 properly; it considers character 1 and character 2 to be complete characters.
The parser encodes Field2 as follows:
* It considers character 1 to be a complete character.
* Because the field ends after byte 4, which is not a character boundary, the parser considers character 2 (bytes 3, 4, and 5) to be a partial character. It drops bytes 3, 4, and 5 because the complete character would extend beyond the end of the field. It fills the two bytes that remain in the field with two pad characters (represented by PP).
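The Field1 and Field2 results above can be reproduced with a small Python sketch. It models each character as the byte sequence shown in the table and is not the parser's actual code: the function name `write_fixed_field` and the single-byte pad value `b"P"` are assumptions made for this illustration.

```python
def write_fixed_field(chars: list[bytes], width: int, pad: bytes = b"P") -> bytes:
    """Write complete characters into a fixed-width field, then pad.

    Hypothetical helper: `chars` holds the encoded bytes of each character,
    and `pad` is assumed to be a single-byte pad character.
    """
    out = bytearray()
    for ch in chars:
        # A character that does not fit completely is dropped, along with
        # everything after it.
        if len(out) + len(ch) > width:
            break
        out += ch
    # Fill the rest of the field with pad characters.
    out += pad * (width - len(out))
    return bytes(out)


field1 = write_fixed_field([b"12", b"12"], 4)
field2 = write_fixed_field([b"12", b"345"], 4)

print(field1)  # b'1212'
print(field2)  # b'12PP'
```

In Field2, the three-byte character would need bytes 3 through 5 of a four-byte field, so none of its bytes are written and the two remaining positions are padded.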