Writing Partial Character Encodings

Partial characters present a similar problem when writing to a fixed length file. The parser considers every fixed position field to be self-contained: all encoding information for a fixed position field must lie within the byte range specified for that field. Keep this in mind when writing multi-byte encodings to fixed length fields, because it is possible to specify a field or record that does not end on a character boundary.

Consider a fixed length field that is 10 bytes wide, where the string for that field encodes to more than 10 bytes. The parser must truncate the byte array to fit into 10 bytes. Cutting at an arbitrary byte could split a multi-byte character and create invalid characters, so the parser always truncates a string on a character boundary; only complete characters are written to the output file.
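The truncation rule described above can be sketched in Python. This is only an illustration, not the parser's implementation: the function name `fit_to_field`, its parameters, and the space pad byte are all assumptions, and the sketch assumes a stateless encoding without a byte-order mark (UTF-8 by default).

```python
def fit_to_field(text: str, width: int, encoding: str = "utf-8",
                 pad: bytes = b" ") -> bytes:
    """Encode text into exactly `width` bytes, keeping only whole characters.

    Hypothetical helper for illustration; not part of any parser API.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        # Stop at the last character boundary that still fits in the field.
        if len(out) + len(encoded) > width:
            break
        out += encoded
    # Fill any unused byte positions with the pad byte.
    return bytes(out).ljust(width, pad)
```

For example, each character of "日本語" encodes to three bytes in UTF-8, so a 4-byte field holds only the first character followed by one pad byte.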

Character Boundary Condition	Result
Writing a string to a fixed position field where the string is longer than the fixed position field (where it breaks at a character boundary).	Truncates the string to fit the field.
Writing to a fixed position field that ends in the middle of a multi-byte character.	The field ends on the previous complete character. The partial character is not included in the field; the bytes it would have occupied are filled with one or more pad characters. For an example, see below.
Writing to a fixed position field in the middle of a delimited field that contains a stateful encoding.	Does not generate an error during creation of the file. Parsing the created file will likely result in an encoding error.
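The stateful case in the last row can be illustrated with a short Python sketch. ISO-2022-JP is used here only as a familiar example of a stateful encoding (it switches character sets with escape sequences); the helper name `decodes_cleanly` and the 4-byte cut are assumptions chosen for the demonstration, and nothing here reflects the parser's own behavior.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes without error in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A complete ISO-2022-JP string starts with an escape sequence that selects
# the two-byte character set and ends with one that switches back to ASCII.
encoded = "日本".encode("iso2022_jp")

# Cutting at a fixed byte offset writes without complaint, but the slice now
# ends partway through a two-byte character and lacks the closing escape.
truncated = encoded[:4]

print(decodes_cleanly(encoded, "iso2022_jp"))
print(decodes_cleanly(truncated, "iso2022_jp"))
```

Creating `truncated` raises no error; the problem only surfaces when the bytes are decoded again.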

To illustrate a case where the parser writes to a fixed position field that ends in the middle of a multi-byte character, consider the following multi-byte encoding:

Field	Number of Bytes in Field	Character 1	Character 2
Field1	4	12	12
Field2	4	12	345

Field	Value
Field1	1212
Field2	12PP

In Field2, character 2 occupies bytes 3, 4, and 5, but the field ends after byte 4, so the character does not end on the field boundary and the parser considers it a partial character. It truncates bytes 3, 4, and 5, because the complete character would extend beyond the end of the field, and fills the two remaining byte positions in the field with pad characters (represented by PP).
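The same outcome can be reproduced with a real encoding. In the sketch below, UTF-8 stands in for the multi-byte encoding: "é" is a 2-byte character and "€" is a 3-byte character, matching the byte counts in the tables. The function name `write_field`, the pad byte `P`, and the slice-then-drop approach are assumptions made for illustration, not the parser's method.

```python
FIELD_WIDTH = 4   # bytes per field, as in the tables above
PAD = b"P"        # stand-in pad byte, mirroring "PP" in the example


def write_field(text: str, width: int = FIELD_WIDTH, pad: bytes = PAD) -> bytes:
    """Hypothetical helper: fit UTF-8 text into a fixed-width field."""
    # Cut the encoded bytes at the field width; this may split a character.
    raw = text.encode("utf-8")[:width]
    # Dropping undecodable bytes removes the partial trailing character.
    whole = raw.decode("utf-8", errors="ignore").encode("utf-8")
    # Fill the freed byte positions with the pad byte.
    return whole.ljust(width, pad)


field1 = write_field("éé")   # 2 + 2 bytes: fits exactly
field2 = write_field("é€")   # 2 + 3 bytes: second character is partial
print(field1, field2)
```

Here `field1` keeps both characters, while `field2` keeps only the first character and ends with two pad bytes, just as Field2 becomes 12PP.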