EncodingByBomOrXmlPreParser
Configuration file | EncodingByBomOrXmlPreParser.properties |
Class name | com.ebd.hub.datawizard.parser.stream.EncodingByBomOrXmlPreParser |
Description
The preparser detects and removes a byte order mark (BOM) at the beginning of the input file. If there is no BOM, it can also detect the encoding of an XML file from the XML header. If an encoding is detected and it differs from the encoding specified in the profile (in "Main settings/General"), the file is recoded into the profile encoding. The detected encoding is noted in the job log if phase 1 is enabled in the profile logging.
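The BOM handling described above can be illustrated with a minimal Java sketch. It is not the preparser's actual implementation: the class `BomSketch` and its methods are hypothetical names chosen for illustration, and only the standard BOM byte sequences and JDK charset calls are used. The 4-byte UTF-32 marks are tested first because the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration only; not the code of EncodingByBomOrXmlPreParser.
class BomSketch {
    // Order matters: UTF-32LE (FF FE 00 00) starts with the UTF-16LE mark (FF FE).
    private static final String[] NAMES =
        {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};
    private static final int[][] MARKS = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF},
        {0xFE, 0xFF},
        {0xFF, 0xFE}
    };

    // Returns the index of the BOM found at the start of data, or -1 if there is none.
    static int matchBom(byte[] data) {
        for (int i = 0; i < MARKS.length; i++) {
            int[] mark = MARKS[i];
            if (data.length < mark.length) {
                continue;
            }
            boolean hit = true;
            for (int j = 0; j < mark.length; j++) {
                if ((data[j] & 0xFF) != mark[j]) {
                    hit = false;
                    break;
                }
            }
            if (hit) {
                return i;
            }
        }
        return -1;
    }

    // Returns the encoding name announced by the BOM, or null if no BOM is present.
    static String detectBomEncoding(byte[] data) {
        int i = matchBom(data);
        return i < 0 ? null : NAMES[i];
    }

    // Strips the BOM and, if the detected encoding differs from the target
    // (the "profile encoding"), recodes the remaining bytes into the target.
    // Input without a BOM is returned unchanged.
    static byte[] recode(byte[] data, Charset target) {
        int i = matchBom(data);
        if (i < 0) {
            return data;
        }
        Charset source = Charset.forName(NAMES[i]);
        byte[] body = Arrays.copyOfRange(data, MARKS[i].length, data.length);
        if (source.equals(target)) {
            return body;
        }
        return new String(body, source).getBytes(target);
    }
}
```

For example, calling `BomSketch.recode` on the UTF-16LE bytes `FF FE 68 00 69 00` with `StandardCharsets.UTF_8` as the target yields the two bytes `68 69`. One limitation of this simple ordering is that a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE.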
Supported encodings
BOM: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE.
XML: Virtually all encoding names possible in the XML header if supported in Java.
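How an encoding name might be read from the XML header and checked against the Java runtime can be sketched as follows. This is an assumption-laden illustration, not the preparser's code: `XmlEncodingSketch`, `declaredEncoding` and `isUsable` are hypothetical names, and the sketch only handles headers written in an ASCII-compatible encoding (a UTF-16 file without a BOM would need extra handling).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code of EncodingByBomOrXmlPreParser.
class XmlEncodingSketch {
    // Matches the encoding pseudo-attribute inside an XML declaration that
    // starts at the very first byte of the input.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the encoding name declared in the XML header, or null if none is found.
    static String declaredEncoding(byte[] data) {
        int n = Math.min(data.length, 200);
        // ISO-8859-1 maps every byte to one char, so decoding the head never fails.
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // True if the running JVM provides a charset with the given name.
    static boolean isUsable(String name) {
        try {
            return name != null && Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return false;
        }
    }
}
```

`Charset.isSupported` reflects the charsets available in the running JVM, so the set of accepted names can differ between Java installations.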
Recommendation
The profile encoding should be one that can represent every character that may occur in the input data, e.g. UTF-8. An input file that already arrives in this encoding is then parsed correctly even if it has neither a BOM nor an XML header (SOAP data is usually UTF-8 encoded and has no XML header and no BOM). At the same time, the profile can convert input data that arrives in a different encoding into the profile encoding (here UTF-8), provided that encoding is correctly declared by a BOM or an XML header.
Parameters
Parameter | Description |
check.BOM | (optional) If true, the BOM is observed and the file is recoded accordingly. Default: true. |
check.XML | (optional) If true, the XML encoding is observed and the file is recoded accordingly. Default: true. |
Note: The detection of a BOM takes precedence over the encoding specification in the XML. If both parameters are set to false, the input file is never modified.
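The interplay of the two parameters and the precedence rule from the note can be sketched in Java. The parameter names `check.BOM` and `check.XML` and their defaults come from the table above; everything else (the class `PrecedenceSketch`, its methods, and the assumption that the configuration file uses the usual `key=value` properties syntax) is hypothetical and only meant to illustrate the decision logic.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration only; not the code of EncodingByBomOrXmlPreParser.
class PrecedenceSketch {
    // Parses properties text such as "check.BOM=true\ncheck.XML=false".
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Picks the source encoding to recode from, or returns null if the file
    // must be left untouched. A BOM result takes precedence over the XML header.
    static String chooseEncoding(Properties cfg, String bomEncoding, String xmlEncoding) {
        // Both parameters are optional and default to true.
        boolean checkBom = Boolean.parseBoolean(cfg.getProperty("check.BOM", "true"));
        boolean checkXml = Boolean.parseBoolean(cfg.getProperty("check.XML", "true"));
        if (checkBom && bomEncoding != null) {
            return bomEncoding;
        }
        if (checkXml && xmlEncoding != null) {
            return xmlEncoding;
        }
        return null;
    }
}
```

With an empty configuration, a file that carries a UTF-16LE BOM and declares ISO-8859-1 in its XML header is treated as UTF-16LE. With `check.BOM=false`, the XML header decides instead, and with both parameters set to false no encoding is chosen and the input stays unmodified.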