Standard process

images/download/attachments/164332878/Phase_2_Standard_Diagramm_EN-version-1-modificationdate-1738911245210-api-v2.png

By default, phase 2 parses the received data into the source structure and thereby generates the source tree. In addition, the parser can split the data it reads into so-called records. Each parser (depending on the document type) behaves slightly differently, so please read the sections "Parsing" and "When does the parser start a new record?". The purpose of records is also explained there (for example, there is an additional option to run through the Responses in phase 6 once per record instead of only once per profile run).

The following sections describe optional deviations from this processing flow.

Note: Although the intermediate phases described below are logically part of phase 2, they are configured in the GUI on the page "Main Settings".

Unknown and ignored data segments

If input files contain data segments that cannot be parsed with the existing source structure, this data will be lost (without triggering a profile job error).

However, it is possible to log such data, evaluate it and, if necessary, post-process it. See section Unknown Segment Log Listener.

Environment check

images/download/attachments/164332878/Umgebungsprufung_Diagramm_EN-version-1-modificationdate-1738911245226-api-v2.png

Environment check classes verify that certain conditions in the system environment are met before starting a job. The result of the environment check is a logical value ("true"/"false"). If the check returns "false", the backup file will be created and the job will be suspended. If the check returns "true", the job is started.

If the job has been suspended, the class also supplies an integer value indicating the number of seconds until a restart. After this waiting period, the environment check is carried out again and, depending on the result, the job is started or suspended again. When the job is restarted, it uses the original backup file of the first run. The source data is not fetched again from the source system. If the check returns "false" and the supplied waiting period is 2147483647 (Integer.MAX_VALUE) seconds, the job is permanently suspended and the backup file is deleted.
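The decision logic described above can be summarised in a few lines. The following is a minimal sketch only: the class name EnvironmentCheckSketch, the enum Action and the method decide are hypothetical names chosen for illustration and are not part of the product's API.

```java
/** Hypothetical model of the environment check decision; not the product's API. */
final class EnvironmentCheckSketch {

    /** The three outcomes described in the text. */
    enum Action { START_JOB, SUSPEND_AND_RETRY, SUSPEND_PERMANENTLY }

    /**
     * Maps the result of one environment check to an action.
     *
     * @param checkPassed       the logical result of the check ("true"/"false")
     * @param secondsUntilRetry the waiting period supplied by the class on "false"
     */
    static Action decide(boolean checkPassed, int secondsUntilRetry) {
        if (checkPassed) {
            return Action.START_JOB;
        }
        // 2147483647 (Integer.MAX_VALUE) seconds means: never retry.
        // In this case the backup file is deleted as well.
        if (secondsUntilRetry == Integer.MAX_VALUE) {
            return Action.SUSPEND_PERMANENTLY;
        }
        // Otherwise the job waits and the check is carried out again,
        // reusing the backup file of the first run.
        return Action.SUSPEND_AND_RETRY;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 0));
        System.out.println(decide(false, 60));
        System.out.println(decide(false, Integer.MAX_VALUE));
    }
}
```

In this model, the waiting period is only evaluated when the check returns "false".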

The goal of this feature is to delay processing while an external system is unavailable, so that the job does not end with an error message.

Note: The environment check is designed for short-term problems in the environment, not for accumulating thousands of jobs over several days. The constant retries put a considerable load on the system.

Preparsers

There are situations in which an input file cannot be parsed by a standard parser.

The file contains additional control characters that the parser does not expect/allow.
The file contains bytes that are not valid in the expected encoding. Files contain bytes, and those bytes become characters only when an encoding is applied. Each encoding defines only a specific set of byte combinations. If the file contains byte combinations to which the encoding does not assign a character, the bytes are converted into characters incorrectly.
The source format uses uncommon character sequences as record or field delimiters. To properly separate the format into records and fields, the delimiters must be preprocessed.
The source format uses escape sequences for umlauts and other special characters, but the target system expects the original characters.
The source is in a special format that cannot be read by any of the available parsers.

The first four cases are repair situations in which poor data quality is to be improved. Usually, these problems can be solved with the preparsers EncodingPatcher or EncodingPatcherWithRegexReplacement.
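The encoding problem from the list above can be reproduced with the standard Java charset API. The following sketch is independent of the preparsers named above and does not show how they work internally. The class name EncodingRepairSketch and its methods are hypothetical. It first checks whether a byte array is valid in a given encoding and then decodes it leniently, substituting a replacement string for invalid byte sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of invalid bytes in an encoding; not a product class. */
final class EncodingRepairSketch {

    /** Returns true if every byte sequence maps to a character in the charset. */
    static boolean isValid(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // The strict decoder found bytes without an assigned character.
            return false;
        }
    }

    /** Decodes leniently and substitutes the replacement for invalid byte sequences. */
    static String decodeWithReplacement(byte[] data, Charset charset, String replacement) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .replaceWith(replacement)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares the checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xFF never occurs in valid UTF-8.
        byte[] broken = {'A', (byte) 0xFF, 'B'};
        System.out.println(isValid(broken, StandardCharsets.UTF_8));
        System.out.println(decodeWithReplacement(broken, StandardCharsets.UTF_8, "?"));
    }
}
```

A strict check like isValid is useful for deciding whether a repair step is needed at all, while the lenient decoding shows what a simple repair could look like.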

Standard call of a preparser in phase 2

images/download/attachments/164332878/Preparser_Diagramm_EN-version-1-modificationdate-1738911245240-api-v2.png

Calling preparser with postexecuter in phase 6

See sections PreParserPostExecuter and Phase 6.

Calling preparser with function in phase 3

See sections call-preparser(class a, [config b], [infile c], [result max kb size d], [input encoding e], [result encoding f]) and Phase 3.

PPP configuration for preparsers

See section PPP configuration for preparsers.

Postparsers

images/download/attachments/164332878/Postparser_Diagramm_EN-version-1-modificationdate-1738911245253-api-v2.png

Note: You can of course use both a preparser and a postparser, although this is rarely relevant in practice.

Analogous to the preparser, you can use a so-called postparser class after the parser. This class can manipulate the source tree created by the parser (e.g. replace umlauts or other special characters).
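To illustrate the kind of manipulation meant here, the following sketch walks a small tree and replaces umlauts in all node values. It is an assumption-laden illustration only: the class PostparserSketch, the nested Node type and the replacement table are hypothetical and do not reflect the product's actual postparser interface or source tree classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a tree manipulation; not the product's postparser API. */
final class PostparserSketch {

    /** Hypothetical stand-in for a source tree node: a name, an optional value, children. */
    static final class Node {
        final String name;
        String value;
        final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Assumed replacement table: transliterate German umlauts and sharp s.
    static final Map<String, String> REPLACEMENTS = Map.of(
            "\u00e4", "ae",
            "\u00f6", "oe",
            "\u00fc", "ue",
            "\u00df", "ss");

    /** Applies all replacements to the value of this node and of every descendant. */
    static void replaceInTree(Node node) {
        if (node.value != null) {
            String v = node.value;
            for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
                v = v.replace(e.getKey(), e.getValue());
            }
            node.value = v;
        }
        for (Node child : node.children) {
            replaceInTree(child);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("record", null)
                .add(new Node("name", "M\u00fcller"))
                .add(new Node("street", "Hauptstra\u00dfe 1"));
        replaceInTree(root);
        for (Node child : root.children) {
            System.out.println(child.name + " = " + child.value);
        }
    }
}
```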

Working with purely binary input data

If an input file contains not only characters in a defined character encoding but also binary, encoded or encrypted data, it cannot be parsed. The information in such a file cannot be processed by the profile in the usual way. Examples: PDF files, image files (JPG, TIFF, GIF, etc.), MS Word documents.

However, if the checkbox "No mapping" is selected, phases 2 to 5 are skipped and the profile can work as a data pump, picking up files in phase 1 and then sending them to a target system in phase 6 "as received". Note: Because phase 3 (mapping) is skipped, no profile variables are available in the Responses, only system variables. If you need phases 2 to 5, use the checkbox "Data routing" instead.