Phase 2 (introduction)
Standard Process
By default, phase 2 parses the received data into the source structure and thereby generates the source tree.
The following sections describe optional deviations from this processing flow.
Please also note the section Working with Purely Binary Input Data.
Note: Although the intermediate phases described below are logically part of phase 2, they are configured in the GUI on the Basic Data page.
Unknown and Ignored Data Segments
If input files contain data segments that cannot be parsed with the existing source structure, this data will be lost (without triggering a profile job error).
However, it is possible to log such data, evaluate it and, if necessary, post-process it. See section Unknown Segment Log Listener.
Environment Check
Environment Check classes verify that certain conditions in the system environment are met before a job starts. The result of the environment check is a logical value (true/false). If the check returns false, a backup file is created and the job is suspended. If the check returns true, the job is started.
If the job has been suspended, the class also supplies an integer value indicating the number of seconds until a restart. After this waiting period, the environment check is carried out again and, depending on the result, the job is started or suspended again. A restarted job uses the original backup file of the first run; the source data is not fetched again from the source system. If the check returns false and the waiting period is 2147483647 (Integer.MAX_VALUE) seconds, the job is suspended permanently and the backup file is deleted.
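The decision logic described above can be summarised in a few lines of Java. This is only an illustrative sketch: the class EnvironmentCheckSketch, the enum Action, and the method decide are hypothetical names chosen for this example and are not part of the Lobster_data API. Only the three outcomes and the Integer.MAX_VALUE sentinel are taken from the text.

```java
// Hypothetical sketch; none of these names belong to the Lobster_data API.
final class EnvironmentCheckSketch {

    /** The three outcomes described in the text. */
    enum Action { START_JOB, SUSPEND_AND_RETRY, SUSPEND_PERMANENTLY }

    /**
     * Maps the result of one environment check to an action.
     *
     * @param checkPassed         logical result of the check (true/false)
     * @param secondsUntilRestart waiting period supplied by the check class
     */
    static Action decide(boolean checkPassed, int secondsUntilRestart) {
        if (checkPassed) {
            return Action.START_JOB;
        }
        if (secondsUntilRestart == Integer.MAX_VALUE) {
            // Sentinel value 2147483647: no further retry, backup file is deleted.
            return Action.SUSPEND_PERMANENTLY;
        }
        // Otherwise the check is repeated after the waiting period,
        // using the backup file of the first run.
        return Action.SUSPEND_AND_RETRY;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 0));
        System.out.println(decide(false, 60));
        System.out.println(decide(false, Integer.MAX_VALUE));
    }
}
```

A real environment check class would additionally contain the actual test, for example whether an external system is reachable.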
The goal of this feature is to delay processing while an external system is unavailable, so that the job does not end with an error message.
Note: The environment check is designed for short-term problems in the environment, not for accumulating thousands of jobs over days. The constant retries put considerable strain on the system.
Preparsers
There are situations in which an input file cannot be parsed by a standard parser:
The file contains additional control characters that the parser does not expect or allow.
The file contains bytes that are not valid in the encoding used. Files contain bytes; those bytes become characters only when an encoding is applied. Each encoding defines only a certain set of byte combinations. If the file contains byte combinations to which the encoding does not assign a character, the conversion of the bytes into characters is incorrect.
The source format uses uncommon character sequences as record or field delimiters. To properly separate the data into records and fields, the delimiters must be preprocessed.
The source format uses escape sequences for umlauts and other special characters, but the target system expects the original characters.
The file has a special format that cannot be read by any of the parsers that Lobster_data provides by default.
The first four cases correspond to a repair situation in which low data quality is to be improved. Usually, these problems can be solved with the preparsers EncodingPatcher or EncodingPatcherWithRegexReplacement.
If there is no suitable default preparser, you (or we, on your behalf) can develop one.
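To make the encoding case above more concrete, the following Java sketch checks whether a byte array can be decoded in a given encoding without errors. It uses only the standard java.nio.charset API. The class name EncodingCheckSketch and the method isValid are assumptions made for this example; the sketch does not show how EncodingPatcher works internally.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the implementation of any Lobster_data preparser.
final class EncodingCheckSketch {

    /** Returns true if every byte sequence in data maps to a character in charset. */
    static boolean isValid(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   // Report errors instead of silently inserting replacement characters.
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Mueller" with u-umlaut, written as ISO-8859-1: the umlaut is the single byte 0xFC.
        byte[] latin1 = "M\u00fcller".getBytes(StandardCharsets.ISO_8859_1);

        // 0xFC is not a valid byte in UTF-8, so a UTF-8 parser cannot read this file.
        System.out.println(isValid(latin1, StandardCharsets.UTF_8));
        // Read with the encoding it was written in, the same data is fine.
        System.out.println(isValid(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

A check of this kind only detects the problem. Repairing the data, for example by replacing or removing the offending bytes, is the task of the preparser.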
Standard Call of a Preparser in Phase 2
Call Preparser with Postexecuter in Phase 6
See PreParserPostExecuter and Phase 6.
Call Preparser with Function in Phase 3
See call-preparser(class a, [config b], [infile c], [result max kb size d], [input encoding e], [result encoding f]) and Phase 3.
PPP Configuration for Preparsers
Postparsers
Note: You can use both a preparser and a postparser together, although this is rarely relevant in practice.
Analogous to the preparser, you can use a so-called postparser class after the main parser. This class can manipulate the source tree created by the main parser (e.g. replace umlauts or other special characters).
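The idea of manipulating a tree after parsing can be sketched as follows. The Node class is a hypothetical, minimal stand-in for a source tree node, and the replacement table is an assumption chosen for this example; neither reflects the actual Lobster_data postparser interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Node is not a Lobster_data class.
final class PostparserSketch {

    /** Minimal stand-in for a node of the source tree. */
    static final class Node {
        final String name;
        String value; // null for pure structure nodes
        final List<Node> children = new ArrayList<>();

        Node(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** Adds a child and returns this node to allow chaining. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Assumed replacement table (German umlauts and sharp s to ASCII).
    private static final Map<String, String> REPLACEMENTS = Map.of(
            "\u00e4", "ae", "\u00f6", "oe", "\u00fc", "ue",
            "\u00c4", "Ae", "\u00d6", "Oe", "\u00dc", "Ue",
            "\u00df", "ss");

    /** Walks the tree depth-first and rewrites every field value in place. */
    static void replaceUmlauts(Node node) {
        if (node.value != null) {
            String v = node.value;
            for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
                v = v.replace(e.getKey(), e.getValue());
            }
            node.value = v;
        }
        for (Node child : node.children) {
            replaceUmlauts(child);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("order", null)
                .add(new Node("city", "M\u00fcnchen"))
                .add(new Node("street", "Hauptstra\u00dfe"));
        replaceUmlauts(root);
        for (Node child : root.children) {
            System.out.println(child.name + " = " + child.value);
        }
    }
}
```

Because the replacements operate on the finished tree, they affect only field values and cannot interfere with delimiters or other structural characters of the input file.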
Examples
Examples can be found in the documentation for the respective preparsers, postparsers, and environment check classes.
GUI
The configuration of this phase in the GUI is described in section Phase 2 (GUI).