Parsing

The task of the parser is to fill the fields in the source structure with the data received. An input file usually consists of different record types, with each record type containing several individual values. For example, a record of the type address might contain the individual values surname, first name, street, etc.

The different records must be assigned to the correct node of the source structure, and within each record, the individual values must be assigned to the correct fields. The structure of the input data must, therefore, match the source structure of the profile.

To assign the records to the correct nodes, you have to specify a match code for each node.
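The idea of a match code can be illustrated with a minimal sketch. This is not Lobster_data's implementation, only an approximation of a "starts with" rule in Python. The function name find_node, the dictionary MATCH_CODES, and the second record type ORD with its node order are hypothetical and serve only to show how a record could be routed to one of several nodes.

```python
# Hypothetical match codes: node name -> prefix a record must start with.
# "address"/"ADR" follows the example in this section;
# "order"/"ORD" is invented purely for illustration.
MATCH_CODES = {
    "address": "ADR",
    "order": "ORD",
}


def find_node(record, match_codes=MATCH_CODES):
    """Return the first node whose 'starts with' match code fits, else None."""
    for node_name, prefix in match_codes.items():
        if record.startswith(prefix):
            return node_name
    return None


print(find_node("ADR;Randy;Random;Mainstreet 1"))
print(find_node("XYZ;no matching node"))
```

In this sketch, a record that matches no node simply yields None. How the actual parser treats such records is not covered here.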

When developing a profile for specific input data, the task is to define the appropriate source structure. Specifically, for each record type you have to define a corresponding node in the source structure, and that node must contain the fields that hold the individual values of that record type. The hierarchical structure of the nodes has to be considered as well.

Note: Structure templates are available for creating a source structure, or you can create it yourself. See section Working with Templates.

Several records of the same type may follow each other in the input data. These records are parsed into the same node of the source structure because they have the same formal structure. We refer to the source structure filled with input data as the source tree. The source tree may therefore contain multiple instances of a specific node (each holding a record of the same type), whereas the source structure contains only one node for that record type.

Records


Parsers in Lobster_data organize the data they read into so-called records. See section When Does the Parser Start a New Record?

Example


The following example is for illustrative purposes only. You will learn the details while working through the GUI.

Assume the following CSV input data. Each line represents an address record and contains a record type identifier, a first name, a surname, and a street.


ADR;Randy;Random;Mainstreet 1
ADR;Sharon;Fillerup;Treeway 3


We want to parse the data into the following source structure. For the node address, we use the match code Starts with ADR.


images/download/thumbnails/73597150/Parsen_1_EN-version-1-modificationdate-1618804388349-api-v2.png


In a mapping test, we then get the following source tree with two records.


images/download/thumbnails/73597150/Parsen_2_EN-version-1-modificationdate-1618804388353-api-v2.png
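The example can also be imitated in a few lines of Python to make the relationship between source structure and source tree tangible. This is only a sketch under assumptions, not the way Lobster_data works internally: the field names in ADDRESS_FIELDS and the function parse_to_source_tree are hypothetical, and the "starts with ADR" check stands in for the match code of the node address.

```python
import csv
import io

# The two address records from the example above.
INPUT_DATA = "ADR;Randy;Random;Mainstreet 1\nADR;Sharon;Fillerup;Treeway 3\n"

# Source structure: a single node "address" with four fields.
# The field names are assumptions made for this sketch.
ADDRESS_FIELDS = ["record_type", "first_name", "surname", "street"]


def parse_to_source_tree(text):
    """Build a list of filled 'address' nodes, one per matching record."""
    source_tree = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        # Stand-in for the match code "Starts with ADR".
        if row and row[0].startswith("ADR"):
            fields = dict(zip(ADDRESS_FIELDS, row))
            source_tree.append({"node": "address", "fields": fields})
    return source_tree


tree = parse_to_source_tree(INPUT_DATA)
for record in tree:
    print(record["node"], record["fields"])
```

The source structure defines the node address only once (ADDRESS_FIELDS), while the resulting list contains two filled instances of it, one for each input line.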

Syntax Errors in the Source Data


See section Syntax Errors in the Source Data.

GUI


The configuration of this phase in the GUI is described in section Phase 2 (GUI).