CSV
Settings
(1) Check input format: If this checkbox is set, the source fields are checked against the stored format templates during parsing. If a value violates the format template of its source field or exceeds the field length, an error is generated. If errors occur, the profile does not terminate immediately, but only after phase 2 or after 50 errors. Attention: This feature affects performance and should only be used when absolutely necessary.
(2) Check min/max settings: Specifies whether to check the number of repetitions of fields and nodes in the source tree.
(3) Remove blanks (XML will be normalized): Whitespaces at the beginning and end of quoted field values are removed. Example: "33 ";" 44";" 55 "; "66 77 " becomes "33"; "44"; "55"; "66 77".
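The trimming behaviour described above can be sketched as follows (a minimal illustration, not the actual parser code):

```python
# Sketch of setting (3): leading and trailing blanks of each field value
# are stripped; blanks inside a value ("66 77") are kept.
def trim_quoted(fields):
    """Strip leading/trailing whitespace from each field value."""
    return [f.strip() for f in fields]

raw = ['33 ', ' 44', ' 55 ', '66 77 ']
print(trim_quoted(raw))  # → ['33', '44', '55', '66 77']
```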
(4) Force single record: Specifies whether the data should be parsed into a single record. If this checkbox is set, the parser is prevented from creating multiple records.
(5) Parser mode: Row-wise or column-by-column arrangement of the data in the file.
(6) Delimiter for records: The delimiter between columns or rows (see item 5). Usually, for a row-wise arrangement, new line is used. The delimiter may occur in the text, but the affected text must then be surrounded by the CSV quoting character (7).
(7) CSV quoting: See (6).
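The interplay of (6) and (7) can be illustrated with Python's standard csv module: a delimiter occurring inside field text does not split the field as long as the text is enclosed in the quoting character.

```python
import csv
import io

# A semicolon inside the quoted value "Order;1" is treated as text,
# not as a field delimiter.
line = 'OH;"Order;1";open\n'
row = next(csv.reader(io.StringIO(line), delimiter=';', quotechar='"'))
print(row)  # → ['OH', 'Order;1', 'open']
```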
(8) Start at row: Specifies at which line the data to be used starts. This setting is used to ignore a certain number of lines. The value is used if (5) is set to a row-wise arrangement of the input data. Empty lines are not counted. Note: See also system variable VAR_SYS_LINES.
(9) and at column: Specifies at which column the data to be used starts. This setting is used to ignore a certain number of columns. The value is used if (5) is set to a column-by-column arrangement of the input data. Empty columns are not counted.
(10) Read up to row: Specifies up to which line the data should be read. The subsequent lines in the file are then ignored.
(11) Skip lines after record set: Specifies how many lines should be skipped after a dataset.
(12) CSV is Excel compliant (CSV export from Excel): If set, the CSV input document is expected to be as it is generated by the Microsoft Excel CSV export.
(13) Continue reading on line break within row: If this checkbox is set, line breaks in columns will be removed.
Before:  "Hey
you there."
After:   "Hey you there."
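The effect of setting (13) can be sketched with Python's csv module (an illustration only, assuming the line break sits inside a quoted field): the quoted field containing the break is read as one value, and the break is then replaced.

```python
import csv
import io

# A line break inside a quoted field does not end the row; afterwards
# the break is removed from the value, as setting (13) describes.
data = 'OH;"Hey\nyou there."\n'
row = next(csv.reader(io.StringIO(data), delimiter=';', quotechar='"'))
row = [f.replace('\n', ' ') for f in row]
print(row)  # → ['OH', 'Hey you there.']
```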
(14) Same matchcodes exist: Indicates whether identical record types occur multiple times (nested) but do not belong to the same node. Example:
DatasetA
DatasetB
...
DatasetC
...
DatasetD
DatasetC
DatasetD
DatasetC
...
The first C dataset should be parsed into a separate node, the subsequent C datasets each into a D/C node. The CSV parser thus works similarly to an EDIFACT parser here: it goes through the structure from top to bottom and remembers its position. Once the lower D/C node has been entered, nothing is inserted into the upper C node anymore. Note: Setting this checkbox also affects when the parser creates a new record.
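The top-down behaviour for setting (14) can be sketched as follows (a hypothetical illustration with invented data structures, not the product's code): once the parser has descended into the nested D/C node, later "C" lines no longer match the upper C node.

```python
# Simplified dataset-type sequence from the example above.
lines = ["A", "B", "C", "D", "C", "D", "C"]

record = {"C": None, "D": []}   # upper C node and list of D/C nodes
entered_d = False               # parser remembers where it is

for line in lines:
    if line == "D":
        entered_d = True
        record["D"].append({"type": "D", "C": []})
    elif line == "C":
        if not entered_d:
            record["C"] = {"type": "C"}                  # first C: upper node
        else:
            record["D"][-1]["C"].append({"type": "C"})   # later Cs: under last D
```

After the loop, the first C sits in the upper node and each later C under its preceding D, mirroring the description above.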
(15) Execute semantic checks: Incoming files can be checked with semantic rules. See section Semantic check.
Lines without dependencies
A CSV file is a text file that contains tabular structured data and is primarily used for data exchange. There can be a line-by-line or a column-by-column arrangement of the data.
The first table shows an example of a line-by-line arrangement. Each row is a dataset; the more datasets, the more rows. The number of columns in the file remains the same.
         | Field1 | Field2 | Field3 | Field4
Dataset1 | D1F1   | D1F2   | D1F3   | D1F4
Dataset2 | D2F1   | D2F2   | D2F3   | D2F4
Dataset3 | D3F1   | D3F2   | D3F3   | D3F4
Dataset4 | D4F1   | D4F2   | D4F3   | D4F4
The second table shows an example of a column-by-column arrangement. Each column is a dataset; the more datasets, the longer the rows. The number of lines in the file remains the same.
       | Dataset1 | Dataset2 | Dataset3 | Dataset4 | ...
Field1 | D1F1     | D2F1     | D3F1     | D4F1     | ...
Field2 | D1F2     | D2F2     | D3F2     | D4F2     | ...
Field3 | D1F3     | D2F3     | D3F3     | D4F3     | ...
Field4 | D1F4     | D2F4     | D3F4     | D4F4     | ...
The first listing shows the simplest case of a CSV file in a line-by-line arrangement. The file has only one dataset type, the fields are separated by a semicolon (;).
#OrderHeader=OH
OH;Order1
OH;Order2
OH;Order3
The second listing shows the same CSV file in a column-by-column arrangement.
OH;OH;OH
Order1;Order2;Order3
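The second listing is simply the transpose of the first, which can be sketched in a few lines (an illustration, not parser code):

```python
# Row-wise datasets from the first listing (legend line omitted).
rows = [["OH", "Order1"], ["OH", "Order2"], ["OH", "Order3"]]

# Transposing the rows yields the column-by-column arrangement.
cols = [list(c) for c in zip(*rows)]
for c in cols:
    print(";".join(c))
# OH;OH;OH
# Order1;Order2;Order3
```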
During mapping, the source tree is mapped to the target tree. The source tree is created by the parser. Users can create their own nodes and fields in the source structure to match the input data.
The figure below shows how the parser operates. The three lines become three datasets with two fields each. The parser, starting at the row specified in (8), breaks the input file down into individual datasets. The lines then have to be broken down further into single values. To achieve this, the parser must first be told to interpret an entire row as one column; this is done by entering the value "New line" as the delimiter in field (6) of the parser settings. Then the lines are taken apart by specifying how the single values in a row are separated. This is done in the attributes of a node.
The following screenshot shows the source structure for the example file, regardless of the arrangement of the data in the file.
The attribute "Delimiter Column/Row" of the source structure node "OrderHeader" of the previous screenshot specifies the delimiter character that separates the values in a row. Since this value is entered in the nodes, different delimiters are possible for different dataset types.
A node of the source structure is only entered if the set conditions are met. These conditions (match codes) can be defined via the context menu of the source structure nodes.
Any number of conditions can be entered for a node; the conditions are linked with a logical OR. In our example, the first value of each line is checked against the match codes. The match code mechanism allows you to mix different dataset types in one input file. If checkbox (14) is set, the same match code can also be used in subnodes (see details there).
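The OR-linked match code check can be sketched as follows (a hypothetical helper, assuming the first value of the line is compared against each code):

```python
# A node is entered if ANY of its match codes equals the first value
# of the line (the conditions are ORed).
def node_matches(line, match_codes, delimiter=';'):
    first = line.split(delimiter)[0]
    return any(first == code for code in match_codes)

print(node_matches('OH;Order1', ['OH', 'OX']))  # → True
print(node_matches('OP;Pos1', ['OH', 'OX']))    # → False
```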
When using the CSV parser, the top nodes must also have the match codes that allow the subnodes to be found. By contrast, when using the fixed-length parser, only the subnodes need to have the match codes.
For a hierarchy of nodes, the whitespace character (" ") or nothing ("") must be entered as the column delimiter in the parent node.
The figure below shows the operation of the parser when breaking down a file into records with different types of datasets.
Lines with dependencies
In the previous example, there were no dependencies between the lines in the input file, but usually there are.
The following listing shows the example of a file with order headers and assigned order positions. The order header is in a separate line, followed by the associated order positions. The file contains three orders with one, two, and three positions, respectively.
#OrderHeader=OH
#OrderPosition=OP
OH;Order1
OP;Pos1 O1
OH;Order2
OP;Pos1 O2
OP;Pos2 O2
OH;Order3
OP;Pos1 O3
OP;Pos2 O3
OP;Pos3 O3
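The hierarchy in this listing can be sketched by grouping each OP line under the preceding OH line (an illustration of the dependency, not the product's code):

```python
# Attach each order position to the order header that precedes it.
lines = ["OH;Order1", "OP;Pos1 O1",
         "OH;Order2", "OP;Pos1 O2", "OP;Pos2 O2"]

orders = []
for line in lines:
    tag, value = line.split(";", 1)
    if tag == "OH":
        orders.append({"header": value, "positions": []})
    elif tag == "OP":
        orders[-1]["positions"].append(value)

print(orders)
```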
Such a hierarchical dependency can be mapped in two different ways.
The first screenshot shows the source structure for the example file. The order header node and the position node were inserted at the top level.
The second screenshot shows an alternative source structure for the example file. The "OrderHeader" node and the "Position" node were placed in a parent node "Order".
The difference between the two solutions is that for the source structure in the first screenshot, a new record is created for each order header, while the source structure in the second screenshot only creates a single record that contains all orders.
If there are no dependencies between the orders that have to be taken care of, the first solution is preferable. The reason is that the parser works within a record: the smaller the amount of data in a record, the faster the parser works. However, this only matters with large amounts of data.
If dependencies between the orders are to be established in the output data (for example, combining the orders in one shipment), the second solution has to be chosen. In this solution, the three orders are held in the same record, which allows you to handle the dependencies (see more in the section Phase 3).
Whenever the first of multiple top-level nodes in the source structure is entered, the CSV parser begins a new record. In the first solution, a new record is therefore created for each order header, containing the order header and all related positions.
The order of the fields in the nodes determines which values of the data rows are assigned to the fields in the source tree.