CSV
Settings
(1) Check input format: If this checkbox is set, the source fields are checked against the stored format templates during parsing. If a value violates the format template of its source field or exceeds the field length, an error is generated. If errors occur, the profile does not terminate immediately, but only after phase 2 or after 50 errors. Attention: This feature affects performance and should only be used when absolutely necessary.
(2) Check min/max settings: Specifies whether to check the number of repetitions of fields and nodes in the source tree.
(3) Remove blanks (XML will be normalized): Whitespaces at the beginning and end of quoted field values are removed. Example: "33 ";" 44";" 55 "; "66 77 " becomes "33"; "44"; "55"; "66 77".
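The trimming behaviour described above can be sketched as follows (a minimal illustration, not the actual parser code):

```python
# Sketch of setting (3): leading and trailing blanks of each field value
# are stripped; blanks inside a value ("66 77") are kept.
def trim_quoted(fields):
    """Strip leading/trailing whitespace from each field value."""
    return [f.strip() for f in fields]

raw = ['33 ', ' 44', ' 55 ', '66 77 ']
print(trim_quoted(raw))  # → ['33', '44', '55', '66 77']
```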
(4) Force single record: Specifies whether the data should be parsed into a single record. If this checkbox is set, the parser is prevented from creating multiple records.
(5) Parser mode: Row-wise or column-by-column arrangement of the data in the file.
(6) Delimiter for records: The delimiter between columns or rows (see item 5). Usually, for a row-wise arrangement, new line is used. The delimiter may occur in the text, but the affected text must then be surrounded by the CSV quoting character (7).
(7) CSV quoting: See (6).
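The interplay of (6) and (7) can be illustrated with Python's standard csv module: a delimiter occurring inside field text does not split the field as long as the text is enclosed in the quoting character.

```python
import csv
import io

# A semicolon inside the quoted value "Order;1" is treated as text,
# not as a field delimiter.
line = 'OH;"Order;1";open\n'
row = next(csv.reader(io.StringIO(line), delimiter=';', quotechar='"'))
print(row)  # → ['OH', 'Order;1', 'open']
```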
(8) Start at row: Specifies at which line the data to be used starts. This setting is used to ignore a certain number of lines. The value is used if (5) is set to a row-wise arrangement of the input data. Empty lines are not counted. Note: See also system variable VAR_SYS_LINES.
(9) and at column: Specifies at which column the data to be used starts. This setting is used to ignore a certain number of columns. The value is used if (5) is set to a column-by-column arrangement of the input data. Empty columns are not counted.
(10) Read up to row: Specifies up to which line the data should be read. The subsequent lines in the file are then ignored.
(11) Skip lines after record set: Specifies how many lines should be skipped after a dataset.
(12) CSV is Excel compliant (CSV export from Excel): If set, the CSV input document is expected to be as it is generated by the Microsoft Excel CSV export.
(13) Continue reading on line break within row: If this checkbox is set, line breaks in columns will be removed.
Before:  "Hey
you there."
After:   "Hey you there."
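The effect of setting (13) can be sketched with Python's csv module (an illustration only, assuming the line break sits inside a quoted field): the quoted field containing the break is read as one value, and the break is then replaced.

```python
import csv
import io

# A line break inside a quoted field does not end the row; afterwards
# the break is removed from the value, as setting (13) describes.
data = 'OH;"Hey\nyou there."\n'
row = next(csv.reader(io.StringIO(data), delimiter=';', quotechar='"'))
row = [f.replace('\n', ' ') for f in row]
print(row)  # → ['OH', 'Hey you there.']
```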
(14) Same matchcodes exist: Indicates whether identical record types occur multiple times (nested) but do not belong to the same node. Example:
DatasetA
DatasetB
...
DatasetC
...
DatasetD
DatasetC
DatasetD
DatasetC
...
The first C dataset should be parsed into a separate node, the subsequent C datasets each into a D/C node. The CSV parser thus works similarly to an EDIFACT parser here: it goes through the structure from top to bottom and remembers its position. Once the lower D/C node has been entered, nothing is inserted into the upper C node anymore. Note: Setting this checkbox also affects when the parser creates a new record.
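The top-down behaviour for setting (14) can be sketched as follows (a hypothetical illustration with invented data structures, not the product's code): once the parser has descended into the nested D/C node, later "C" lines no longer match the upper C node.

```python
# Simplified dataset-type sequence from the example above.
lines = ["A", "B", "C", "D", "C", "D", "C"]

record = {"C": None, "D": []}   # upper C node and list of D/C nodes
entered_d = False               # parser remembers where it is

for line in lines:
    if line == "D":
        entered_d = True
        record["D"].append({"type": "D", "C": []})
    elif line == "C":
        if not entered_d:
            record["C"] = {"type": "C"}                  # first C: upper node
        else:
            record["D"][-1]["C"].append({"type": "C"})   # later Cs: under last D
```

After the loop, the first C sits in the upper node and each later C under its preceding D, mirroring the description above.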
(15) Execute semantic checks: Incoming files can be checked with semantic rules. See section Semantic check.
Lines without dependencies
A CSV file is a text file that contains tabular structured data and is primarily used for data exchange. There can be a line-by-line or a column-by-column arrangement of the data.
The first table shows an example of a line-by-line arrangement. Each row is a dataset; the more datasets, the more rows. The number of columns in the file remains the same.
         | Field1 | Field2 | Field3 | Field4
Dataset1 | D1F1   | D1F2   | D1F3   | D1F4
Dataset2 | D2F1   | D2F2   | D2F3   | D2F4
Dataset3 | D3F1   | D3F2   | D3F3   | D3F4
Dataset4 | D4F1   | D4F2   | D4F3   | D4F4
The second table shows an example of a column-by-column arrangement. Each column is a dataset; the more datasets, the longer the rows. The number of lines in the file remains the same.
       | Dataset1 | Dataset2 | Dataset3 | Dataset4 | ...
Field1 | D1F1     | D2F1     | D3F1     | D4F1     | ...
Field2 | D1F2     | D2F2     | D3F2     | D4F2     | ...
Field3 | D1F3     | D2F3     | D3F3     | D4F3     | ...
Field4 | D1F4     | D2F4     | D3F4     | D4F4     | ...
The first listing shows the simplest case of a CSV file in a line-by-line arrangement. The file has only one dataset type, the fields are separated by a semicolon (;).
#OrderHeader=OH
OH;Order1
OH;Order2
OH;Order3
The second listing shows the same CSV file in a column-by-column arrangement.
OH;OH;OH
Order1;Order2;Order3
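The second listing is simply the transpose of the first, which can be sketched in a few lines (an illustration, not parser code):

```python
# Row-wise datasets from the first listing (legend line omitted).
rows = [["OH", "Order1"], ["OH", "Order2"], ["OH", "Order3"]]

# Transposing the rows yields the column-by-column arrangement.
cols = [list(c) for c in zip(*rows)]
for c in cols:
    print(";".join(c))
# OH;OH;OH
# Order1;Order2;Order3
```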
During mapping, the source tree is mapped to the target tree. The source tree is created by the parser. Users can create their own nodes and fields in the source structure to match the input data.
The figure below shows how the parser operates. The three lines become three datasets with two fields each. The parser, starting at the row specified in (8), breaks the input file down into individual datasets. The lines then have to be broken down further into single values. To achieve this, the parser must first be told to interpret an entire row as one column; this is done by entering the value "New line" as the delimiter in field (6) of the parser settings. Then the lines are taken apart by specifying how the single values in a row are separated. This is done in the attributes of a node.
The following screenshot shows the source structure for the example file, regardless of the arrangement of the data in the file.
The attribute "Delimiter Column/Row" of the source structure node "OrderHeader" of the previous screenshot specifies the delimiter character that separates the values in a row. Since this value is entered in the nodes, different delimiters are possible for different dataset types.
A node of the source structure is only entered if the set conditions are met. These conditions (match codes) can be defined via the context menu of the source structure nodes.
Any number of conditions can be entered for a node; the conditions are linked with a logical OR. In our example, the first value of each line is checked against the match codes. The match code mechanism allows you to mix different dataset types in one input file. If checkbox (14) is set, the same match code can also be used in subnodes (see details there).
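The OR-linked match code check can be sketched as follows (a hypothetical helper, assuming the first value of the line is compared against each code):

```python
# A node is entered if ANY of its match codes equals the first value
# of the line (the conditions are ORed).
def node_matches(line, match_codes, delimiter=';'):
    first = line.split(delimiter)[0]
    return any(first == code for code in match_codes)

print(node_matches('OH;Order1', ['OH', 'OX']))  # → True
print(node_matches('OP;Pos1', ['OH', 'OX']))    # → False
```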
When using the CSV parser, the top nodes must also have the match codes that allow the subnodes to be found. By contrast, when using the fixed-length parser, only the subnodes need to have the match codes.
For a hierarchy of nodes, the whitespace character (" ") or nothing ("") must be entered as the column delimiter in the parent node.
The figure below shows the operation of the parser when breaking down a file into records with different types of datasets.
Lines with dependencies
In the previous example, there were no dependencies between the lines in the input file, but usually there are.
The following listing shows the example of a file with order headers and assigned order positions. The order header is in a separate line, followed by the associated order positions. The file contains three orders with one, two, and three positions, respectively.
#OrderHeader=OH
#OrderPosition=OP
OH;Order1
OP;Pos1 O1
OH;Order2
OP;Pos1 O2
OP;Pos2 O2
OH;Order3
OP;Pos1 O3
OP;Pos2 O3
OP;Pos3 O3
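The hierarchy in this listing can be sketched by grouping each OP line under the preceding OH line (an illustration of the dependency, not the product's code):

```python
# Attach each order position to the order header that precedes it.
lines = ["OH;Order1", "OP;Pos1 O1",
         "OH;Order2", "OP;Pos1 O2", "OP;Pos2 O2"]

orders = []
for line in lines:
    tag, value = line.split(";", 1)
    if tag == "OH":
        orders.append({"header": value, "positions": []})
    elif tag == "OP":
        orders[-1]["positions"].append(value)

print(orders)
```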
Such a hierarchical dependency can be mapped in two different ways.
The first screenshot shows the source structure for the example file. The order header node and the position node were inserted at the top level.
The second screenshot shows an alternative source structure for the example file. The "OrderHeader" node and the "Position" node were placed in a parent node "Order".
The difference between the two solutions is that for the source structure in the first screenshot, a new record is created for each order header, while the source structure in the second screenshot only creates a single record that contains all orders.
If there are no dependencies between the orders that have to be taken care of, the first solution is preferable. The reason is that the parser works within a record: the smaller the amount of data in a record, the faster the parser works. However, this only matters with large amounts of data.
If dependencies between the orders are to be established in the output data (for example, combining the orders in one shipment), the second solution has to be chosen. In this solution, the three orders are held in the same record, which allows you to handle the dependencies (see more in the section Phase 3).
Whenever the first of multiple top-level nodes in the source structure is entered, the CSV parser begins a new record. In the first solution, a new record is therefore created for each order header, containing the order header and all related positions.
The order of the fields in the nodes determines which values of the data rows are assigned to the fields in the source tree.