EDIFACT data already has a tree structure. This makes it very easy to parse EDIFACT files into source structures. The user only has to specify the character encoding and whether the data is zipped.

Important note: Companies frequently modify EDIFACT structures for their own purposes. In that case, the user can adapt the source structure, whether it was created from a template or manually. Note, however, that the names of segments such as SG2 cannot be changed arbitrarily. They must always follow the syntax SG<number><any suffix> so that the parser can recognise them. For example, you cannot rename segment SG2 to Items, but SG2-Items is possible. See also section Working with templates.
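The naming rule can be expressed as a regular expression. The following Python sketch is our own illustration and not Lobster_data code. The pattern and the function name `is_valid_group_name` are assumptions derived from the SG<number><any suffix> rule.

```python
import re

# Assumed reading of the rule: "SG", at least one digit, then any suffix (possibly empty).
GROUP_NAME = re.compile(r"SG[0-9]+.*", re.DOTALL)

def is_valid_group_name(name: str) -> bool:
    """Return True if a (renamed) segment name still starts with SG<number>."""
    return GROUP_NAME.fullmatch(name) is not None

for name in ["SG2", "SG2-Items", "Items", "SG-Items"]:
    print(f"{name}: {is_valid_group_name(name)}")
```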

images/download/attachments/164332922/616-version-1-modificationdate-1704702826705-api-v2.png

(1) If this checkbox is set, the input fields are checked against their format templates during parsing. An error is created if a value violates the format template of its field, exceeds the field length, or if a mandatory field is empty. If an error occurs, the profile job is not aborted immediately in phase 2, but only at the end of phase 2 or after 50 errors. Attention: Format checking puts a strain on performance and should only be used if absolutely necessary.

(2) If this checkbox is set, the minimum and maximum number of occurrences defined for the nodes in the source structure is checked. In addition, the following segments are checked: UNT (number of segments), UNZ (control count) and UNE (control count).

(9) Incoming files can also be checked against semantic rules. See section Semantic check.
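The counters named under (2) are defined by the EDIFACT syntax rules (ISO 9735):

- **UNT** carries the number of segments in the message, including UNH and UNT (data element 0074).
- **UNE** carries the number of messages in the functional group (0060).
- **UNZ** carries the interchange control count (0036). This is the number of messages, or the number of functional groups if UNG/UNE are used.

Below is a minimal sketch of such a count check. It assumes the segments have already been split into lists of fields. `check_counts` is a hypothetical name and not part of Lobster_data.

```python
def check_counts(segments):
    """Check the UNT, UNE and UNZ counters.

    segments: list of field lists, e.g. ["UNT", "4", "1"]; the first entry is the tag.
    Returns a list of error messages (empty if all counters match).
    """
    errors = []
    in_message = False
    seg_count = msgs_in_group = msgs_total = groups = 0
    for fields in segments:
        tag = fields[0]
        if tag == "UNG":
            groups += 1
            msgs_in_group = 0
        elif tag == "UNH":
            in_message, seg_count = True, 0
        if in_message:
            seg_count += 1  # UNH and UNT are included in the UNT count
        if tag == "UNT":
            if int(fields[1]) != seg_count:
                errors.append(f"UNT: declared {fields[1]} segments, counted {seg_count}")
            in_message = False
            msgs_in_group += 1
            msgs_total += 1
        elif tag == "UNE":
            if int(fields[1]) != msgs_in_group:
                errors.append(f"UNE: declared {fields[1]} messages, counted {msgs_in_group}")
        elif tag == "UNZ":
            # With functional groups, UNZ counts groups; otherwise it counts messages.
            expected = groups if groups else msgs_total
            if int(fields[1]) != expected:
                errors.append(f"UNZ: declared {fields[1]}, counted {expected}")
    return errors

interchange = [
    ["UNB", "UNOC:3", "SENDER", "RECEIVER", "060414:1200", "1"],
    ["UNH", "1", "ORDERS:D:96A:UN"],
    ["BGM", "220"],
    ["DTM", "137:20060414:102"],
    ["UNT", "4", "1"],
    ["UNZ", "1", "1"],
]
print(check_counts(interchange))  # prints []
```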

Splitting of EDIFACT Files

If a file contains multiple EDIFACT documents (UNB-UNZ), Lobster_data can split it into separate files (3). However, all documents must be of the same EDIFACT type. Each of these single documents then starts its own job. If this checkbox is not set, only one job is created for the whole file.
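Below is a minimal Python sketch of the splitting step. It assumes the default service characters: no UNA segment, the apostrophe as segment terminator and the question mark as release character. `split_interchanges` is a hypothetical helper. Unlike Lobster_data, it does not check that all documents are of the same EDIFACT type.

```python
def split_interchanges(data: str, terminator: str = "'", release: str = "?"):
    """Split raw EDIFACT text into one string per UNB...UNZ interchange."""
    interchanges, current, buf, escaped = [], None, [], False
    for ch in data:
        buf.append(ch)
        if escaped:
            escaped = False  # the escaped character is plain data
        elif ch == release:
            escaped = True
        elif ch == terminator:
            segment = "".join(buf).strip("\r\n")  # drop line breaks between segments
            buf = []
            tag = segment[:3]
            if tag == "UNB":
                current = []  # a new interchange starts
            if current is not None:
                current.append(segment)
            if tag == "UNZ" and current is not None:
                interchanges.append("".join(current))
                current = None
    return interchanges

data = (
    "UNB+UNOC:3+S+R+060414:1200+1'UNH+1+ORDERS:D:96A:UN'UNT+2+1'UNZ+1+1'\n"
    "UNB+UNOC:3+S+R+060414:1201+2'UNH+1+ORDERS:D:96A:UN'UNT+2+1'UNZ+1+2'"
)
for part in split_interchanges(data):
    print(part)
```

Each returned string would correspond to one job.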

Check of EDIFACT Files

General structural checks that always take place:

UNA segment: if present, it must have the correct length, and its service characters (separators, etc.) are evaluated.
Occurrence of UNH and UNZ.
Handling of UNG.
Source structure definitions (number of fields in a component, segments must be named SGxxxx).

Note: See also (1) and (2).

(4) The following segments are checked: UNT (number of segments, UNT/UNH comparison), UNZ (counter, reference field UNB.0020), UNE (counter, reference field 0048, reference field 0060).

(5) The encoding specified in the message is used instead of the encoding set in the profile.

(6) An error is generated if the encoding set in the profile does not match the encoding of the message.

(7) If the encoding of the message is UNOA or UNOB, the values of the message are checked against the characters allowed by that encoding (e.g. A-Z, 0-9, etc.).
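The following sketch illustrates options (5) to (7). It reads the syntax identifier (for example UNOC) from the first data element of the UNB segment. The mapping from syntax identifiers to Python codecs and the character sets are our assumptions, not Lobster_data's tables:

- UNOA follows the ISO 9735 level A repertoire as we understand it.
- UNOB is simplified to level A plus lowercase letters.

Verify both against the standard before relying on them.

```python
import codecs
import string

# Assumed mapping from syntax identifiers to Python codec names.
SYNTAX_TO_CODEC = {"UNOA": "ascii", "UNOB": "ascii", "UNOC": "latin-1"}

# Level A repertoire; UNOB is a simplified assumption (level A plus lowercase letters).
UNOA_CHARS = set(string.ascii_uppercase + string.digits + " .,-()/='+:?!\"%&*;<>")
ALLOWED_CHARS = {"UNOA": UNOA_CHARS, "UNOB": UNOA_CHARS | set(string.ascii_lowercase)}

def syntax_identifier(unb_segment: str) -> str:
    """'UNB+UNOC:3+...' -> 'UNOC' (first component of the first data element)."""
    return unb_segment.split("+")[1].split(":")[0]

def resolve_encoding(unb_segment, profile_codec, use_message_encoding=False,
                     fail_on_mismatch=False):
    """Option (5): prefer the message encoding; option (6): fail on a mismatch."""
    syntax_id = syntax_identifier(unb_segment)
    codec = SYNTAX_TO_CODEC.get(syntax_id)
    if codec is None:
        return profile_codec  # unknown identifier: keep the profile setting
    # codecs.lookup normalises aliases such as "latin-1" and "ISO-8859-1".
    if fail_on_mismatch and codecs.lookup(codec).name != codecs.lookup(profile_codec).name:
        raise ValueError(f"profile encoding {profile_codec} does not match {syntax_id}")
    return codec if use_message_encoding else profile_codec

def invalid_characters(value: str, syntax_id: str) -> list:
    """Option (7): return the characters of value that the syntax level does not allow."""
    allowed = ALLOWED_CHARS.get(syntax_id)
    if allowed is None:
        return []  # only UNOA and UNOB are checked
    return sorted({ch for ch in value if ch not in allowed})

unb = "UNB+UNOC:3+SENDER+RECEIVER+060414:1200+1"
print(resolve_encoding(unb, "utf-8", use_message_encoding=True))
print(invalid_characters("Acme GmbH", "UNOA"))
```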

Creating EDIFACT CONTRL Messages on Errors

If an error occurs when parsing an EDIFACT message, a CONTRL message can be sent to a specified profile (8). The EDIFACT CONTRL message is only generated for phase 2 errors for version D03A.

Handling Parsing Errors

See section CheckEdifactPreParser.

Creating EDIFACT Output Files

If you want to generate an EDIFACT output file, have a look at the sections EDICreationUnit and Integration Units.

Comments on the EDIFACT Syntax

EDIFACT files consist of

segments,
fields and
components.

Segments can be seen as rows, fields as columns, and components as parts of a column. Each segment starts with a segment identifier and ends with a segment terminator. Example: DTM+200:20060414:102'

The string DTM is the segment identifier, and the apostrophe (') is the segment terminator. After the end of a segment, either a new segment must start or the data must end.
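Below is a minimal sketch of how text can be cut into segments at the segment terminator. It makes these assumptions:

- The default terminator (') and release character (?) are used. A terminator preceded by the release character belongs to the data.
- Line breaks are never part of the data, so they are dropped.
- Non-blank text after the last terminator is rejected.

`split_segments` is a hypothetical name.

```python
def split_segments(data: str, terminator: str = "'", release: str = "?") -> list:
    """Split EDIFACT text into segments at unescaped segment terminators."""
    segments, buf, escaped = [], [], False
    for ch in data:
        if ch in "\r\n":
            continue  # assumption: line breaks are never part of the data
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == release:
            buf.append(ch)  # keep the release character; values are unescaped later
            escaped = True
        elif ch == terminator:
            segments.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if "".join(buf).strip():
        raise ValueError("data found after the last segment terminator")
    return segments

print(split_segments("DTM+200:20060414:102'\nGID+2+00000005+00000005'"))
# ['DTM+200:20060414:102', 'GID+2+00000005+00000005']
```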

The fields in a segment are separated by a metacharacter. This is, by default, the plus sign (+). Example: GID+2+00000005+00000005'

The segment consists of four field values: GID, 2, 00000005 and 00000005.
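Splitting a segment into fields works the same way, using the field separator. This sketch uses a hypothetical function `split_fields` with the default separator + and release character ?. Escaped separators stay in the value together with their release character, so that they can be resolved later at component level.

```python
def split_fields(segment: str, sep: str = "+", release: str = "?") -> list:
    """Split a segment at unescaped field separators."""
    fields, buf, escaped = [], [], False
    for ch in segment:
        if escaped:
            buf.append(ch)
            escaped = False
        elif ch == release:
            buf.append(ch)  # kept; resolved when the value is finally read
            escaped = True
        elif ch == sep:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields

print(split_fields("GID+2+00000005+00000005"))  # ['GID', '2', '00000005', '00000005']
```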

The components of a field are separated by a metacharacter. This is, by default, the colon (:). Example: UNH+IFTMIN:D:95B:UN:SUTC+1'

The second field of the segment consists of the following five components: IFTMIN, D, 95B, UN and SUTC.
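Components are split at the component separator. This sketch uses a hypothetical function `split_components` with the defaults : and ?. It also resolves the release character, so an escaped separator ends up as a literal character in the component value.

```python
def split_components(field: str, sep: str = ":", release: str = "?") -> list:
    """Split a field into components and resolve release characters."""
    components, buf, escaped = [], [], False
    for ch in field:
        if escaped:
            buf.append(ch)  # the escaped character is taken literally
            escaped = False
        elif ch == release:
            escaped = True  # drop the release character itself
        elif ch == sep:
            components.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    components.append("".join(buf))
    return components

print(split_components("IFTMIN:D:95B:UN:SUTC"))  # ['IFTMIN', 'D', '95B', 'UN', 'SUTC']
```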

The characters for the segment terminator, the field separator and the component separator can be defined in the UNA segment of an EDIFACT file. UNA is a special case: it defines the characters used to separate segments and data, and to mask (release) special characters within a segment. This segment is optional. If it is missing, the default characters apply. If a UNA segment is present, it must always be at the very beginning of the document.
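The UNA segment is always nine characters long: the tag UNA followed by six service characters. In order, these are the component separator, the field (data element) separator, the decimal mark, the release character, a reserved position (a space by default) and the segment terminator. The following sketch reads them, or falls back to the defaults if there is no UNA. `ServiceChars` and `read_service_chars` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceChars:
    component_sep: str = ":"
    element_sep: str = "+"
    decimal_mark: str = "."
    release: str = "?"
    reserved: str = " "
    terminator: str = "'"

def read_service_chars(data: str) -> ServiceChars:
    """Read the service characters from a leading UNA segment, else return the defaults."""
    if not data.startswith("UNA"):
        return ServiceChars()  # no UNA at the very beginning: defaults apply
    if len(data) < 9:
        raise ValueError("UNA segment is incomplete (9 characters expected)")
    # Positions 4 to 9 of UNA, in the order of the dataclass fields above.
    return ServiceChars(*data[3:9])

print(read_service_chars("UNA|*,# ~UNB*UNOC|3~"))
```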

EDIFACT uses the following compression methods to keep files small.

Blank fields are indicated by an additional field separator. Example: GID+++00000005'

The segment consists of 4 fields. Fields 2 and 3 are empty and are represented only by their separators. The same mechanism applies to components.

Blank fields at the end of a segment are omitted: the segment terminator follows directly after the last non-blank value. Example: GID+2'

The segment actually consists of 4 fields, but fields 3 and 4 are empty. The segment terminator after the second field indicates that all remaining fields of the segment are empty.
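Seen from the writing side, both compression rules come down to two steps: join the values with the separator, and drop trailing empty values. Below is a minimal sketch using a hypothetical function `compress`. For brevity, it ignores release characters.

```python
def compress(values, sep: str = "+") -> str:
    """Join values with sep, dropping trailing empty values."""
    values = list(values)
    while values and values[-1] == "":
        values.pop()  # trailing blanks are implied by the early segment end
    return sep.join(values)  # inner blanks remain as consecutive separators

print(compress(["GID", "", "", "00000005"]))  # GID+++00000005
print(compress(["GID", "2", "", ""]))         # GID+2
print(compress(["200", "", ""], sep=":"))     # 200
```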

Comments on the EDIFACT Parser

The job of the Lobster_data EDIFACT parser is to read the content of the EDIFACT file into the source structure. The parser treats segments much like rows in CSV or fixed-length files: it breaks the file into segments, treating everything between a segment identifier and the segment terminator as one row. The EDIFACT parser can handle line breaks both at the end of a segment and within a segment.

Every segment is assigned to a node on the highest level of the source structure. Segments are assigned to nodes by their names, using match codes. Unlike with the Fixed-length or CSV parser, segments are not rigidly assigned to the first matching node. Instead, the parser remembers the node that was accessed last and never assigns a segment to a node above it! The reason for this is the EDIFACT syntax: segments cannot appear in an arbitrary order, because their order is fixed.
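The forward-only assignment can be sketched as follows. The node list, the match codes and the function name `assign_segments` are hypothetical. Maximum occurrences and nested nodes are ignored. The example sequence BGM, DTM, DTM, FTX, DTM mirrors the screenshot example that follows.

```python
def assign_segments(segment_tags, nodes):
    """Assign each segment tag to a top-level node, never moving above the last node used.

    nodes: ordered list of (node_name, set_of_match_codes).
    Returns a list of (tag, node_name) pairs.
    """
    result, last = [], 0
    for tag in segment_tags:
        # Only the last accessed node and the nodes below it are candidates.
        for idx in range(last, len(nodes)):
            name, codes = nodes[idx]
            if tag in codes:
                result.append((tag, name))
                last = idx
                break
        else:
            raise ValueError(f"no node at or after {nodes[last][0]} matches {tag}")
    return result

nodes = [("BGM", {"BGM"}), ("DTM", {"DTM"}), ("FTX", {"FTX"}), ("DTM-2", {"DTM"})]
for tag, node in assign_segments(["BGM", "DTM", "DTM", "FTX", "DTM"], nodes):
    print(tag, "->", node)
```

The third DTM goes to DTM-2 rather than back to the first DTM node, because FTX has already been accessed.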

The following screenshot shows the assignment of segments to nodes in the source structure.

images/download/attachments/164332922/EDIFACT_1-version-1-modificationdate-1704702826721-api-v2.png

(1) The BGM segment is assigned to the node using the match codes.

(2) The DTM segments are assigned to the node using the match codes. The node occurs twice, once for each of the two segments.

(3) The DTM segment is assigned to the node using the match codes. The segment will not be assigned to (2), since the parser has already accessed the FTX node.

The following screenshot shows the assignment of segments to fields of the source structure.

images/download/attachments/164332922/EDIFACT_2-version-1-modificationdate-1704702826719-api-v2.png

(1) The UNH segment is assigned to the node using the match codes.

(2) The segment field in UNH is assigned to the source structure field by position. The value of this source structure field in the example is UNH.

(3) The segment field in UNH is assigned to the source structure field by position. The value of this source structure field in the example is 67061107514198. If there are more source structure fields than segment fields, the last source structure fields remain empty.
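Positional assignment with empty trailing fields can be sketched like this. The structure field names are hypothetical. Raising an error when a segment has more fields than the node is an assumption made for the sketch, not documented Lobster_data behaviour.

```python
def map_fields(segment_fields, structure_fields):
    """Assign segment field values to source structure fields by position."""
    if len(segment_fields) > len(structure_fields):
        # Assumption for this sketch: surplus segment fields are reported as an error.
        raise ValueError("segment has more fields than the node defines")
    missing = len(structure_fields) - len(segment_fields)
    values = list(segment_fields) + [""] * missing  # the last structure fields stay empty
    return dict(zip(structure_fields, values))

# Hypothetical structure field names for the UNH node.
unh_fields = ["segment_id", "message_ref", "message_type", "common_access_ref"]
print(map_fields(["UNH", "67061107514198", "IFTMIN:D:95B:UN"], unh_fields))
```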

The following screenshot shows the mapping of components to fields of the source structure.

images/download/attachments/164332922/EDIFACT_3-version-1-modificationdate-1704702826717-api-v2.png

(1) The DTM segment is assigned to the node using the match codes.

(2) The segment field in DTM is assigned to the source structure field by position. The value of this source structure field in the example is DTM.

(3) This node in the source structure is required because the second field of the DTM segment consists of several components. No assignment via match codes takes place here: the EDIFACT parser always treats a node without match codes as a field with components!

(4) The first component of the segment field is assigned to the source structure field by position. The value in the example is 200.

(5) The components of the segment field are assigned to the source structure fields by position. If there are more source structure fields than components, the last source structure fields remain empty.
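Below is a sketch of the rule that a child node without match codes is treated as a field with components. The node description format and the names (`map_segment`, `C507`, `qualifier`, …) are assumptions, as is the default component separator :. Release characters are not handled, and surplus components are ignored.

```python
def map_segment(fields, children, comp_sep=":"):
    """Map segment fields to a node's children by position.

    children: each entry is either a field name (str) or a tuple
    (subnode_name, [component field names]) for a child node without
    match codes, which is treated as a field with components.
    """
    result = {}
    for i, child in enumerate(children):
        value = fields[i] if i < len(fields) else ""
        if isinstance(child, tuple):
            name, comp_names = child
            comps = value.split(comp_sep) if value else []
            comps += [""] * (len(comp_names) - len(comps))  # missing components stay empty
            result[name] = dict(zip(comp_names, comps))
        else:
            result[child] = value
    return result

# Hypothetical DTM node: segment identifier field plus a composite subnode.
dtm_node = ["segment_id", ("C507", ["qualifier", "value", "format"])]
print(map_segment(["DTM", "200:20060414:102"], dtm_node))
```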

EDIFACT structures may contain subsegments, i.e. segments located below other segments. In Lobster_data, these subsegments are represented as nodes, just like normal segments. The match codes of the higher-level segment's node must contain the names of all its subsegments. Otherwise, a subsegment cannot be assigned to that segment.

The following screenshot shows the assignment of subsegments to nodes and fields of the source structure.

images/download/attachments/164332922/EDIFACT_4-version-1-modificationdate-1734665215386-api-v2.png

(1) The SG1 segment has the subsegments LOC and DTM. For these subsegments to be assigned correctly, the match code in SG1 has to contain the strings LOC and DTM.

(2) The subsegment LOC is assigned using match codes, just like a normal segment. As soon as a LOC subsegment appears for the first time, an SG1 segment is created. Further LOC subsegments are assigned to it until a DTM subsegment appears. All these subsegments belong to the same SG1 segment until a LOC subsegment appears again after a DTM subsegment. This creates another SG1 segment.

(3) All DTM subsegments are assigned to the same segment until a LOC subsegment appears after a DTM subsegment. This will create another SG1 segment.

(4) Once another segment has appeared (FTX, DTM-2, ...), a LOC segment no longer creates an SG1 segment.
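The LOC/DTM behaviour described in (2) to (4) can be sketched as a small state machine. The function `group_subsegments` and its return format are hypothetical. It has these limitations:

- It models only one segment group, in which the first member opens a new instance.
- It receives the tags starting at the position where the group may begin.

```python
def group_subsegments(tags, members):
    """Collect subsegments into instances of one segment group.

    members: ordered subsegment tags of the group, e.g. ["LOC", "DTM"];
    the first member opens a group instance.
    Returns (instances, outside): the tag lists per group instance, and the
    tags that were not assigned to the group.
    """
    instances, outside = [], []
    current, last_pos, closed = None, -1, False
    for tag in tags:
        pos = members.index(tag) if tag in members else -1
        if closed or pos < 0 or (current is None and pos != 0):
            if current is not None:
                closed = True  # another segment appeared: no new group instances
            outside.append(tag)
            continue
        if current is None or (pos == 0 and last_pos > 0):
            current = []  # opening subsegment after a later member: new instance
            instances.append(current)
        current.append(tag)
        last_pos = pos
    return instances, outside

tags = ["LOC", "LOC", "DTM", "DTM", "LOC", "DTM", "FTX", "DTM", "LOC"]
instances, rest = group_subsegments(tags, ["LOC", "DTM"])
print(instances)  # [['LOC', 'LOC', 'DTM', 'DTM'], ['LOC', 'DTM']]
print(rest)       # ['FTX', 'DTM', 'LOC']
```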