Comments on the EDIFACT Parser
The EDIFACT parser of Lobster_data reads the content of an EDIFACT file into the source structure. It treats segments much like rows in CSV or fixed-length files: the parser splits the file into segments by treating everything between a segment identifier and the segment terminator as one row. It can handle line breaks at the end of a segment as well as within a segment.
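The splitting step can be illustrated with a minimal sketch. This is not Lobster_data's implementation. It assumes the default EDIFACT service characters (segment terminator `'`, release character `?`), and the function name `split_segments` and the sample message are hypothetical.

```python
def split_segments(raw: str, terminator: str = "'", release: str = "?") -> list[str]:
    """Split raw EDIFACT text into segment strings (illustrative sketch only)."""
    segments: list[str] = []
    current: list[str] = []
    escaped = False
    for ch in raw:
        if escaped:
            # The character after the release character is literal data.
            current.append(ch)
            escaped = False
        elif ch == release:
            current.append(ch)
            escaped = True
        elif ch == terminator:
            # End of a segment: store it if it is not empty.
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
        elif ch in "\r\n":
            # Line breaks after or inside a segment are ignored.
            continue
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        segments.append(tail)
    return segments


# Made-up sample: one line break after each segment, one inside the DTM segment.
sample = "UNH+1+ORDERS:D:96A:UN'\nBGM+220+4711'\nDTM+137:2024\n0101:102'"
print(split_segments(sample))
```

Each returned string corresponds to one "row" in the sense described above.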
Every segment is assigned to a node on the highest level of the source structure. Segment names are assigned to nodes by match codes. Unlike the fixed-length or CSV parsers, the EDIFACT parser does not always start matching at the first node. It remembers the node that was accessed last and does not perform any assignments above this node! The reason for this is the EDIFACT syntax: segments may not appear in arbitrary order, their order is fixed.
The following screenshot shows the assignment of segments to nodes in the source structure.
(1) The BGM segment is assigned to the node using the match codes.
(2) The DTM segments are assigned to the node using the match codes. The node occurs twice, since there are two DTM segments.
(3) The DTM segment is assigned to the node using the match codes. The segment will not be assigned to (2), since the parser has already accessed the FTX node.
The following screenshot shows the assignment of segments to fields of the source structure.
(1) The UNH segment is assigned to the node using the match codes.
(2) The segment field in UNH is assigned to the source structure field using the order. The value for the source structure field in the example is UNH.
(3) The segment field in UNH is assigned to the source structure field using the order. The value for the source structure field in the example is 67061107514198. If there are more source structure fields than segment fields, the last source structure fields will remain empty.
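Assignment by order can be sketched as below. The field names and everything in the sample segment after the value 67061107514198 are made up for illustration. The sketch assumes `+` as the field separator and no release characters in the data. What happens to surplus segment fields is not described above, so the sketch simply ignores them.

```python
def map_fields(segment: str, field_names: list[str]) -> dict[str, str]:
    """Assign segment fields to source structure fields strictly by position."""
    values = segment.split("+")  # assumption: '+' separates fields, no '?' escapes
    # Pad with empty strings so trailing structure fields stay empty.
    values += [""] * (len(field_names) - len(values))
    # zip() stops at the shorter list, so surplus segment fields are dropped here.
    return dict(zip(field_names, values))


# Hypothetical field names; the last one has no counterpart in the segment.
fields = map_fields(
    "UNH+67061107514198+ORDERS:D:96A:UN",
    ["segment_name", "reference", "message_type", "unused"],
)
print(fields)
```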
The following screenshot shows the mapping of components to fields of the source structure.
(1) The DTM segment is assigned to the node using the match codes.
(2) The segment field in DTM is assigned to the source structure field using the order. The value for the source structure field in the example is DTM.
(3) The node in the source structure is required, since the second field in the DTM segment consists of several components. This node is not assigned using match codes. For nodes without match codes, the EDIFACT parser always assumes a field with components!
(4) The component of the segment field is assigned to the source structure field using the order. The value in the example is 200.
(5) The components of the segment field are assigned to the source structure field using the order. If there are more source structure fields than components, the last source structure fields will remain empty.
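The component mapping can be sketched in the same way. The sketch assumes `:` as the component separator and treats the second DTM field as the composite field that the node without match codes stands for. The component names, the sample value after 200, and the function name `map_dtm` are hypothetical.

```python
def map_dtm(segment: str, component_names: list[str]) -> dict:
    """Map a DTM segment: first field by order, second field split into components."""
    fields = segment.split("+")  # assumption: '+' separates fields
    tag = fields[0]
    composite = fields[1] if len(fields) > 1 else ""
    components = composite.split(":")  # assumption: ':' separates components
    # Structure fields without a matching component remain empty.
    components += [""] * (len(component_names) - len(components))
    return {"segment_name": tag, "composite": dict(zip(component_names, components))}


# Two components in the data, three fields below the node.
dtm = map_dtm("DTM+200:20240101", ["qualifier", "value", "format"])
print(dtm)
```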
EDIFACT structures may contain subsegments. These are segments that are located below other segments. In Lobster_data, subsegments are represented as nodes, just like normal segments. The match codes of the higher-level segment's node have to contain the names of all its subsegments. Otherwise, a subsegment cannot be assigned to the segment.
The following screenshot shows the assignment of subsegments to nodes and fields of the source structure.
(1) The SG1 segment has the subsegments LOC and DTM. For these subsegments to be assigned correctly, the match code in SG1 has to contain the strings LOC and DTM.
(2) The LOC subsegment is assigned using match codes just like a normal segment. As soon as a LOC subsegment appears for the first time, segment SG1 is created. The LOC subsegments are repeated until a DTM subsegment appears. All these subsegments are assigned to the same SG1 segment until a LOC subsegment appears again after a DTM subsegment. This will create another SG1 segment.
(3) All DTM subsegments are assigned to the same segment until a LOC subsegment appears after a DTM subsegment. This will create another SG1 segment.
(4) After another segment has appeared (FTX, DTM-2, ...), a LOC subsegment will not create an SG1 segment anymore.
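The grouping behaviour in (1) to (4) can be modelled with a short sketch. It is an interpretation of the description above, not the parser's real logic. It generalizes "LOC after DTM starts a new SG1" to "a subsegment that sits above the last accessed subsegment starts a new SG1", which is an assumption. The function name `group_subsegments` and the sample tag sequence are hypothetical.

```python
def group_subsegments(tags: list[str], children: tuple[str, ...] = ("LOC", "DTM")) -> list[list[str]]:
    """Collect subsegments into SG1 instances (illustrative model only)."""
    groups: list[list[str]] = []  # one list of subsegment tags per SG1 instance
    last_child = -1               # position of the subsegment node accessed last
    closed = False                # True once a later segment has followed SG1
    for tag in tags:
        if closed:
            continue  # (4): SG1 is never created again
        if tag not in children:
            if groups:
                closed = True  # a segment outside SG1 appeared after it
            continue
        position = children.index(tag)
        if not groups:
            if position != 0:
                continue  # SG1 not created yet; only the first subsegment creates it
            groups.append([tag])
        elif position < last_child:
            groups.append([tag])  # e.g. LOC after DTM: another SG1
        else:
            groups[-1].append(tag)  # repeat or continue within the same SG1
        last_child = position
    return groups


sg1 = group_subsegments(["BGM", "LOC", "LOC", "DTM", "DTM", "LOC", "DTM", "FTX", "LOC"])
print(sg1)
```

In this model the final LOC after FTX is ignored by the grouping, in line with callout (4).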