XML Parser V4

The XML V4 parser (add-on module) offers significant performance gains with extremely low memory consumption. Compared to the XML V3 parser, the memory requirement drops to about 10%. For input files up to about 100 MB, the runtime drops to about 5% of that of version 3. In addition, very large XML input files of up to 250 GB can be parsed. For such files, however, the runtime increases significantly again because of the necessary disk accesses. For more details, see section Large XML.

Note: If a preparser is used with the XML V4 parser, the backup file must be overwritten with the result of the preparser.


Settings of the XML Parser V4


First, switch to V4 in (3). Checkbox (1) must be set, since V4 always works without namespaces.

[Figure: settings of the XML parser V4, fields numbered (1) to (8)]



(1) As already mentioned, this checkbox is mandatory for V4.

(2) The tag name (element) below which the document is parsed, i.e. the XML tag for record. An entry is required even if the entire XML structure is to be parsed. If only part of the document is parsed, that part must also be well-formed XML. An XPath 1.0 expression can also be entered here, e.g. /inventory/books[@title="xxxx"]. See also section Effect of an Entry in Field 'XML tag for record'.

(3) Version 4 of the XML parser.

(4) We do not want to create a separate record for each item. Instead, we want two item elements in each record, see (5). That is why we use the new root element articles. See section Generating Chunks below.

(5) The number of item elements per record, i.e. per articles element, see (4). See section Generating Chunks below.

(6) The root element invoice has two attributes, date and ref. We need them in the mapping, so we activate this checkbox. All attributes of the root element are then copied into articles.

(7) The specified elements, including their child elements, are copied into each articles element. All XPath 1.0 expressions are allowed. See section Copy Attributes and Elements Redundantly into Each Record below.

(8) Deactivate this checkbox if the XML file is smaller than 2 GB. For more details, see section Large XML.
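To illustrate which elements an XPath expression in (2) selects, the following sketch evaluates the example expression with Python's standard xml.etree.ElementTree module. It is only a model of the selection, not how the V4 parser works internally. ElementTree supports just a subset of XPath 1.0, and its paths are relative to the root element, so /inventory/books[@title="xxxx"] is written as ./books[@title='xxxx']. The inventory document and the helper select_records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the XPath example in (2).
INVENTORY = """<inventory>
  <books title="xxxx"><isbn>111</isbn></books>
  <books title="yyyy"><isbn>222</isbn></books>
  <books title="xxxx"><isbn>333</isbn></books>
</inventory>"""


def select_records(xml_text: str, path: str) -> list[ET.Element]:
    """Return the elements matched by `path`, relative to the root element.

    ElementTree implements only a subset of XPath 1.0; attribute
    predicates such as [@title='xxxx'] are part of that subset.
    """
    root = ET.fromstring(xml_text)
    return root.findall(path)


# /inventory/books[@title="xxxx"] expressed relative to <inventory>:
matches = select_records(INVENTORY, "./books[@title='xxxx']")
print([book.findtext("isbn") for book in matches])  # ['111', '333']
```

Under this model, each matched books element (here the first and the third) would correspond to one record.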

Example XML


We will use the following XML file.


<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice date="07.03.13" ref="R-0001">
  <header>
    <customer>Lobster</customer>
    <address>
      <name>Lobster GmbH</name>
      <street>MünchnerStr. 15a</street>
      <zip>82319</zip>
      <city>Starnberg</city>
    </address>
  </header>
  <positions>
    <item type="1" desc="billing">
      <pos>1</pos>
      <article id="A-001" name="Article 1" price="1050" amount="1" />
      <note>Attention- Glas!</note>
    </item>
    <item type="0" desc="return">
      <pos>2</pos>
      <article id="A-002" name="Article 2" price="920" amount="2" />
    </item>
    <item type="1" desc="billing">
      <pos>3</pos>
      <article id="A-003" name="Article 3" price="90" amount="3" />
      <note>See counter</note>
    </item>
  </positions>
  <footer>
    <note code="001">Complete</note>
  </footer>
</invoice>


If repeated subelements (here item) are to produce multiple records, do not specify the real root element (invoice) of the XML document in (2) (see section Effect of an Entry in Field 'XML tag for record'). Parsers prior to V4 lost all attributes of the root element as well as all parent and sibling elements outside the element item. With the XML parser V4, data from these 'blind' areas of the XML document can be copied into each record, as described in section Copy Attributes and Elements Redundantly into Each Record below.
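The following minimal sketch, using Python's xml.etree.ElementTree and a shortened version of the example invoice, models what ends up in a record when only item is entered in (2) and neither chunking nor copying is configured: each record contains just the item subtree, while the root attributes date and ref, the header and the footer are missing. The helper records_per_tag is hypothetical and models only the result, not the parser itself.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example invoice (article and address details omitted).
INVOICE = """<invoice date="07.03.13" ref="R-0001">
  <header><customer>Lobster</customer></header>
  <positions>
    <item type="1" desc="billing"><pos>1</pos></item>
    <item type="0" desc="return"><pos>2</pos></item>
    <item type="1" desc="billing"><pos>3</pos></item>
  </positions>
  <footer><note code="001">Complete</note></footer>
</invoice>"""


def records_per_tag(xml_text: str, tag: str) -> list[str]:
    """Serialize every element named `tag` as a stand-alone record.

    Only the element and its descendants are kept; everything outside it
    (root attributes, header, footer) is not part of the record.
    """
    root = ET.fromstring(xml_text)
    # tostring() also writes the element's tail whitespace, so strip it.
    return [ET.tostring(el, encoding="unicode").strip() for el in root.iter(tag)]


records = records_per_tag(INVOICE, "item")
print(len(records))  # 3
print(records[0])    # <item type="1" desc="billing"><pos>1</pos></item>
```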

Generating Chunks


If the structure of the input data leads to a large number of small records, for example because it contains several million item elements, performance suffers. In this case, it makes sense to combine several item elements into one record. However, the input data has no natural element that groups them in this way. You can therefore force the parser to create a virtual element (4) (here articles). This 'chunk' element then appears as a root element that contains several item elements.

The generation of chunks with (4) and (5) is optional. If (4) remains empty, one record is generated per element matched by (2) (in the example, per item). In that case, the root node of the source structure should correspond to the item element.

If a virtual chunk element is entered in (4), the source structure must contain an additional root node corresponding to the chunk element (here articles). This node gets the value of (4) as its match code, so that the virtual chunk element can be parsed.
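The grouping done by (4) and (5) can be pictured with the following Python sketch, which collects consecutive item elements under a new articles element, two at a time. The function make_chunks and its parameters are hypothetical stand-ins for fields (2), (4) and (5); the sketch models only the resulting records, not the parser's implementation. With three items and a chunk size of two, it yields two records, the second of which contains only one item.

```python
import xml.etree.ElementTree as ET

# Positions part of the example invoice, reduced to the <pos> values.
POSITIONS = """<positions>
  <item><pos>1</pos></item>
  <item><pos>2</pos></item>
  <item><pos>3</pos></item>
</positions>"""


def make_chunks(xml_text: str, record_tag: str, chunk_tag: str,
                per_chunk: int) -> list[ET.Element]:
    """Group consecutive `record_tag` elements under new `chunk_tag` elements.

    record_tag, chunk_tag and per_chunk play the roles of fields (2), (4)
    and (5). The last chunk holds the remainder and may be smaller.
    """
    items = list(ET.fromstring(xml_text).iter(record_tag))
    chunks = []
    for start in range(0, len(items), per_chunk):
        chunk = ET.Element(chunk_tag)  # the virtual root element, e.g. articles
        chunk.extend(items[start:start + per_chunk])
        chunks.append(chunk)
    return chunks


chunks = make_chunks(POSITIONS, "item", "articles", 2)
print([[item.findtext("pos") for item in chunk] for chunk in chunks])
# [['1', '2'], ['3']]
```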

Copy Attributes and Elements Redundantly into Each Record


Suppose that we use element item as the XML tag for record for our source file above. The attributes of the element invoice and all data in the elements header and footer are then outside the parsed area. As of V4, the attributes of the real root element can be transferred to every generated record, see checkbox (6). Similarly, in (7), you can enter the elements that lie outside item but that you want to include in each record. The source structure of the profile currently has to be adjusted manually for this.
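The effect of checkbox (6) and field (7) can be modeled in Python by extending the chunking idea: every chunk receives the root attributes and deep copies of the selected elements, kept in document order. The source does not state what was entered in (7) for the example; the paths ./header/address and ./footer used below are assumptions chosen so that the result has the same structure as the internal records of the next section. The function build_records is hypothetical and relies on the limited XPath subset of xml.etree.ElementTree.

```python
import copy
import xml.etree.ElementTree as ET

INVOICE = """<invoice date="07.03.13" ref="R-0001">
  <header>
    <customer>Lobster</customer>
    <address>
      <name>Lobster GmbH</name>
      <street>MünchnerStr. 15a</street>
      <zip>82319</zip>
      <city>Starnberg</city>
    </address>
  </header>
  <positions>
    <item type="1" desc="billing">
      <pos>1</pos>
      <article id="A-001" name="Article 1" price="1050" amount="1" />
      <note>Attention- Glas!</note>
    </item>
    <item type="0" desc="return">
      <pos>2</pos>
      <article id="A-002" name="Article 2" price="920" amount="2" />
    </item>
    <item type="1" desc="billing">
      <pos>3</pos>
      <article id="A-003" name="Article 3" price="90" amount="3" />
      <note>See counter</note>
    </item>
  </positions>
  <footer>
    <note code="001">Complete</note>
  </footer>
</invoice>"""


def build_records(xml_text: str, record_tag: str, chunk_tag: str, per_chunk: int,
                  copy_root_attrs: bool, copy_paths: list[str]) -> list[ET.Element]:
    """Chunk `record_tag` elements and add the redundant data to each chunk.

    copy_root_attrs models checkbox (6), copy_paths models field (7).
    """
    root = ET.fromstring(xml_text)
    # Position of every element in document order, used for sorting below.
    doc_order = {id(el): i for i, el in enumerate(root.iter())}
    items = list(root.iter(record_tag))
    extras = [el for path in copy_paths for el in root.findall(path)]

    records = []
    for start in range(0, len(items), per_chunk):
        attrs = dict(root.attrib) if copy_root_attrs else {}
        chunk = ET.Element(chunk_tag, attrs)
        members = extras + items[start:start + per_chunk]
        members.sort(key=lambda el: doc_order[id(el)])
        for el in members:
            chunk.append(copy.deepcopy(el))  # independent copy per record
        records.append(chunk)
    return records


records = build_records(INVOICE, "item", "articles", 2,
                        copy_root_attrs=True,
                        copy_paths=["./header/address", "./footer"])
for rec in records:
    ET.indent(rec)  # Python 3.9+, pretty-prints in place
    print(ET.tostring(rec, encoding="unicode"))
```

Sorting by document order places address before and footer after the item elements, which matches the internal records; whether the parser orders copied elements this way in every case is an assumption of this sketch.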

Internal XML Based on the Prior Settings


Internally, based on the settings above, the input XML is split into the following records.


Record 1


<?xml version="1.0" encoding="ISO-8859-1"?>
<articles date="07.03.13" ref="R-0001">
  <address>
    <name>Lobster GmbH</name>
    <street>MünchnerStr. 15a</street>
    <zip>82319</zip>
    <city>Starnberg</city>
  </address>
  <item type="1" desc="billing">
    <pos>1</pos>
    <article id="A-001" name="Article 1" price="1050" amount="1" />
    <note>Attention- Glas!</note>
  </item>
  <item type="0" desc="return">
    <pos>2</pos>
    <article id="A-002" name="Article 2" price="920" amount="2" />
  </item>
  <footer>
    <note code="001">Complete</note>
  </footer>
</articles>


Record 2


<?xml version="1.0" encoding="ISO-8859-1"?>
<articles date="07.03.13" ref="R-0001">
  <address>
    <name>Lobster GmbH</name>
    <street>MünchnerStr. 15a</street>
    <zip>82319</zip>
    <city>Starnberg</city>
  </address>
  <item type="1" desc="billing">
    <pos>3</pos>
    <article id="A-003" name="Article 3" price="90" amount="3" />
    <note>See counter</note>
  </item>
  <footer>
    <note code="001">Complete</note>
  </footer>
</articles>


Here are the matching source and destination structures. Setter identifiers can be defined as usual, but they are already created when the source structure is generated automatically. The destination structure was created with a 1:1 mapping.


[Figure: source and destination structures for the example]

Further Performance Improvement


See section Optional XPath Filter.