Phase 2: Parsing data (performance)
Did you read the section on memory? The part about records? Great. Now let's look at the other side of the coin.
Document types "CSV", "Database", "Excel" and "Fixed-length"
Methods that save memory are not necessarily the fastest. By setting up your structure to create many small, individual records, you protect yourself against the critical 'out of memory' error. This is always the better option if a profile has to process very large volumes of data (or might have to), or if several memory-intensive profiles could run at the same time.
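To make the tradeoff more tangible, here is a minimal, purely illustrative Java sketch. It does not use the product's actual API; the file name, delimiter and helper method are made up. The first variant treats every CSV line as its own small record and forgets it immediately, the second materialises everything in memory before processing starts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RecordSplittingSketch {

    // Memory-friendly variant: each CSV line is treated as its own record
    // and handled immediately, so only one line is held in memory at a time.
    static void processLineByLine(Path csv) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handleRecord(line.split(";"));   // process and forget
            }
        }
    }

    // Faster per record, but riskier: the whole file is kept
    // as one big in-memory structure before processing starts.
    static void processAllAtOnce(Path csv) throws IOException {
        List<String[]> all = new ArrayList<>();
        for (String line : Files.readAllLines(csv)) {
            all.add(line.split(";"));            // everything stays in memory
        }
        all.forEach(RecordSplittingSketch::handleRecord);
    }

    static void handleRecord(String[] fields) {
        // placeholder for the real mapping/processing logic
    }

    public static void main(String[] args) throws IOException {
        processLineByLine(Path.of("data.csv")); // hypothetical input file
    }
}
```

The streaming variant scales to arbitrarily large files; the all-at-once variant only wins while the data still comfortably fits into memory.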
But what if you only want to read a few thousand datasets from the database, you have more than enough memory available, and you can control when the profile runs so that it does not have to compete with other profiles for memory? Then speed is probably more important to you. In that case, we advise you to use the flattest structure possible, because every hierarchical level in the structure, every additional node, costs a little time. Not much, but with many datasets it adds up. In the ideal case, you have no nodes at all, just fields.
This is as fast as it gets. And by using a few more tricks during mapping, you can really turbocharge the process. More on this in the section on phase 3. But think about your memory! Before you risk exhausting it, it is better to sacrifice a little bit of performance.
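The cost of extra hierarchy levels can be pictured with the following Java sketch. It is only an analogy, not how the structures are implemented internally: the nested variant allocates one object per node for every single dataset, while the flat variant consists of nothing but its fields.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatVsNestedSketch {

    // Nested representation: every hierarchy level is an extra object that
    // has to be created (and garbage-collected) for every single dataset.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final List<String> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static Node buildNested(String[] values) {
        Node root = new Node("Record");
        Node header = new Node("Header");
        Node positions = new Node("Positions");
        root.children.add(header);
        root.children.add(positions);
        header.fields.add(values[0]);
        positions.fields.add(values[1]);
        return root;                       // 3 node objects per dataset
    }

    // Flat representation: no intermediate nodes at all, just the fields.
    static String[] buildFlat(String[] values) {
        return values;                     // 0 extra objects per dataset
    }

    public static void main(String[] args) {
        String[] dataset = {"4711", "Item A"};
        buildNested(dataset);
        buildFlat(dataset);
    }
}
```

Per dataset the difference is tiny, but multiplied by hundreds of thousands of datasets, the extra allocations become measurable.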
Document type "XML"
Conveniently, XML parser version V3 is not only more memory-friendly than version V2, it is also faster. So if you expect large volumes of data, simply use version V3, and if that is still not enough, version V4. This new XML parser was already mentioned in the memory section and also has a dedicated section of its own.
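Why a newer parser can be both leaner and faster is easiest to see with a generic comparison of DOM-style and streaming XML parsing. The Java sketch below only illustrates that general principle; it is not the internal implementation of versions V2, V3 or V4, and the input file name is made up.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;

public class XmlParserSketch {

    // DOM-style parsing: the complete tree is built in memory first,
    // which is convenient but expensive for large documents.
    static Document parseWithDom(Path xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml.toFile());
    }

    // Streaming (StAX) parsing: events are consumed one at a time, so the
    // memory footprint stays small and large files can be handled quickly.
    static long countElements(Path xml) throws Exception {
        long count = 0;
        try (InputStream in = Files.newInputStream(xml)) {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(Path.of("large.xml"))); // hypothetical file
    }
}
```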
Other formats
There is very little that can be tweaked here when it comes to parsing, but given the comparatively small volume of data (per job), that is not much of a problem. Deactivating unused nodes for the document types EDIFACT and X12 has already been mentioned in the memory section.