Phase 3: Mapping (performance)
No more than necessary
A similar rule applies to performance as to memory: the less data you actually load (i.e. store in the internal structures for the mapping process), the faster things run. Check your requirements to see whether you really need to load and map all of the data, or whether you can simply ignore parts of it. More detailed information on this can be found in the memory chapter.
Simple data mapper
To get straight to the point, the simple data mapper is only practical in a few situations. The main reason is that it does not allow paths or hierarchical node structures, which are necessary in most data structures of any complexity. Calculated fields cannot be used here either. However, if you do not need any of these, the simple data mapper gives you a tremendous increase in performance. If you are not sure, just try it out: if the result looks good, keep the simple mapper; if you get nowhere, switch back to the advanced mapper.
Let’s take another look at our customer data, which we now want to transfer from a database to a simple Excel spreadsheet.
4711,Maier,Harald,Hamburg
4712,Müller,Hugo,Frankfurt Main
4713,Huber,Toni,Munich
Here’s how the structures look.
Source structure:
Target structure:
As mentioned earlier, a record is created for each row in the database. If you are sure that the data volume and the time at which the profile runs will not cause memory problems, you can prevent this by using the option Force single record (only available for some document types; see the documentation of the respective document type). This can improve the speed slightly, but do not use it unless you are certain it is safe. Note that it also means you cannot use functions like isFirstRecord, because there is only one record. A variable construct can be used instead, although the situation is more difficult with isLastRecord.
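The variable construct mentioned above can be illustrated outside the tool. The following Python sketch (purely illustrative, not the tool's own mapping syntax; all names are hypothetical) shows how a simple flag variable stands in for an isFirstRecord-style check when all rows arrive within a single record:

```python
# Illustrative sketch: replacing an isFirstRecord-style check with a
# flag variable when every row is processed inside one single record.
rows = [
    ("4711", "Maier", "Harald", "Hamburg"),
    ("4712", "Müller", "Hugo", "Frankfurt Main"),
    ("4713", "Huber", "Toni", "Munich"),
]

first_seen = False      # the "variable construct"
header_written = 0

for row in rows:
    if not first_seen:  # behaves like isFirstRecord
        first_seen = True
        header_written += 1  # e.g. write a header line exactly once
    # normal per-row mapping would happen here

print(header_written)  # → 1
```

Detecting the last row the same way is harder, because you only know a row was the last one after the input has ended, which is why the text calls the isLastRecord case more difficult.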
In this example, you really do not need paths, hierarchies, etc. You simply want to convert each row from the database into a row in the Excel file; there is almost nothing easier. And Excel can handle over 60,000 rows even in the old file format (the newer .xlsx format does not have this limit). Transferring an entire customer list can easily involve several thousand or tens of thousands of rows. If you use the simple data mapper here instead of the advanced mapper, you can quickly achieve time savings of more than 50 percent, sometimes up to 70 percent, and it saves some memory as well. Now imagine moving a few million records from one database to another: you could save hours in the mapping process. In that case, of course, Force single record should not be used under any circumstances, unless you also use the TokenStreamSplitter already described in the memory chapter.
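To make the 1:1 character of this mapping concrete, here is a small Python sketch (illustrative only, outside the tool) of the equivalent job: each source row becomes exactly one output row, with no paths, hierarchies, or calculated fields involved:

```python
import csv
import io

# Hypothetical stand-in for the database result set from the example.
db_rows = [
    ("4711", "Maier", "Harald", "Hamburg"),
    ("4712", "Müller", "Hugo", "Frankfurt Main"),
    ("4713", "Huber", "Toni", "Munich"),
]

# The entire "mapping": one output row per source row, fields copied 1:1.
out = io.StringIO()
writer = csv.writer(out)
writer.writerows(db_rows)

print(out.getvalue().strip())
```

Because nothing is restructured, this kind of flat copy is exactly the case where the simple data mapper pays off.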
SQL nodes
Nodes in the target structure have the attribute Only for SQL, which serves the following purpose.
There are cases where you want data to be written to a database, but at the same time you also want a file to be created. And this file should not contain the same data that was written to the database. For example, you want it to show a few totals (the number of records added, etc.), but not thousands of rows of data that should only end up in the database. In this case, set the Only for SQL attribute to Yes in the nodes that are responsible for the database tables, so that the data in these nodes is ignored in all subsequent steps (including the Integration Unit).
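Conceptually, the attribute works like a filter applied after the SQL phase. The following Python sketch (illustrative only; the flag name and record structure are hypothetical, not the tool's internal representation) shows the idea: flagged nodes are still available when writing to the database, but are stripped before the later phases see the record:

```python
# Hypothetical record structure; only_for_sql mimics the attribute.
record = {
    "Totals":        {"only_for_sql": False, "rows": [("rows_added", 2)]},
    "CustomerTable": {"only_for_sql": True,
                      "rows": [("4711", "Maier"), ("4712", "Müller")]},
}

# Phase 4 (SQL): all nodes are still present and can be written to the DB.
written_to_db = [name for name in record]  # stand-in for the actual inserts

# Afterwards, flagged nodes are removed, so subsequent steps
# (file creation, Integration Unit) never see their data.
for_later_phases = {name: node for name, node in record.items()
                    if not node["only_for_sql"]}

print(sorted(written_to_db))     # both nodes reached the SQL phase
print(sorted(for_later_phases))  # only "Totals" remains for the file
```

The emptying-and-swapping of each record after phase 4 is also what causes the overhead described next.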
Handy, but there is a catch: after phase 4 (SQL), every single record is swapped in and, once all of the data has been removed from these nodes, swapped out again. That comes at a cost. You should therefore only set this attribute to Yes if you actually need it, as described above. Otherwise, always leave it set to No.