Phase 3: Mapping
No More than Necessary
A similar rule applies to performance as to memory: the less data you actually load (i.e. store in the internal structures for the mapping process), the faster things run. Check your tasks to see whether you really need to load and map all of the data, or whether you could simply ignore parts of it. More detailed information on this can be found in the memory chapter.
Simple Data Mapper
To get straight to the point, the simple data mapper is only practical in a few situations. The main reason for this is that it does not allow paths or hierarchical node structures to be used, and these are necessary in most data structures of any complexity. Calculated fields cannot be used here either. However, if you do not need any of these, use the simple data mapper to achieve a tremendous increase in performance. If you’re not sure, just try it out. If the result looks good, use the simple mapper. If you get nowhere, switch to the advanced mapper.
Let’s take another look at our customer data, which we now want to transfer from a database to a simple Excel spreadsheet.
4711,Maier,Harald,Hamburg
4712,Müller,Hugo,Frankfurt Main
4713,Huber,Toni,Munich
Here’s how the structures look.
Source structure:
Destination structure:
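If it helps to picture what the simple data mapper does, here is a minimal sketch of the same flat 1:1 mapping in plain Python (not Lobster_data code; the field names are assumptions based on the sample data). Each source field lands directly in one destination field, with no paths, no hierarchy, and no calculated values:

import csv

FIELDS = ["customer_no", "last_name", "first_name", "city"]

# Assumes a customers.csv containing the sample rows shown above.
with open("customers.csv", newline="", encoding="utf-8") as src:
    for row in csv.reader(src):
        record = dict(zip(FIELDS, row))
        print(record)  # in the real profile, this row would go to Excel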
As mentioned earlier, a record will be created for each row in the database. If you are sure that the data volume and the time at which the profile runs will not cause any memory problems, you can prevent this with the option Force single record (only available for some document types). This can improve speed slightly, but do not use it unless you are certain it is safe. As a side note, it also means that you cannot use functions like isFirstRecord, because there is only one record. A variable construct can be used instead, as sketched below, although the situation is more difficult with isLastRecord.
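Purely as an illustration of such a variable construct (plain Python, not Lobster_data functions; the flag name is made up), the idea is a flag that flips after the first row:

rows = [
    ["4711", "Maier", "Harald", "Hamburg"],
    ["4712", "Müller", "Hugo", "Frankfurt Main"],
    ["4713", "Huber", "Toni", "Munich"],
]

first_row_done = False  # plays the role of a profile variable
for row in rows:
    if not first_row_done:
        print("one-off header logic, as isFirstRecord would allow")
        first_row_done = True
    print("processing", row)

The same trick does not transfer cleanly to isLastRecord, because you only know that a row was the last one once the stream has already ended.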
In this example, you really do not need any paths or hierarchies. You simply want to convert each row from the database into a row in the Excel file. There’s almost nothing easier. And Excel can handle 65,536 rows even in the old .xls file format (the newer .xlsx format raises the limit to just over a million rows). Transferring an entire customer list could easily involve several thousand or even tens of thousands of rows. If you use the simple data mapper here instead of the advanced mapper, you can very quickly achieve time savings of more than 50 percent, sometimes up to 70 percent. As a bonus, it also goes some way towards saving memory.
Now imagine that you want to move a few million records from one database to another: you could save hours in the mapping process. In this case, of course, Force single record should not be used under any circumstances! But there is no reason why you cannot use the TokenStreamSplitter described in the memory chapter.
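The TokenStreamSplitter itself is covered in the memory chapter. Purely to illustrate the underlying idea, here is a hedged sketch in plain Python (table and column names are invented, and SQLite stands in for the real databases) that moves records in fixed-size chunks, so memory use stays flat regardless of the total volume:

import sqlite3

src = sqlite3.connect("source.db")  # assumed to already contain a customers table
dst = sqlite3.connect("target.db")
dst.execute("CREATE TABLE IF NOT EXISTS customers (no TEXT, last TEXT, first TEXT, city TEXT)")

cur = src.execute("SELECT no, last, first, city FROM customers")
while True:
    chunk = cur.fetchmany(10_000)  # one manageable slice at a time, never the whole table
    if not chunk:
        break
    dst.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", chunk)
dst.commit()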
Functions
Almost all mapping processes of any complexity use functions, e.g. to manipulate values or to pick the right value out of various input nodes. Lobster_data offers hundreds of these functions, and it is easy to get lost in the maze, or to simply grab the first function that seems appropriate or one that you already know. Many tasks can be handled far more quickly and easily with the right functions. Often, they allow much smaller and simpler tree structures to be created.
There is an entire chapter still to come about functions and how to use them efficiently. We strongly recommend reading this chapter, because the more complicated you make your calculations, the more processing time they will need.
SQL Nodes
Nodes in the destination structure have the attribute Only for SQL. Its purpose is as follows.
There are cases where you want data to be written to a database, but you also want a file to be created at the same time, and this file should not contain the same data that was written to the database. For example, you want it to show a few totals, such as the number of records added, but not thousands of rows of data that belongs only in the database. In this case, we set the Only for SQL attribute to Yes in the nodes that are responsible for the database tables, so that the data in these nodes is ignored in all subsequent steps (including the Integration Unit).
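As a hedged sketch of the effect (plain Python with an in-memory SQLite database standing in for the real one; node and field names are made up), the detail data is written to the database and then dropped, so everything downstream sees only the totals:

import sqlite3

record = {
    "details": [  # the node that would be flagged Only for SQL
        ("4711", "Maier", "Harald", "Hamburg"),
        ("4712", "Müller", "Hugo", "Frankfurt Main"),
    ],
    "summary": {"records_added": 0},
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (no TEXT, last TEXT, first TEXT, city TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", record["details"])
record["summary"]["records_added"] = len(record["details"])

del record["details"]  # SQL-only data is removed before the later phases
print(record)          # the file-generation phase would now see only the summary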
Handy, but it has a catch: after phase 4 (SQL), every single record is swapped in and swapped out again once all of the data from these nodes has been removed. That comes at a cost. You should therefore only set this attribute to Yes if you actually need it, as described above. Otherwise, always leave it set to No.