No More than Necessary
Not all of the data that goes into a profile is actually relevant. You can save a good deal of memory and speed by limiting your trees to what is essential. Sometimes it is even possible to ignore the input data completely.
Routing the File Based on Little Data
Let’s take the following case. You receive various EDIFACT documents, and all you want to do is to transfer them somewhere 'as received', e.g. to file them in directories. However, the files need to end up in different directories depending on the sender.
The sender can normally be determined from the channels through which the data arrives. But what if the files are simply sitting in a directory that you regularly scan, and the filenames do not allow you to identify where they originally came from? The sender, i.e. the creator of the data, is contained in the UNB segment.
UNB+UNOB:2+Sender+Recipient+....
In the example, you are always the recipient, but files can come from partner 1 and partner 2.
UNB+UNOB:2+Partner1+You+....
UNB+UNOB:2+Partner2+You+....
You could, of course, set up a 1:1 mapping with the corresponding EDIFACT templates and check the appropriate field there. This is not difficult with EDIFACT files, because they rarely contain large amounts of data. Nonetheless, there is a more efficient (and therefore faster) way to do it. So let's create a profile that handles the data the same way as the CSV format - specifically, using the delimiter ', which is normally the segment terminator in EDIFACT files.
We will load the first two lines, as there may be a UNA segment first. Suppose the file starts like this.
UNA:+.? 'UNB+UNOA:2+Partner1+You+040820:0006+3689'UNH+....
Then the first 'line' is the UNA segment, the second is the UNB segment, and everything from UNH onwards is ignored. If the data in the EDIFACT file is all on one line or formatted in 80-character blocks, everything is fine. The data always splits at ', and the sender and recipient will still be within the first 80 characters, so a line break cannot get in the way either.
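The splitting logic can be sketched outside Lobster_data in a few lines of plain Python (an illustration only, not product code; the function name is made up for this example):

```python
# Sketch: extract the sender from an EDIFACT interchange header by
# splitting at the segment terminator ' - the same idea as the CSV-style
# profile described above. Ignores the release character ? for simplicity.

def extract_sender(edifact: str) -> str:
    segments = edifact.split("'")
    # The UNB segment may be preceded by an optional UNA segment,
    # so only the first two 'lines' need to be inspected.
    for seg in segments[:2]:
        if seg.startswith("UNB"):
            # Elements are separated by '+'; element 2 holds the sender.
            return seg.split("+")[2]
    raise ValueError("no UNB segment in the first two segments")

print(extract_sender("UNA:+.? 'UNB+UNOA:2+Partner1+You+040820:0006+3689'UNH+..."))
# prints: Partner1
```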
However, if the file has each segment nice and legibly on a separate line, we need to do a little groundwork for our routing profile: the line breaks must be removed. To do this, use the EncodingPatcher as the preparser class, and give the configuration file the following content.
0x0A=
0x0D=
This tells it that the two characters representing line breaks in Windows (0x0D0A) and Linux/Unix (just 0x0A) should be replaced with nothing - i.e. deleted. This puts the document onto one line, and the problem is resolved. The tree should then look as follows, with the node containing only the UNB segment.
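The effect of that configuration can be illustrated in plain Python (a sketch of what the replacement does to the byte stream, not the EncodingPatcher itself):

```python
# Sketch: both line-break bytes (0x0D and 0x0A) are replaced with nothing,
# putting the whole document onto one line.
raw = b"UNA:+.? '\r\nUNB+UNOA:2+Partner1+You'\r\nUNH+..."
patched = raw.replace(b"\x0d", b"").replace(b"\x0a", b"")
print(patched.decode("latin-1"))
# prints: UNA:+.? 'UNB+UNOA:2+Partner1+You'UNH+...
```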
The Delimiter column/row for node OnlyUNB should be left blank so that the entire segment is written in field Content. This mapping produces the following.
Now you can use functions to check whether the UNB segment in the destination field contains the string Partner1 or Partner2, and use variables to apply the appropriate Response Route. Similar methods work for other formats. With XML, you should limit the source structure to the essentials, but still use the XML parser. Take the following data as an example.
<Order>
  <Sender>
    <CustomerReference>123</CustomerReference>
    <Surname>Meier</Surname>
    ...
  </Sender>
  <Recipient>
    <CompanyReference>987</CompanyReference>
    ...
  </Recipient>
  <Items>
    <Item>
      <!-- many many fields -->
    </Item>
    ...
  </Items>
</Order>
If you want to process the entire order, of course, you will need to expand the entire structure. However, if you only want to relay them somewhere based on the sender (identifiable from the customer reference number), keep the structure as small as possible.
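A reduced source structure for this routing case might contain no more than the following (a sketch; the node names follow the example data above):

```xml
<Order>
  <Sender>
    <CustomerReference>123</CustomerReference>
  </Sender>
</Order>
```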
That’s enough. You do not need the rest of the data, so why go to the effort of loading it all? Most importantly, use XML parser version V3 or V4; otherwise, everything will be loaded first, which defeats the purpose. See the XML input format section above.
If you are not certain whether you might still need particular nodes in your structure later on, you can set them to 'inactive'. They will then be ignored during parsing and mapping, but remain available for reactivation if needed.
Using variables to select one Response Route from several is not the only way of controlling different destinations depending on the data. Another possibility is described in the section Additional IDs (central).
I Am Only Interested in Part of the Data
The incoming data is more extensive than what you want to analyse. So why load it all, using up a great deal of internal memory, when 90% of the data might be discarded without being touched?
With EDIFACT files, you cannot avoid loading all of the data, except in extremely straightforward cases (as in the previous section on routing). The structure simply depends too heavily on all segments being loaded correctly. The same is true for X12. However, since this data is normally not very extensive, that is rarely a problem.
XML data can be very extensive, but once again the method described in the previous section can help.
CSV (database, Excel) and fixed length (SAP IDOC):
Once again, you can limit the structure to the nodes and records that are actually of interest. Bear in mind, though, that some nodes, such as those that start a record, are structurally important and must not be removed. And, of course, you must not simply leave out fields in the middle, as the following CSV case shows.
ITEM,4711," Lobster_data manual",5,PDF,....
After the identifier ITEM comes the product code 4711, then the product name, the quantity, the format, and so on. Suppose you are only interested in the product code and the quantity. The node in the structure must then contain at least four fields, so that everything up to and including the quantity is loaded correctly; only the remaining fields can be ignored.
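The principle can be sketched with Python's standard csv module (an illustration of the field positions only, not Lobster_data code):

```python
import csv
import io

line = 'ITEM,4711," Lobster_data manual",5,PDF'
# The record must be parsed up to and including the quantity field;
# everything after it can be ignored.
row = next(csv.reader(io.StringIO(line)))
identifier, product_code, _name, quantity = row[:4]  # fields 5+ are ignored
print(product_code, quantity)
# prints: 4711 5
```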
The Content Does Not Interest Me at All (or Is Binary)
This is mostly a special routing case. Suppose you receive image files (.gif, .tiff, etc.) and would like to decide where to file the data (as received) based on the filename alone. In this case, you cannot parse the files themselves.
However, a mapping is required in order to set variables (true/false) for different Response Routes or to select particular channels for Partner Administration. You have the file name in the system variable VAR_FILENAME. All you need is a destination field on which to base the function.
The only thing that will help you here is a preparser that always provides the same text: the DummyPreParser.
It requires no configuration and always returns the same thing: a text reading dummy data, encoded in the character set 8859_1.
A simple CSV mapping with a single destination field that sets the variables, and we are done. This saves Lobster_data a lot of unnecessary work.
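The routing decision itself then reduces to a check on the filename. As a sketch (hypothetical helper function; VAR_FILENAME is represented here as a plain string):

```python
# Sketch: choose a route flag from the filename alone, since the
# binary content is never parsed.

def route_for(filename: str) -> str:
    # Hypothetical mapping from file extension to a target route.
    if filename.lower().endswith((".gif", ".tiff", ".tif")):
        return "images"
    return "other"

print(route_for("scan_0815.TIFF"))
# prints: images
```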