No more than necessary


Not all of the data that goes into a profile is always relevant. You can save a good deal of memory and speed by limiting your structures to what is essential. Sometimes it is even possible to ignore the input data completely.

Routing the file based on partial data


Let’s take the following case. You receive various EDIFACT documents, and all you want to do is to transfer them somewhere 'as received', e.g. to store them in directories. However, the files need to end up in different directories depending on the sender.

The sender can normally be determined from the channels through which the data arrives. But what if the files are simply sitting in the same directory that you regularly scan, and the filenames do not allow you to identify where they originally came from? The sender, i.e. the creator of the data, is contained in the UNB segment.

UNB+UNOB:2+Sender+Recipient+....


In the example, you are always the recipient, but files can come from Partner1 and Partner2.

UNB+UNOB:2+Partner1+You+...


UNB+UNOB:2+Partner2+You+...
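
As an illustration outside Lobster_data, extracting the sender from the UNB segment could be sketched like this in Python (a simplified sketch that assumes the default EDIFACT separators and ignores the UNA release character):

```python
def unb_sender(edifact: str) -> str:
    """Return the interchange sender from the UNB segment.

    Simplified: assumes the default separators (' between segments,
    + between data elements) and no escaped delimiters.
    """
    for segment in edifact.split("'"):
        if segment.lstrip().startswith("UNB"):
            return segment.split("+")[2]  # third element = sender
    raise ValueError("no UNB segment found")

print(unb_sender("UNB+UNOB:2+Partner1+You+..."))  # Partner1
```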


You could, of course, set up a 1:1 mapping with the corresponding EDIFACT templates and check the appropriate field there. This is not difficult with EDIFACT files, because large amounts of data are unlikely. Nonetheless, there is a more efficient (and therefore faster) way to do it. So let’s create a profile that handles the data like a CSV file. Specifically, we use the delimiter ' (apostrophe), which normally serves as the segment terminator in EDIFACT files.


images/download/attachments/189463966/379-version-1-modificationdate-1738746779849-api-v2.png


Supposing the file starts like this.

UNA:+.? 'UNB+UNOA:2+Partner1+You+040820:0006+3689'UNH+....


Then the first line is the UNA segment, the second is the UNB segment, etc. If the data in the EDIFACT file is all on one line or formatted in 80-character blocks, everything is fine. The data always splits at ', and the sender and recipient should still be within the first 80 characters, so a line break would not mess things up either way.

However, if the file has each segment nicely in a separate line, we need to do a little groundwork for our routing profile. The line breaks need to be removed. To do this, use the EncodingPatcher as preparser class, and ensure that the configuration file has the following content.


0x0A=
0x0D=


This tells the preparser that the two characters representing line breaks in Windows (0x0D0A) and Linux/Unix (0x0A) should be replaced with nothing, i.e. deleted, and the file becomes a one-liner. The structure should then look as follows, with the node only reading the UNB segment.
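
The effect of this configuration can be sketched outside the preparser in a few lines of Python (the sample data is assumed):

```python
# Delete every 0x0D (CR) and 0x0A (LF) byte, exactly as the
# EncodingPatcher configuration above does, so the file
# becomes a one-liner.
raw = b"UNA:+.? '\r\nUNB+UNOA:2+Partner1+You+040820:0006+3689'\r\nUNH+"
one_liner = raw.replace(b"\x0d", b"").replace(b"\x0a", b"")
print(one_liner.decode("latin-1"))
```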


images/download/attachments/189463966/306-version-1-modificationdate-1738746779861-api-v2.png

images/download/attachments/189463966/308-version-1-modificationdate-1738746779854-api-v2.png


The Delimiter column/row for node OnlyUNB should be left blank so that the entire segment is written in field Content. Use Starts with and UNB as matchcode. This mapping produces the following.


images/download/attachments/189463966/307-version-1-modificationdate-1738746779860-api-v2.png


Now you can use functions to check whether the UNB segment in the target field contains the character string Partner1 or Partner2, and set variables to select the appropriate Response. Similar methods work for other formats.
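
In plain code, that decision could look like the following sketch (the directory names are made up for illustration; in Lobster_data the check runs via functions on the target field and variables):

```python
def target_directory(unb_content: str) -> str:
    # Hypothetical directories per partner; unknown senders
    # are parked separately for manual inspection.
    if "Partner1" in unb_content:
        return "edifact/partner1"
    if "Partner2" in unb_content:
        return "edifact/partner2"
    return "edifact/unknown"
```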

With XML, you should limit the source structure to the essentials but still use the XML parser. Take the following data as an example.

<Order>
<Sender>
<CustomerReference>123</CustomerReference>
<Surname>Meier</Surname>
...
</Sender>
<Recipient>
<CompanyReference>987</CompanyReference>
...
</Recipient>
<Items>
<Item>
<!-- many, many fields -->
</Item>
...
</Items>
</Order>


If you want to process the entire order, you will of course need the entire structure. However, if you only want to relay it somewhere based on the sender (identifiable from the customer reference number), keep the structure as small as possible.


images/download/attachments/189463966/311-version-1-modificationdate-1738746779851-api-v2.png


That’s enough. You do not need the rest of the data. So why load it all? And most importantly, use XML parser version V3 or V4. Otherwise, everything will be loaded first, which defeats the purpose.

Note: If you are not certain whether you might in fact still need particular nodes in your structure later on, you can set them to 'inactive'. Then they will be ignored during parsing and mapping, but will still be available for reactivation if needed.

Using variables to select different Responses is also not the only way of sending to various destinations depending on the data. Another possibility is described in section Additional IDs (central).

I am only interested in part of the data


The incoming data is more extensive than what you want to analyse. So why load it all, using up a great deal of internal memory, when 90% of the data might be discarded without being touched?

With EDIFACT files, you cannot avoid loading all of the data except in extremely straightforward cases (as in the previous section on routing). The structure is simply too dependent on all segments being correctly loaded. The same is true for X12. However, because the data is normally not too extensive, this is not much of a problem.

XML data can be very extensive, but once again the method described in the previous section can help.

CSV (database, Excel) and Fixed-length (SAP IDOC):

Once again, you can limit the structure to nodes with records that are actually of interest. You should bear in mind, though, that nodes which start a record, for example, are structurally important and must not be removed. And, of course, it is important not to simply leave out fields in the middle, as in the following case (CSV).


ITEM,4711," Lobster brochure",5,PDF,....


After the identifier ITEM comes the product code 4711, then the product name, the quantity, the format, and then something else. Supposing you are only interested in the product code and the quantity. The node in the structure must then contain at least four fields so that everything up to the quantity is loaded correctly. Only the fields after that can be ignored.
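
The same idea, sketched in Python with an assumed sample line: the record is parsed just far enough for the quantity to land in the right position, and everything after it is simply not looked at.

```python
import csv
import io

line = 'ITEM,4711," Lobster brochure",5,PDF,rest'
row = next(csv.reader(io.StringIO(line)))
# Fields 1 and 3 (0-based) hold the product code and quantity; the
# name field in between must still be parsed so these positions
# come out right, but it can be ignored afterwards.
product_code, quantity = row[1], int(row[3])
```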

The content does not interest me at all (or is binary)


This is mostly a special case of the routing case. Supposing you receive image files (.gif, .tiff, etc.) and would like to decide where to store the data (as received) based on the filename only. In this case, you cannot parse the files themselves.

However, a mapping is required in order to set variables (true/false) for different Responses or to select particular channels from the Partner Administration. You have the file name in the system variable VAR_FILENAME. All you need is a target field on which to put your functions.

The only thing that will help you here is a preparser that always provides the same text: the DummyPreParser. It does not require any configuration and always returns the same thing: a text reading 'dummy data', encoded in the character set 8859_1. Note: Simply use option Data routing.

A simple CSV mapping with a single target field that sets the variables, and we are done. That saves a lot of unnecessary work.
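
The filename check behind such a routing mapping can be sketched as follows (the partner-in-filename convention is a made-up assumption; in Lobster_data the equivalent check would run on VAR_FILENAME in the mapping):

```python
import re

def storage_directory(filename: str) -> str:
    # Hypothetical convention: the partner name is the first
    # underscore-separated token, e.g. "partner1_logo.gif".
    m = re.match(r"(partner\d+)_", filename, re.IGNORECASE)
    return f"images/{m.group(1).lower()}" if m else "images/unsorted"
```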