Lobster Loading

Over seven bridges – data integration step by step

The following process shows how data processing can be automated to a large extent. Easy-to-use features such as drag-and-drop, multi-device capability, automatic feedback in case of an error and continuous documentation allow for high flexibility and fast responses to current requirements, as well as a better division of tasks between IT and the respective department.

Some preliminary work should be carried out before the actual integration begins. This includes precisely recording the currently available data and where it is stored. It naturally also has to be clarified which data is strictly needed in another system. So: What data needs to be migrated, and what format is it currently in? A clear view of data quality also helps you avoid errors from the outset: how good is the data, and does it need to be optimised first? Once these points are clarified, you can tackle the actual integration process.

Step 1: Where is the data coming from?

The first question is where the source or input data comes from. Ideally, this task is performed by an input agent receiving the data to be processed. In the simplest case, this is a pure data pump: the data content is forwarded unaltered via one or more response or output paths, and no mapping (see step 3) is carried out. The integration software should be able to actively gather the data (once or on a recurring schedule) or wait for the partner to deliver the data to be processed (event-driven, reactive). You could also say that the software works either as a client or as a server. Of course, you should also be able to set the times and intervals for data collection. Ideally, there are direct connections to as many systems as possible, with support for protocols such as WebDAV, SCP, AS2, OFTP, X.400, SMTP, POP3 and IMAP, HTTP(S), FTP(S) and SFTP, SQL, MQTT or AMQP.
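To make the active (client-side) collection pattern concrete, here is a minimal Python sketch of an input agent that polls an inbox directory for newly arrived files. All names (`poll_inbox`, the file names) are illustrative, not taken from any specific product; a real agent would poll an SFTP or AS2 endpoint rather than a local folder.

```python
import tempfile
from pathlib import Path

def poll_inbox(inbox: Path, seen: set) -> list:
    """Active collection (client role): return files that appeared
    since the last poll, and remember them so each is processed once."""
    new_files = [p for p in sorted(inbox.iterdir())
                 if p.is_file() and p.name not in seen]
    seen.update(p.name for p in new_files)
    return new_files

# Demo: a partner drops files into the inbox between two polls.
inbox = Path(tempfile.mkdtemp())
seen = set()
(inbox / "orders_001.csv").write_text("4711;10")
first = poll_inbox(inbox, seen)    # picks up the first file
(inbox / "orders_002.csv").write_text("4712;5")
second = poll_inbox(inbox, seen)   # only the newly arrived file
print([p.name for p in first], [p.name for p in second])
```

The event-driven (server) variant would instead block and wait for the partner to push data, but the downstream steps stay the same.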

Step 2: What data structures are required?

The second step defines the source structure matching the source data, along with the document type. Supported types include CSV, Excel, XML, fixed length, DB (database), EDIFACT, SAP IDoc, X.12, BWA, API or JSON. The document type determines the parser, i.e. the program responsible for converting the input into a format better suited to further processing. Format-specific settings that only exist in the related format should be configurable, such as quoting rules for CSV, worksheets for Excel or schema files for XML. It also makes things easier if the integration software can handle compression formats like zip or tgz.
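The idea that the document type selects the parser can be sketched in a few lines of Python using only standard-library parsers. The `parse` dispatcher and the semicolon delimiter are illustrative assumptions, not part of any specific product.

```python
import csv, io, json
import xml.etree.ElementTree as ET

def parse(document_type: str, raw: str):
    """Pick a parser by document type and return records for further processing."""
    if document_type == "CSV":
        return list(csv.DictReader(io.StringIO(raw), delimiter=";"))
    if document_type == "JSON":
        return json.loads(raw)
    if document_type == "XML":
        return [{child.tag: child.text for child in record}
                for record in ET.fromstring(raw)]
    raise ValueError(f"no parser registered for {document_type}")

rows = parse("CSV", "item;qty\nA-100;3\nA-200;7")
print(rows)  # [{'item': 'A-100', 'qty': '3'}, {'item': 'A-200', 'qty': '7'}]
```

Format-specific settings (here the CSV delimiter) would be passed alongside the document type in the same way.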

Step 3: How is the data mapping set up?

In the third step, you configure the mapping, i.e. the desired target structure as well as the connection between source and target structure. In short, this is where the target or output data is generated. Make sure you can also manipulate the assigned target data via function chains, and check whether your software product includes a large number of ready-made functions. These help you create even very complex mappings, also with the aid of variables, lists and maps. The simplest case is 1:1 mapping: the source structure is mapped unchanged to the target structure, the field names are normally adopted as well, and every source field is assigned to the corresponding target field.
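A mapping with function chains can be sketched as a table of rules, each assigning a source field to a target field and running the value through a chain of functions. The rule format and field names below are illustrative assumptions, not a real mapping syntax.

```python
def apply_chain(value, chain):
    """Run a value through a chain of functions, left to right."""
    for fn in chain:
        value = fn(value)
    return value

# Mapping rules: target field -> (source field, function chain).
rules = {
    "sku":    ("item", [str.strip, str.upper]),
    "pieces": ("qty",  [int]),
}

def map_record(source: dict, rules: dict) -> dict:
    """Build the target record by applying each rule to the source record."""
    return {target: apply_chain(source[src], chain)
            for target, (src, chain) in rules.items()}

print(map_record({"item": " a-100 ", "qty": "3"}, rules))
# {'sku': 'A-100', 'pieces': 3}
```

A 1:1 mapping is simply the special case where every rule keeps the field name and uses an empty function chain.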

Step 4: Do you want to write to databases?

You may wish to transfer your data to a database. In this case, you determine whether and, if so, which target data is written to which database tables. A requirement for this, of course, is that the relevant table exists in a schema your software is familiar with.
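As a minimal illustration, here is the database step with Python's built-in `sqlite3` module: the target table already exists in a known schema, and the mapped target rows are written into it. Table and column names are made up for the example.

```python
import sqlite3

# Precondition from the text: the target table already exists
# in a schema the integration software knows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (sku TEXT, pieces INTEGER)")

# Mapped target data from step 3, written to the chosen table.
target_rows = [("A-100", 3), ("A-200", 7)]
con.executemany("INSERT INTO orders (sku, pieces) VALUES (?, ?)", target_rows)
con.commit()

count = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 2
```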

Step 5: What should data integration actually do?

If the data integration software works purely as a data pump, you do not have to do anything else: the source file is output unaltered, 'as received'. If you would like to transfer your data to other target formats, you must activate a mapping. In some cases, additional editing may be required for the target format: perhaps not all source data is needed there, or the data should appear in a different structure. Check whether your software supports this. Sometimes the requirements are more complex still, for example when the same source data is to be edited in several different ways for different target formats. As a specific example, data from Excel or CSV lists is often used to generate orders, delivery notes or invoices in EDIFACT.
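The difference between a pure data pump and an active mapping can be sketched as follows: in pump mode the record passes through untouched, while in mapping mode only the fields the target needs are kept and restructured. The mode names and JSON target structure are illustrative assumptions.

```python
import json

def process(record: dict, mode: str):
    """Data pump vs. mapping: pass through unchanged, or reshape
    the record into the target structure with only the needed fields."""
    if mode == "pump":
        return record  # forwarded 'as received'
    if mode == "json_order":
        return json.dumps({"order": {"sku": record["item"],
                                     "quantity": int(record["qty"])}})
    raise ValueError(f"unknown mode {mode}")

row = {"item": "A-100", "qty": "3", "internal_note": "rush"}
print(process(row, "pump"))        # unchanged, internal_note included
print(process(row, "json_order"))  # restructured, internal_note dropped
```

Several such modes on the same source record correspond to the "several different edits for different target formats" case in the text.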

Step 6: How complex is your data transfer?

Manual or automatic, at the touch of a button or at a preset time: now your software retrieves the data at the source, or it waits for input from the source systems, and transmits the data to the target system(s) on one or more response paths and in the desired form. Ideally, your software can go one step further and trigger follow-up profiles.
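Triggering follow-up profiles is essentially a chain: when one profile finishes its steps, it starts the next one. A minimal sketch, with invented profile names and a simple log instead of real transfers:

```python
def run_profile(name: str, profiles: dict, log: list):
    """Execute a profile's steps in order; if the profile names a
    follow-up profile, trigger that profile next (a simple chain)."""
    profile = profiles[name]
    for step in profile["steps"]:
        step(log)
    follow_up = profile.get("follow_up")
    if follow_up:
        run_profile(follow_up, profiles, log)

profiles = {
    "import_orders": {
        "steps": [lambda log: log.append("fetched"),
                  lambda log: log.append("mapped")],
        "follow_up": "notify_erp",
    },
    "notify_erp": {"steps": [lambda log: log.append("erp notified")]},
}

log = []
run_profile("import_orders", profiles, log)
print(log)  # ['fetched', 'mapped', 'erp notified']
```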

Step 7: What happens in the event of an error?

Of course, this only ever happens to others. But suppose that during data integration one of steps 1 to 6 runs into an error, perhaps because a data source did not deliver or a target system is not available: quick intervention is vital. With the right system, you automatically receive a message whenever a predefined task was not performed. Clarify the following: Does your software allow you to activate a notification function for each step, so that in case of errors, such as a task not being executed, a message is sent via e-mail or SMS to one or more addressees? To be on the safe side, your software should also allow for source and target data backups, and document each task unambiguously and continuously, whether it ran without errors or was interrupted by them.
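The error-handling pattern described here, documenting every step and notifying addressees only on failure, can be sketched like this. The `notify` function is a stand-in for a real e-mail/SMS gateway, and all names are illustrative.

```python
def notify(addressees: list, message: str) -> list:
    """Stand-in for an e-mail/SMS notification; returns what would be sent."""
    return [(addressee, message) for addressee in addressees]

def run_step(name: str, action, addressees: list, journal: list):
    """Run one integration step; document the outcome either way,
    and notify the addressees only if the step fails."""
    try:
        action()
        journal.append((name, "ok"))
        return None
    except Exception as exc:
        journal.append((name, f"error: {exc}"))
        return notify(addressees, f"step '{name}' failed: {exc}")

def failing_delivery():
    raise ConnectionError("target system not available")

journal = []
run_step("fetch", lambda: None, ["ops@example.com"], journal)
sent = run_step("deliver", failing_delivery, ["ops@example.com"], journal)
print(journal)
print(sent)
```

Note that the journal records successful and failed steps alike, which is exactly the continuous documentation the text asks for.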

With these seven steps, you should be able to map and automate even complex integration tasks with little effort, based on a clear structure. This is how you create the basis for a manageable response to the ever-increasing flood of data to be processed and integrated.