CheckDuplicateFile

Group

Environment Check

Class Name

com.ebd.hub.datawizard.extensions.CheckDuplicateFile

Function

This environment check class allows you to determine whether a profile has already converted certain source data.

Description

This environment check class determines whether a profile has already converted a given set of source data. If so, the repeated processing is terminated. The check is based on the MD5 hash of the data itself, not on the file name.
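To illustrate why a check based on content is independent of the file name, the following minimal Python sketch computes an MD5 hash over the payload only. It is not the class's actual implementation; the helper md5_of and the sample file names are hypothetical.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of the raw payload (the file name is not involved)."""
    return hashlib.md5(data).hexdigest()


# Two deliveries with different (hypothetical) file names but identical content.
first = {"name": "orders_monday.csv", "data": b"id;qty\n1;5\n"}
second = {"name": "orders_copy.csv", "data": b"id;qty\n1;5\n"}
# A delivery that reuses a file name but carries different content.
third = {"name": "orders_monday.csv", "data": b"id;qty\n1;6\n"}

# Same content under a different name: treated as a duplicate.
same_content = md5_of(first["data"]) == md5_of(second["data"])
# Same name but different content: not a duplicate.
same_name_only = md5_of(first["data"]) == md5_of(third["data"])
```

Renaming a file therefore does not get it past the check, while a changed file with a reused name is processed normally.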

Including the Class

You can select the class directly in the field Environment check by under Basic data/Advanced settings. A configuration file is not required.

When the class is used for the first time, a database table is created automatically in the default schema of Lobster_data (usually hub). Every data stream that passes through the class is recorded in the table dw_file_hash. Entries are not deleted automatically, because the retention time can be customised individually for every installation and/or profile. Simply set up a profile with the input agent Cron Job that purges the table as required.
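The kind of cleanup such a scheduled profile might perform can be sketched as follows. This is only an illustration against an in-memory SQLite database: the column names (hash, created, file_name, profile, job_nr), the 30-day retention, and the function purge_old_hashes are assumptions, not the real schema or any Lobster_data API. Check the actual table definition before writing a real statement.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # assumed retention time; choose per installation/profile

# In-memory stand-in for the real database; all column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_file_hash (hash TEXT, created TEXT, file_name TEXT, "
    "profile TEXT, job_nr INTEGER)"
)

now = datetime(2024, 1, 31, 12, 0, 0)  # fixed time so the example is repeatable
rows = [
    ("a" * 32, (now - timedelta(days=90)).isoformat(), "old.csv", "Orders", 1),
    ("b" * 32, (now - timedelta(days=2)).isoformat(), "new.csv", "Orders", 2),
]
conn.executemany("INSERT INTO dw_file_hash VALUES (?, ?, ?, ?, ?)", rows)


def purge_old_hashes(conn, now, retention_days, profile=None):
    """Delete entries older than the retention time, optionally for one profile."""
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    sql = "DELETE FROM dw_file_hash WHERE created < ?"
    params = [cutoff]
    if profile is not None:
        sql += " AND profile = ?"
        params.append(profile)
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount  # number of deleted entries


deleted = purge_old_hashes(conn, now, RETENTION_DAYS)
remaining = [r[0] for r in conn.execute("SELECT file_name FROM dw_file_hash")]
```

Once an entry has been purged, a file with the same content is accepted again, so the retention time should cover the period in which duplicates are expected.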

Table dw_file_hash and the behaviour of the class

images/download/attachments/21304946/image2016-12-19_14_52_59-version-1-modificationdate-1537837333000-api-v2.png

The figure shows the structure of the table and its possible values. Each entry records the following:

  • MD5 hash of the data.

  • Point in time of the conversion.

  • File name if available.

  • Profile name of the conversion.

  • Job number of the conversion.

If a conversion is started with this environment check, the MD5 hash of the current source file is calculated and looked up in the database table. If it already exists, the conversion is terminated. If the checkbox Is an error if the profile is called when suspended is deactivated in the profile, only a message is created under General Messages in the Control Center and the job is removed completely. If, however, the checkbox is activated, the job is marked as erroneous with the message Skipping duplicate file... in the Control Center.
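The decision flow described above can be sketched as follows. This is a simplified model, not the class's source code: the name DuplicateCheck, the dictionary standing in for table dw_file_hash, and the returned status strings (apart from the quoted message text) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DuplicateCheck:
    # Models the checkbox "Is an error if the profile is called when suspended".
    error_if_suspended: bool
    # Stand-in for table dw_file_hash: MD5 hash -> job number of first conversion.
    seen: dict = field(default_factory=dict)

    def check(self, data: bytes, job_nr: int) -> str:
        digest = hashlib.md5(data).hexdigest()
        if digest not in self.seen:
            # First occurrence: record the hash and let the conversion run.
            self.seen[digest] = job_nr
            return "converted"
        if self.error_if_suspended:
            # Checkbox activated: the job is marked as erroneous.
            return "error: Skipping duplicate file..."
        # Checkbox deactivated: only a general message, the job is removed.
        return "message only, job removed"


lenient = DuplicateCheck(error_if_suspended=False)
strict = DuplicateCheck(error_if_suspended=True)

results_lenient = [lenient.check(b"payload", 1), lenient.check(b"payload", 2)]
results_strict = [strict.check(b"payload", 1), strict.check(b"payload", 2)]
```

In both configurations the duplicate is not converted; the checkbox only decides whether the skipped job appears as an error or as a general message.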

Note: The check is not executed for a mapping test or for a restart from the backup file, so the job can be restarted if an error occurs in the mapping.