CheckDuplicateFile
Configuration file | None
Class name | com.ebd.hub.datawizard.extensions.CheckDuplicateFile
Description
This environment check class lets you determine whether a profile has already converted certain source data. If it has, the repeated conversion is terminated. The check is based on the MD5 hash of the data itself, not on the file name.
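The following is a minimal Python sketch of why the file name plays no role. It is for illustration only, since the class itself is a Java extension. The digest is computed over the raw bytes alone. The file names, contents and the helper `content_md5` are made up for this example.

```python
import hashlib

def content_md5(data: bytes) -> str:
    """Return the hex MD5 digest of the raw bytes; the file name is never used."""
    return hashlib.md5(data).hexdigest()

# Hypothetical (name, content) pairs
file_a = ("orders_2024.csv", b"id;qty\n1;5\n")
file_b = ("orders_copy.csv", b"id;qty\n1;5\n")  # different name, same bytes
file_c = ("orders_2024.csv", b"id;qty\n1;6\n")  # same name, different bytes

same = content_md5(file_a[1]) == content_md5(file_b[1])  # treated as a duplicate
diff = content_md5(file_a[1]) == content_md5(file_c[1])  # not a duplicate
```

A renamed copy of an already converted file is therefore still detected, while a new file that reuses an old name is processed normally.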
Including the class
You can select the class directly in the field "Environment check by" under "Main settings/Extensions". No configuration file is required.
When the class is used for the first time, a database table is created automatically in the default schema (usually "hub"). Every data stream that passes through the class is recorded in the table "dw_file_hash". Entries are never deleted automatically, because the retention period can be chosen individually for every installation and/or profile. Instead, set up a profile with a time-driven Input Agent that removes entries from the table according to your requirements.
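The documentation does not prescribe what the cleanup profile should delete. The following Python sketch uses an in-memory SQLite table to illustrate one possible retention rule: delete entries older than a given number of days, optionally for a single profile only. The column names (`hash`, `conv_time`, `file_name`, `profile`, `job_nr`), the ISO timestamp format and the helper `purge` are hypothetical. Check the actual structure of "dw_file_hash" in your database before writing the real delete statement.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema mirroring the five columns described for dw_file_hash;
# real column names and types in your installation may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_file_hash ("
    " hash TEXT, conv_time TEXT, file_name TEXT, profile TEXT, job_nr INTEGER)"
)

now = datetime(2024, 6, 30, 12, 0, 0)
rows = [
    ("a" * 32, (now - timedelta(days=40)).isoformat(), "old.csv", "ORDERS_IN", 1),
    ("b" * 32, (now - timedelta(days=5)).isoformat(), "new.csv", "ORDERS_IN", 2),
    ("c" * 32, (now - timedelta(days=40)).isoformat(), "inv.xml", "INVOICES_IN", 3),
]
conn.executemany("INSERT INTO dw_file_hash VALUES (?, ?, ?, ?, ?)", rows)

def purge(conn, retention_days, now, profile=None):
    """Delete entries older than retention_days (optionally for one profile) and
    return the number of deleted rows. ISO strings of identical format compare
    chronologically, so a plain '<' works here."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    sql = "DELETE FROM dw_file_hash WHERE conv_time < ?"
    params = [cutoff]
    if profile is not None:
        sql += " AND profile = ?"
        params.append(profile)
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount
```

For example, `purge(conn, 30, now, profile="ORDERS_IN")` removes only the entry for "old.csv". Running the deletion per profile is one way to give each profile its own retention period.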
Table "dw_file_hash" and the behaviour of the class
The figure shows the structure of the table and its possible values. The table contains the following columns:
MD5 hash of the data.
Point in time of the conversion.
File name, if available.
Profile name of the conversion.
Job number of the conversion.
When a conversion is started with this environment check, the MD5 hash of the current source file is determined and looked up in the database table. If it already exists, the conversion is terminated. If the checkbox "Is an error if the profile is called when suspended" is deactivated in the profile, only a message under "General Messages" is created in the Control Center and the job is removed completely.
If, however, the checkbox "Is an error if the profile is called when suspended" is activated, the job is marked as erroneous in the Control Center with the message "Skipping duplicate file...".
Note: The check is not performed during a mapping test or when a job is restarted from its backup file. A job can therefore be restarted after an error in the mapping.
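The decision flow described in this section can be sketched in Python as follows. The enum `Outcome`, the function `check_duplicate` and its parameters are hypothetical names. A plain `set` stands in for the table "dw_file_hash". One assumption goes beyond the text: this sketch records a new hash immediately and does not record anything when the check is skipped. The documentation does not state when the real class writes its entry.

```python
import hashlib
from enum import Enum

class Outcome(Enum):
    PROCEED = "proceed"              # hash unknown: convert (hash gets recorded)
    REMOVE_WITH_MESSAGE = "remove"   # duplicate, checkbox off: general message, job removed
    ERROR = "error"                  # duplicate, checkbox on: "Skipping duplicate file..."
    SKIPPED_CHECK = "skipped"        # mapping test or restart from backup: no check

def check_duplicate(data, known_hashes, error_if_suspended, is_test_or_restart=False):
    """Hypothetical model of the environment check; known_hashes stands in for dw_file_hash."""
    if is_test_or_restart:
        return Outcome.SKIPPED_CHECK
    digest = hashlib.md5(data).hexdigest()
    if digest not in known_hashes:
        known_hashes.add(digest)  # assumption: recorded at check time
        return Outcome.PROCEED
    # Duplicate: the checkbox "Is an error if the profile is called when suspended" decides
    return Outcome.ERROR if error_if_suspended else Outcome.REMOVE_WITH_MESSAGE
```

Only the checkbox distinguishes a silently removed duplicate from an erroneous job. In both cases the conversion itself does not run.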