Content Inspection

With the add-on module Content Inspector Manager, files can be collected and then merged or split. Files must be explicitly supplied to the Content Inspector Manager before they are inspected. Contrary to what the name might suggest, the regular file input is not inspected and is not affected by the Content Inspector Manager. At a defined moment, the collected (and possibly split or merged) files are sent or forwarded for further processing.


images/download/attachments/153256959/Content_Inspection-version-3-modificationdate-1740047721902-api-v2.png


The task of the Content Inspector Manager is to manage a set of Content Inspectors, which in turn merge or split the files. A file is split as soon as it arrives in the Content Inspector Manager. Merging takes place after the files have been collected in the File Pool of a Content Inspector.


The following listing shows the entry in the configuration file ./etc/startup.xml that activates the Content Inspector Manager.

<Call name="enableContentInspection">
  <Arg>
    <New class="com.ebd.hub.datawizard.ci.InspectorManager">
      <Set name="mailSubject">Content Inspector can not handle file!</Set>
      <Set name="mailRecipients">support@lobster.de</Set>
      <Set name="mailBody">Please have a closer look...</Set>
      <Set name="mailOnFailedFile">true</Set>
      <Set name="interval">500</Set>
      <Set name="enterFailedInspector">false</Set>
    </New>
  </Arg>
</Call>

| Parameter | Description |
| --- | --- |
| interval | Specifies at which time interval (in ms) the Content Inspectors are to be checked for reaching the maximum number of files or exceeding the maximum waiting time. |
| mailOnFailedFile | If "true", an email will be sent when a file is stored in the Control Center under "Content Inspection/Unknown Files". |
| mailSubject, mailRecipients, mailBody | The subject, email address(es) and message text for the email in case of an error. |
| enterFailedInspector | If "true", then a file that has been directly assigned to a specific Content Inspector is sent back to the Content Inspector Manager and reanalysed by all Content Inspectors if it has been rejected by that specific Content Inspector. If "false", then this file will end up in the Control Center under "Content Inspection/Unknown Files" after being rejected by the Content Inspector. |


Note: The descriptions of the parameters will become clearer in the course of the following explanations.
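The effect of enterFailedInspector can be modelled with a small decision function. This is only an illustrative sketch in Python, not the product's API: the function name, its arguments and the returned labels are hypothetical, and it assumes that a file no Content Inspector accepts on reanalysis also ends up under "Content Inspection/Unknown Files".

```python
UNKNOWN_FILES = "Content Inspection/Unknown Files"


def route_assigned_file(assigned_accepts: bool,
                        any_inspector_accepts: bool,
                        enter_failed_inspector: bool) -> str:
    """Model where a file goes that was assigned to one specific Content Inspector.

    assigned_accepts:       the assigned Content Inspector accepts the file
    any_inspector_accepts:  some Content Inspector would accept it on reanalysis
    enter_failed_inspector: value of the enterFailedInspector parameter
    """
    if assigned_accepts:
        return "assigned Content Inspector"
    # The file was rejected. It is only reanalysed if the parameter is "true".
    if enter_failed_inspector and any_inspector_accepts:
        return "Content Inspector found on reanalysis"
    return UNKNOWN_FILES
```

With enterFailedInspector set to "false", as in the listing above, a rejected file goes straight to the unknown files in this model.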

Supplying files with a profile


The files are fed to the Content Inspector Manager via a custom class in phase 6 of a profile. These classes can also pass files to specific Content Inspectors. The following classes are available for this.


  • ContentInspectionResponse

  • ContentInspectionResponseAsync

  • ContentInspectionResponseUTF8

  • ContentInspectionResponseUTF8Async

Finding responsible Content Inspector


When the Content Inspector Manager receives a file, two settings of each Content Inspector determine whether that Content Inspector is responsible for processing it.


  • The file name pattern. This is an easy way to filter out files by name without having to check the content.

  • The list of File Inspectors (each Content Inspector has one or more File Inspectors assigned). If the file name matches the file name pattern, the File Inspectors in the list are consulted to decide whether the Content Inspector should process the file. The Content Inspector processes the file if at least one File Inspector accepts the file content. The first Content Inspector found to be responsible is used, and the file is submitted to its File Pool. More details follow later.


If the Content Inspector Manager cannot assign a file to a Content Inspector, the file is placed in the Control Center under "Content Inspection/Unknown Files".
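The responsibility check described above can be sketched as follows. This is a simplified Python model, not the product's implementation: the class and function names are hypothetical, File Inspectors are reduced to plain predicates on the file content, and the file name pattern is assumed to be a glob-style pattern (the actual pattern syntax is not stated here).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, List, Optional

# Hypothetical stand-in: a File Inspector is a predicate on the file content.
FileInspector = Callable[[bytes], bool]


@dataclass
class ContentInspector:
    name: str
    file_pattern: str                      # assumed glob-style, e.g. "*.xml"
    file_inspectors: List[FileInspector]
    file_pool: List[str] = field(default_factory=list)

    def is_responsible(self, file_name: str, content: bytes) -> bool:
        # Cheap check first: the name must match the file name pattern.
        if not fnmatchcase(file_name, self.file_pattern):
            return False
        # At least one File Inspector must accept the content.
        return any(inspect(content) for inspect in self.file_inspectors)


def assign(inspectors: List[ContentInspector],
           file_name: str, content: bytes) -> Optional[str]:
    """Put the file into the File Pool of the first responsible Content Inspector."""
    for inspector in inspectors:
        if inspector.is_responsible(file_name, content):
            inspector.file_pool.append(file_name)
            return inspector.name
    return None  # no one is responsible: the file counts as an unknown file
```

The content is only examined after the name has matched, which mirrors the role of the file name pattern as an inexpensive pre-filter.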


The order in which the responsibility of Content Inspectors is checked depends on the following factors.


  • If one or more Content Inspectors have been explicitly specified in the field "Config file" of the Response of a profile, only these will be checked.

  • The priority of the Content Inspectors. A priority from 1 (high) to 9 (low) can be set for each Content Inspector. For Content Inspectors of the same priority, the order is random.
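The two factors can be combined into a short ordering routine. The sketch below is a Python model under stated assumptions: Content Inspectors are represented as hypothetical (name, priority) tuples, and the priority is assumed to apply also when the candidates have been restricted via "Config file".

```python
import random


def candidate_order(inspectors, config_file_names=None, rng=random):
    """Return the names of the Content Inspectors in the order they are checked.

    inspectors:        list of (name, priority) tuples, priority 1 (high) to 9 (low)
    config_file_names: names given in the Response, or None if none were specified
    rng:               source of randomness (injectable for reproducible runs)
    """
    if config_file_names:
        # Only the explicitly specified Content Inspectors are checked.
        candidates = [ci for ci in inspectors if ci[0] in config_file_names]
    else:
        candidates = list(inspectors)
    # Shuffle first, then sort stably by priority: Content Inspectors with
    # the same priority therefore end up in random relative order.
    rng.shuffle(candidates)
    candidates.sort(key=lambda ci: ci[1])
    return [name for name, _priority in candidates]
```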

Splitting files or assignment to File Pool


The Content Inspector Manager uses the relevant File Inspector to check whether an incoming file needs to be split. This is the case if a file splitter is specified there; the file is then split. The newly created split files are redelivered to the Content Inspector Manager as if they had been sent by a profile. Any specific Content Inspector set for the original file is lost, and all Content Inspectors are now checked for responsibility. However, a different behaviour can be configured in the splitting File Inspector. The original file will not be sent again.

Files will immediately be split (if applicable), independently of the processing of the File Pool.
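The redelivery of split files can be pictured as a work queue. The following Python sketch is a simplified model with hypothetical names: needs_split and split stand in for the File Inspector and its file splitter, and the result is the list of files that reach a File Pool.

```python
from collections import deque


def process_incoming(files, needs_split, split):
    """Split incoming files where required and return the files that get pooled.

    needs_split(f) -> bool : stands in for "a file splitter is specified"
    split(f)       -> list : stands in for the file splitter itself
    """
    queue = deque(files)
    pooled = []
    while queue:
        current = queue.popleft()
        if needs_split(current):
            # The parts are redelivered like newly sent files;
            # the original file itself is not sent again.
            queue.extend(split(current))
        else:
            pooled.append(current)
    return pooled
```

Because the parts re-enter the same queue, a part that itself matches a splitter would be split again in this model.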

Status of a Content Inspector


A Content Inspector can be in one of three states: "OK", "Wait" or "Overdue".


  • In status "OK", the files in the File Pool of the Content Inspector are processed.

  • In the "Overdue" status, the files in the File Pool are no longer released for processing. The Content Inspector only gets into the status "Overdue" if the maximum waiting time has been exceeded and no automatic processing of the files has been configured for this case.

  • In status "Wait", the Content Inspector waits until either the maximum waiting time is exceeded or the maximum number of files to be collected is reached.
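One way to read the three states is as a function of the File Pool's fill level and the elapsed waiting time. The Python sketch below is an interpretation, not the product's logic: all names are hypothetical, and it assumes that reaching a limit exactly already counts as reaching or exceeding it.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    WAIT = "Wait"
    OVERDUE = "Overdue"


def pool_status(files_in_pool, max_files, waited, max_wait, auto_process_on_timeout):
    """Derive the status of a Content Inspector from its File Pool.

    waited and max_wait use the same time unit (e.g. minutes).
    auto_process_on_timeout models whether automatic processing has been
    configured for the case that the maximum waiting time is exceeded.
    """
    if files_in_pool >= max_files:
        return Status.OK            # enough files collected: process them
    if waited >= max_wait:
        # Timeout: process automatically if configured, otherwise block.
        return Status.OK if auto_process_on_timeout else Status.OVERDUE
    return Status.WAIT              # keep collecting
```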

Processing the File Pool


For the File Pool of a Content Inspector, a maximum number of files (to be collected) can be set, as can a maximum waiting time (to wait for files).

The File Pool is processed either because the maximum number of files in the File Pool has been reached, or because more than the maximum waiting time has passed since the last file was sent to the File Pool. Note: If a new file is sent to the File Pool, the elapsed waiting time since the last file was sent is reset to 0, unless this behaviour has been explicitly disabled.

In addition, the processing can be triggered by workflows.

The following figure shows an example of a chronological sequence.


images/download/attachments/153256959/Content_Inspection_4-version-1-modificationdate-1697516614292-api-v2.png


(1) The axis represents the chronological sequence.

(2) The axis represents the increase in time after receiving the last file, i.e. the currently valid waiting time.

(3) An arrow marks the sending of a file to the File Pool of the Content Inspector. From this point on, the waiting time is reset and started again.

(4) A minute has passed after a file has been sent to the File Pool. The waiting time is increased by 1.

(5) Another file is sent into the File Pool and the waiting time is reset to 0.

(6) Another file is sent to the File Pool. The waiting time remains at 0.

(7) More than 3 minutes have passed since event (6). The newly sent file resets the waiting time to 0.

(8) 4 minutes have passed since event (7). If the maximum waiting time is set to 4 minutes, the 4 files that have been sent to the File Pool are now processed. Note: If the Content Inspector's checkbox "Proceed if timeout is reached" is not set, the Content Inspector gets the status "Overdue" instead.

(9) A single file was sent, no further files follow. After 4 minutes, the file is processed because the waiting time has been exceeded.
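The sequence (3) to (9) can be replayed with a small simulation. This Python sketch is a model under assumptions: the function name and the concrete arrival times are made up for illustration, only the maximum waiting time is considered (no maximum number of files), every new file resets the waiting time, and "Proceed if timeout is reached" is taken to be set.

```python
def processing_times(arrivals, max_wait):
    """Return (processing_time, files) tuples for a File Pool.

    arrivals: ascending arrival times of the files, in minutes
    max_wait: maximum waiting time in minutes
    """
    batches = []
    pool = []
    last = None
    for t in arrivals:
        # If the waiting time ran out before this file arrived, the pool
        # was processed at last + max_wait and this file starts a new pool.
        if pool and t - last >= max_wait:
            batches.append((last + max_wait, pool))
            pool = []
        pool.append(t)
        last = t  # each new file resets the waiting time to 0
    if pool:
        batches.append((last + max_wait, pool))
    return batches


# Four files close together, then a single late file (assumed times).
timeline = processing_times([0, 1, 1, 4.5, 12], max_wait=4)
# timeline == [(8.5, [0, 1, 1, 4.5]), (16, [12])]
```

The first four files are processed together 4 minutes after the last of them arrived, and the single late file is processed on its own after a further 4 minutes of waiting.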


Note: The increasing waiting time can be stopped for a certain period of time. See also area Exceptions (Content Inspection).

Note: See also area File compound.

Merging files


When processing of the File Pool is triggered, the Content Inspector checks whether its files (or parts of their contents) can be merged. The merge can produce 0 to n files and is carried out by a Merger of the Content Inspector. Note: If a Merger is set in the Content Inspector and it returns 0 files, the Content Inspector assumes the status "Overdue".
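The merge step and its "Overdue" rule can be sketched like this. The Python code is an illustrative model only: the function names are hypothetical, file contents are plain strings, and the simple concatenating Merger is an assumed example, not one of the product's Mergers.

```python
def process_pool(pool, merger=None):
    """Return (status, files_to_forward) for a triggered File Pool.

    merger: callable taking the pooled files and returning 0..n merged
            files, or None if no Merger is set.
    """
    if merger is None:
        return "OK", list(pool)     # forward the pooled files unmerged
    merged = merger(pool)
    if not merged:
        return "Overdue", []        # a Merger that returns 0 files blocks the pool
    return "OK", merged


def concat_merger(pool):
    """Assumed example Merger: concatenate all file contents into one file."""
    return ["".join(pool)] if pool else []
```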

Processing files


If a processing profile is specified in the Content Inspector, then, depending on the setting in the Content Inspector, the files are forwarded to the processing profile either from the Merger or from the File Pool of the Content Inspector. The data is processed serially, without one file overtaking another.

A processing profile always gets the variables (the user-defined variables and the system variables) of the profile that sent the file to the Content Inspector Manager. If files are merged, the processing profile only gets the variables associated with the first file of the file compound.
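The variable rule for merged files can be stated compactly. In this Python sketch, a file compound is modelled as a hypothetical list of (file name, variables) pairs; the variable name used in the example is made up.

```python
def variables_for_processing_profile(file_compound):
    """Return the variables handed to the processing profile.

    file_compound: list of (file_name, variables) pairs in compound order,
                   where variables is the dict of the sending profile.
    """
    _first_name, first_variables = file_compound[0]
    # Only the variables of the first file survive a merge.
    return dict(first_variables)


compound = [("a.csv", {"VAR_SENDER": "profile_1"}),
            ("b.csv", {"VAR_SENDER": "profile_2"})]
variables = variables_for_processing_profile(compound)
```

Variables that only exist for later files of the compound are therefore not visible to the processing profile in this model.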

Instead of sending the files to a processing profile, they can also be sent via email or X.400 or to the ASM (the Asynchronous Sending Module).

See also area Data transmission.

Summary


  1. Files are sent to the Content Inspector Manager with special classes in the Response of a profile.

  2. The Content Inspector Manager coordinates a set of Content Inspectors.

  3. Content Inspectors can split files, collect them in a File Pool, merge them if necessary, and ultimately forward them for processing (via various routes) if a maximum waiting time or a maximum number of files has been reached.