Phase 1
Input Agents
The input data is received by so-called Input Agents in phase 1. If several Input Agents come into consideration for the incoming data, the profile scoring decides which one is used.
Backup and "Unresolved"
If the data can be assigned to a profile, a backup of the input files is made and a job (for that profile) is generated. Otherwise, the files end up on the "Unresolved" page.
Virus scanner
It is possible to execute a virus scanner (Java class) at this point. The class is called whenever a backup file is generated or a file ends up on the "Unresolved" page.
To do so, a class derived from com.ebd.hub.datawizard.plugin.AbstractVirusCheck has to be created. This class only has to implement the following two methods.
/**
 * Check file for virus
 *
 * @param backup backup file
 * @throws Exception on any error
 */
public abstract void checkFile(File backup) throws Exception;

/**
 * Check data for virus
 *
 * @param data received data, most likely by AS2 (already encrypted)
 * @throws Exception on any error
 */
public abstract void checkData(byte[] data) throws Exception;
In these two methods, the virus scanner must be called with the data; if the data is contaminated, an exception must be thrown. The class has to be entered in the configuration file ./etc/startup.xml (./etc/startup_dmz.xml on a DMZ server) with the following entry.
<Set name="virusScanner">your_class_name_including_package</Set>
Important note: We provide a programming interface (API) that allows you to develop your own classes in Java. For this, we offer in-depth training. If you are interested, please contact our support or sales staff.
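For illustration only, such a class could delegate to an external command-line scanner. In the following sketch, everything except AbstractVirusCheck and its two methods is an assumption: the package and class name, the scanner path /usr/bin/clamscan and the exit-code convention (0 = clean) have to be adapted to the scanner actually installed on your system.

package com.example.integration; // example package, not part of the product

import com.ebd.hub.datawizard.plugin.AbstractVirusCheck;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClamAvVirusCheck extends AbstractVirusCheck {

    // Assumption: ClamAV is installed and reachable under this path.
    private static final String SCANNER = "/usr/bin/clamscan";

    @Override
    public void checkFile(File backup) throws Exception {
        // Run the external scanner on the backup file and wait for the result.
        Process p = new ProcessBuilder(SCANNER, "--no-summary", backup.getAbsolutePath())
                .redirectErrorStream(true)
                .start();
        int exitCode = p.waitFor();
        if (exitCode != 0) {
            // A non-zero exit code means the file is contaminated or the scan failed;
            // throwing an exception stops further processing of the data.
            throw new Exception("Virus scan failed for " + backup.getAbsolutePath()
                    + " (exit code " + exitCode + ")");
        }
    }

    @Override
    public void checkData(byte[] data) throws Exception {
        // Write the received bytes to a temporary file and reuse the file-based check.
        Path tmp = Files.createTempFile("viruscheck_", ".bin");
        try {
            Files.write(tmp, data);
            checkFile(tmp.toFile());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}

With this sketch, the corresponding entry in ./etc/startup.xml would be <Set name="virusScanner">com.example.integration.ClamAvVirusCheck</Set>.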
Thread Queues
A distinction can be made between Input Agents that process jobs directly and those that send them to a Thread Queue. The two different methods were chosen to ensure the immediate execution of some of the jobs.
Jobs that are processed directly
The jobs of all time-driven Input Agents. Note: The option Activate parallel processing also allows processing via a Thread Queue for the time-driven Input Agent of type "HTTP".
The jobs of event-driven Input Agents of type "HTTP". Note: The option Activate parallel processing also allows processing via a Thread Queue.
The jobs of event-driven Input Agents of type "AS2".
The jobs of event-driven Input Agents of type "Message".
Jobs of time-driven Input Agents are processed sequentially per profile, whereas jobs of Input Agents of type "HTTP", "Message" and "AS2" can be processed in parallel for each profile.
Jobs that are stored in a Thread Queue
The jobs of all other Input Agents.
The entries (jobs) of the Thread Queues are processed by the profiles that originally created them. The maximum number of different profiles working at the same time can be configured. The following listing shows how to set the minimum and maximum values in the configuration file ./etc/startup.xml.
...
<Set name="minBackgroundThreads">4</Set>
<Set name="maxBackgroundThreads">10</Set>
...
How Thread Queues work
Normally, jobs are processed promptly, which means that the number of entries in the Thread Queues will be very small (the queues are usually even empty). If the entries cannot be processed fast enough, the number of entries increases. If the number exceeds a certain threshold, surplus jobs are swapped to the hard disk. These jobs are read in again as soon as the number falls below a second threshold value. If there are still jobs in the Thread Queues when the system shuts down, these jobs are swapped to the hard disk as well. After a restart, they are swapped back into the Thread Queues.
When a profile creates a job, a job number is assigned. This (ascending) job number is unique across all profiles. Subsequently, the received data is copied to the backup directory ./datawizard/backup, which contains a number of cryptic-looking directories with names ending in 7ff8. Each profile is associated with a directory that contains backup files named Job_<job number>. Those files are used when you restart a job; you can of course also access them manually. For each job, a file named ENV_<job number> is also created, which stores the environment variables needed for a restart. Note: See also functions read env-file() and get path of backup-file() and the system variable VAR_SYS_BACKUP.
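To illustrate the naming scheme, the contents of one of these profile directories could look as follows (the directory name and the job numbers are only examples; only the Job_/ENV_ prefixes are fixed):

./datawizard/backup/1a2b3c4d5e6f7ff8/
    Job_1234567 (backup of the received data for job 1234567)
    ENV_1234567 (environment variables for a restart of job 1234567)
    Job_1234568
    ENV_1234568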
The queuing works in the following way.
A file is created in the directory ./datawizard/backup/queue/<node name>. "<node name>" stands for "MainIS" or the name of the respective node. See section Load Balancing.
All Thread Queues write to the same directory; there is no distinction by priority. Each Thread Queue entry thus corresponds to a file.
A Thread Queue entry contains the name of the associated profile and a reference to the data to be processed. The data (payload) itself is not included.
The data (payload) for the Thread Queue entries can be found in ./datawizard/backup/queue/payload.
If a Thread Queue entry is a manual restart of a backed up job, it will refer to the payload of the original job.
For priority changes in the profile see section Thread Queue.
Manually restarted jobs do not get the priority of the profile, but priority "Highest (+2)" instead.
When a profile is deleted, associated Thread Queue entries will fail and be deleted.
Standard processing in phase 1
The standard process for phase 1 is to receive data and create a job for the processing profile.
File functions in phase 1
Optional file functions allow you to define conditions for input files for time-driven Input Agents. A job will only be created and the input file will only be processed if the conditions are met. Examples can be found in the documentation for the respective file function classes.
Next job number
The next available job number can be managed by two methods: using the internal storage service or executing a stored procedure.
Internal storage service
This is the default option (no configuration necessary).
However, if you want to reduce the number of storage service accesses, you can switch to batching (not possible on load-balancing systems). To do this, set the following system properties.
hub.datawizard.doJobNrBatching=true
In this example, 100 job numbers are fetched at a time (and managed in memory). When these job numbers are used up, another 100 are fetched.
Stored procedure
Important note: If the called stored procedure does not return a job number (e.g. because the database "hub" cannot be reached), Lobster Integration is terminated.
A sample procedure for several databases can be found in file ./conf/samples/db_sequence.sql. Store the latest job number in the table used by the procedure to avoid duplicate job numbers. Add the following option in file ./etc/startup.xml, assuming you have used the sample procedure mentioned before.
General:
<Set name="sequenceProcedureCall">execute command</Set> (see example file for your database)
The procedure must be created within the database (schema) Lobster Integration uses for the normal logging.
Important note: If you are using a MySQL database, you should use the following sequence instead, since stored procedures are not thread-safe there.
create sequence getNextJobNr start with <start_value> increment by 1;
For the <start_value>, use the currently highest job number (possibly a little higher).
Then add the following option to the configuration file ./etc/startup.xml.
<Set name="sequenceProcedureCall">SELECT NEXT VALUE FOR getNextJobNr</Set>
Redis
Alternatively, an internal Redis database can be used. To do this, add the following option to configuration file ./etc/startup.xml.
<Set name="sequenceProcedureCall">redis</Set>
You can transfer the last job number to the Redis database by entering the current job number as a single line in configuration file ./etc/jobnr.redis before starting the Integration Server.
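For example, if the highest job number assigned so far were 1234567 (the value is purely illustrative), the file ./etc/jobnr.redis would contain just this single line before the first start:

1234567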