Semantic check
The data inserted by the parser into the source structure (source tree) can be subjected to a semantic check, i.e. you can check whether certain fields occur and have a certain value. If the check is not successful, the profile terminates with an error.
Configuration
In order for the parser to perform this semantic check, the following settings must be made.
Select the checkbox Semantic check in Phase 2/Document type. Alternatively, you can create the system variable VAR_SYS_ENABLE_SEMANTIC_CHECK of type Boolean with value true.
Clear the checkbox Ignore empty source fields in Phase 2/Extended.
In addition, a matching rule file must be available. See the following section.
Use the normal data mapper.
Rule files
The rules for the semantic check are in rule files. When the Integration Server is started, all files with names rule_*.xml in directory ./etc/admin/datawizard/semantic/ are read. A rule is used in the profile if the type and format attributes match (see examples).
There are several ways to modify and reload the rule files at runtime.
The simplest option is available in section Rule definitions (administration).
You can also modify rule files in the File manager. In order for these to be reloaded, the class com.ebd.hub.datawizard.parser.semantic.SemanticRuleManager must be executed in the Class execution section of the Admin Console.
You can also trigger the reload via the REST API, see section Rule files for semantic check (REST API).
You can use function replicate file() in a load balancing system.
If you want to check whether the reloading worked, you can do this in the Server Logs (→ internal → message.log).
Examples
Let's assume we have a profile with document type CSV and a simple source structure with a node node and below it a field field. In principle, the examples also apply to all other allowed formats and more complicated structures.
The corresponding settings for the semantic check are set.
In addition, we have stored and loaded the rule file rule_CSV.xml.
Document type
All of the following examples can be applied to all other document types as well. Allowed document types: CSV, EXCEL, FIXRECORD, DB, EDIFACT, IDOC, X12, BWA, JSON, CARGOIMP, XML. The XML V4 parser is not supported. A rule is used if the profile has the document type specified in attribute format. Note: The format can be overwritten with system variable VAR_SYS_SEMANTIC_FORMAT.
Rule type
If you specify value common in attribute type in a rule, this rule always applies in the profile (regardless of the setting in the profile). If you enter any other value instead of common, you must specify this value in field Source structure (above the source structure) in the profile so that this rule applies. Note: The type can be overwritten with system variable VAR_SYS_SEMANTIC_TYPE.
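The matching of rule files to profiles described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the product's implementation: the function name rule_file_applies and its parameters are hypothetical, and the case-insensitive comparison of the format is an assumption derived from the examples, which use format="csv" for document type CSV.

```python
import xml.etree.ElementTree as ET


def rule_file_applies(rule_xml, profile_format, profile_type=""):
    """Return True if the rule file would be used for the given profile.

    rule_xml       -- content of a rule file as a string
    profile_format -- document type of the profile, e.g. "CSV"
    profile_type   -- value entered in field "Source structure" (may be empty)
    """
    root = ET.fromstring(rule_xml)
    rule_format = root.get("format", "")
    rule_type = root.get("type", "")

    # Assumption: the format is compared case-insensitively.
    if rule_format.lower() != profile_format.lower():
        return False

    # type="common" applies regardless of the profile setting.
    if rule_type == "common":
        return True

    # Any other type must be named explicitly in the profile.
    return rule_type == profile_type
```

For example, a rule file with type="common" and format="csv" would apply to every CSV profile, while a rule file with type="invoice" would apply only to CSV profiles in which invoice has been entered.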
Mandatory field value
<rules type="common" format="csv">
  <rule segment="/node">
    <value mandatory="true" field="field">somevalue</value>
  </rule>
</rules>
After the input file of the profile has been parsed, the semantic check verifies that the field field has the value somevalue at least once (mandatory=true). If the check takes place, you will find the log line ...semantic check... (trace messages for phase 2 must be enabled).
Notes:
Please note that the nodes (attribute segment) must be specified in XPath notation.
You can use another attribute regex=true to specify the mandatory value as a regular expression.
Setting mandatory to false here has no effect. It does not mean that the value must not occur; rather, if it occurs, it must have the defined value.
For the EDIFACT and X12 format the field field is also searched in the component.
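To make the mandatory check and the regex option more concrete, here is a minimal Python sketch of the logic. The function name mandatory_value_present is hypothetical, and the sketch assumes that with regex=true the whole field value has to match the pattern; the documentation does not state whether a partial match would be sufficient.

```python
import re


def mandatory_value_present(field_values, expected, regex=False):
    """Return True if at least one field value satisfies the rule.

    field_values -- all values found for the field after parsing
    expected     -- the text content of the <value> element
    regex        -- corresponds to the attribute regex="true"
    """
    if regex:
        # Assumption: the complete field value must match the pattern.
        pattern = re.compile(expected)
        return any(pattern.fullmatch(value) for value in field_values)

    # Without regex, the value is compared literally.
    return any(value == expected for value in field_values)
```

A call such as mandatory_value_present(["a", "somevalue"], "somevalue") succeeds because the value occurs at least once, which is all that mandatory=true requires.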
Unique field value
A field value can occur more than once. We can check if a field value is unique and produce an error if the same field value occurs multiple times. To do this, we use the unique attribute.
<rules type="common" format="csv">
  <rule segment="/node">
    <value unique="true" field="field">somevalue</value>
  </rule>
</rules>
In our example, this works without any problems because all the nodes created have the same parent node (the root node).
However, if we use a slightly more complex source structure, this is no longer the case and there are several different parent nodes.
In this case, we have to slightly adjust our rule file (aside from the different node path) with the attribute cluster=true so that all parent nodes are grouped together.
<rules type="common" format="csv">
  <rule segment="/node/sub">
    <value unique="true" cluster="true" field="field">somevalue</value>
  </rule>
</rules>
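The effect of cluster=true can be illustrated with a small Python sketch. It assumes that, without cluster, duplicates are only detected among nodes that share the same parent node, which is how the explanation above reads. The function name find_duplicates and the representation of the parsed data as (parent, value) pairs are hypothetical.

```python
from collections import Counter, defaultdict


def find_duplicates(occurrences, cluster=False):
    """Return the sorted list of field values that violate uniqueness.

    occurrences -- list of (parent_id, field_value) pairs
    cluster     -- corresponds to the attribute cluster="true"
    """
    groups = defaultdict(Counter)
    for parent_id, value in occurrences:
        # With cluster=true all parent nodes are treated as one group.
        key = None if cluster else parent_id
        groups[key][value] += 1

    duplicates = {
        value
        for counter in groups.values()
        for value, count in counter.items()
        if count > 1
    }
    return sorted(duplicates)
```

With two occurrences of the same value under different parent nodes, the sketch reports no duplicate unless cluster is set, which mirrors why the adjusted rule file is needed for the more complex source structure.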
Allowed field values
It is also possible to create a list of allowed field values. To do this, use the ref attribute and the definition element, as in the following file.
<rules type="common" format="csv">
  <rule segment="/node">
    <value ref="myref" mandatory="true" unique="true" field="field">somevalue</value>
  </rule>
  <definition id="myref">
    <value>somevalue</value>
    <value>somevalue2</value>
  </definition>
</rules>
Here, the value somevalue must occur, but only once, and only the field values somevalue and somevalue2 are allowed. You can also set mandatory to false and remove the value somevalue; then only the allowed values are checked.
A further variant is possible. For each allowed value, you can specify whether it is mandatory, and you can use the wildcard * to allow all values. In the following example, the mandatory values somevalue1 and somevalue2 must occur, but any other values may occur as well.
<rules type="common" format="csv">
  <rule segment="/node">
    <value ref="myref" regex="false" unique="false" mandatory="false" field="field"></value>
  </rule>
  <definition id="myref">
    <value mandatory="true">somevalue1</value>
    <value mandatory="true">somevalue2</value>
    <value>*</value>
  </definition>
</rules>
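The interplay of allowed values, per-value mandatory flags and the wildcard can be sketched as follows. This is a simplified illustration under the assumption that the wildcard * only widens the set of allowed values and does not affect the mandatory entries. The function name check_allowed, the tuple representation of the definition element and the error texts are hypothetical.

```python
def check_allowed(field_values, definition):
    """Return a list of error messages (empty if the check passes).

    field_values -- all values found for the field after parsing
    definition   -- list of (value, mandatory) pairs taken from the
                    <definition> element; "*" allows any value
    """
    allowed = {value for value, _ in definition}
    errors = []

    # Without the wildcard, every field value must be in the list.
    if "*" not in allowed:
        for value in field_values:
            if value not in allowed:
                errors.append(f"value not allowed: {value}")

    # Each entry flagged as mandatory must occur at least once.
    for value, mandatory in definition:
        if mandatory and value != "*" and value not in field_values:
            errors.append(f"mandatory value missing: {value}")

    return errors
```

Applied to the definition above, the input values somevalue1, somevalue2 and other would pass, whereas an input without somevalue2 would produce one error.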
As an alternative to defining the allowed values directly in the rule file, you can move them to a separate file ./etc/admin/datawizard/semantic/common_values.xml. The ref attribute in the rule file then remains the same, but the definition element is omitted. If the rule file itself contains a definition of the allowed values, that definition always takes precedence.
<allowed_values>
  <definition id="myref">
    <value>somevalue</value>
    <value>somevalue2</value>
  </definition>
</allowed_values>
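The precedence between the two files can be sketched with Python's standard XML parser. The function name resolve_allowed_values is hypothetical, and the sketch only models the lookup order described above (rule file first, then common_values.xml), not how the product actually loads the files.

```python
import xml.etree.ElementTree as ET


def resolve_allowed_values(ref, rule_xml, common_xml):
    """Return the allowed values for a ref, or None if no definition exists.

    ref        -- value of the ref attribute in the rule
    rule_xml   -- content of the rule file as a string
    common_xml -- content of common_values.xml as a string
    """
    # The rule file is searched first, so its definition takes precedence.
    for source in (rule_xml, common_xml):
        root = ET.fromstring(source)
        for definition in root.iter("definition"):
            if definition.get("id") == ref:
                return [v.text or "" for v in definition.findall("value")]
    return None
```

If the rule file contains no definition element with the id myref, the list from common_values.xml is returned instead.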
Normalization
Node and field names must be unique. To ensure this, suffixes are used in the names. Suppose our source structure had a node node-1 and a field field-1 instead of a node node and a field field. We could still use the same rule file if we added an additional attribute normalized=true. Here is a corresponding modification of our first example.
<rules type="common" format="csv">
  <rule segment="/node" normalized="true">
    <value mandatory="true" field="field">somevalue</value>
  </rule>
</rules>
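What normalization does to the names can be approximated with a short Python sketch. It assumes that the suffix always consists of a hyphen followed by digits, as in node-1 and field-1; other suffix forms are not covered by the documentation and therefore not by this sketch. The function names normalize_name and normalize_path are hypothetical.

```python
import re

# Assumption: a suffix is a hyphen followed by one or more digits at the
# end of a node or field name, as in "node-1".
_SUFFIX = re.compile(r"-\d+$")


def normalize_name(name):
    """Strip a trailing numeric suffix from a node or field name."""
    return _SUFFIX.sub("", name)


def normalize_path(path):
    """Normalize every step of a node path such as "/node-1/sub-2"."""
    return "/".join(normalize_name(step) for step in path.split("/"))
```

With this normalization, the node node-1 and the field field-1 would be compared as node and field, so the rule with segment="/node" and field="field" would still match.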