Splitting of tokens (Cargo-IMP)
Each parsed token of a Cargo-IMP data file can be split into several subtokens. In the end, each token or subtoken corresponds to a field in a source structure. The following syntax is used for the splitting:
1. A number (e.g. 2) defines an alpha-numeric substring with the length given by the number itself (here: a substring of length 2).
2. A number followed by an n (e.g. 2n) defines a numeric substring with the length given by the number itself (here: a number with 2 digits).
3. A comma (,) defines a split. If there is no number after a comma, the parser simply forms a subtoken out of the remaining characters.
Example:
<token pos="2">2,</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of the first two characters of token 2 and the second subtoken of the remaining characters of token 2.
4. A hyphen (-) is sometimes used as an additional separator in the input data. It can be specified as a hyphen in the Cargo-IMP XML configuration file and will not be contained in the parsed data.
Example:
Data line ACD/081-22210005 (consisting of 2 tokens) will be parsed into 3 tokens/subtokens (ACD, 081, and 22210005) if the following configuration is used:
<message id="STR" version="2">
  <segment id="ACD">
    <token pos="2">3,-,</token>
  </segment>
  ...
  ...
</message>
5. Alphanumeric substrings of dynamic length can be defined with <n>-<m>, where <n> is the minimum length and <m> the maximum length.
Example:
<token pos="2">0-2,5n</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of an alpha-numeric substring of length 0, 1, or 2. The second subtoken will be a numeric value with 5 digits. A1234 will result in subtokens A and 1234. Internally, the start of the second subtoken is determined by the first occurrence of a digit (here: 1).
6. Numeric substrings of dynamic length can be defined with <r>-<s>n, where <r> is the minimum length and <s> the maximum length.
Example:
<token pos="2">0-2n,5</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of a numeric value of length 0, 1, or 2. The second subtoken will be an alpha-numeric substring with a length of 5. 12ABC will result in subtokens 12 and ABC. Internally, the start of the second subtoken is determined by the first occurrence of a non-numeric character (here: A).
Important note (to 5. and 6.): Some Cargo-IMP message specifications contain a numeric or alpha-numeric data element of dynamic length that is directly followed by another data element of the same type. In this case, the token containing those two data elements must not be split, because the parser cannot determine where the second data element starts. The corresponding source structure has to be adjusted accordingly by merging those two fields. See also chapter Dynamic length fields, optional/conditional fields, sub segments (CARGO-IMP).
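The splitting rules 1. to 6. can be sketched in a few lines of Python. This is only an illustration and not the actual Cargo-IMP parser: the function name split_token, the regular expression, and the error handling are hypothetical. The sketch assumes that a fixed-length part takes at most the given number of characters (which is what the examples A1234 and 12ABC suggest; the real parser may be stricter) and that a dynamic-length part ends at the first digit (alpha-numeric) or at the first non-digit (numeric).

```python
import re

# One part of a split specification: "2", "2n", "0-2" or "0-2n".
# Hypothetical helper pattern, not taken from the product.
_PART = re.compile(r"^(?:(\d+)-)?(\d+)(n?)$")


def split_token(token: str, spec: str) -> list[str]:
    """Split `token` into subtokens according to a comma-separated `spec`."""
    subtokens = []
    pos = 0
    for part in spec.split(","):
        if part == "":
            # No number after the comma: take all remaining characters.
            subtokens.append(token[pos:])
            pos = len(token)
            continue
        if part == "-":
            # Hyphen separator: consumed, but not kept in the parsed data.
            if token[pos:pos + 1] != "-":
                raise ValueError(f"expected '-' at position {pos} in {token!r}")
            pos += 1
            continue
        match = _PART.match(part)
        if match is None:
            raise ValueError(f"unsupported spec part {part!r}")
        minimum, maximum, numeric = match.groups()
        limit = int(maximum)
        if minimum is None:
            # Fixed length (assumption: take at most `limit` characters).
            value = token[pos:pos + limit]
        else:
            # Dynamic length: a numeric part collects digits, an
            # alpha-numeric part collects characters up to the first digit.
            end = pos
            while (end < len(token) and end - pos < limit
                   and token[end].isdigit() == bool(numeric)):
                end += 1
            value = token[pos:end]
            if len(value) < int(minimum):
                raise ValueError(f"{part!r} needs at least {minimum} characters")
        if numeric and value and not value.isdigit():
            raise ValueError(f"{value!r} is not numeric (spec part {part!r})")
        subtokens.append(value)
        pos += len(value)
    return subtokens


if __name__ == "__main__":
    print(split_token("081-22210005", "3,-,"))  # ['081', '22210005']
    print(split_token("A1234", "0-2,5n"))       # ['A', '1234']
    print(split_token("12ABC", "0-2n,5"))       # ['12', 'ABC']
    print(split_token("ACDE", "0-2,3"))         # ['AC', 'DE']
```

The last call illustrates the problem described in the important note: with two consecutive alpha-numeric elements, the sketch cannot tell whether ACDE means A followed by CDE or AC followed by DE, so it simply takes the maximum length for the first part.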