Splitting of Tokens
Each parsed token of a Cargo-IMP data file can be split into several subtokens. In the end, each token or subtoken corresponds to a field in a Lobster_data source structure. The following syntax is used for the splitting:
1. A number (e.g. 2) defines an alpha-numeric substring with the length given by the number itself (here: a substring of length 2).
2. A number followed by an n (e.g. 2n) defines a numeric substring with the length given by the number itself (here: a number with 2 digits).
3. A comma (,) defines a split. If no number follows a comma, the parser forms a subtoken out of the remaining characters.
Example:
<token pos="2">2,</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of the first two characters of token 2 and the second subtoken of the remaining characters of token 2.
4. A hyphen (-) is sometimes used as an additional separator in the input data. It can be specified as a hyphen in the split definition of the Cargo-IMP XML configuration file. The hyphen itself will not be contained in the parsed data.
Example:
Data line ACD/081-22210005 (consisting of 2 tokens) will be parsed into 3 tokens/subtokens (ACD, 081, and 22210005) if the following configuration is used:
<message id="STR" version="2">
  <segment id="ACD">
    <token pos="2">3,-,</token>
  </segment>
  ...
  ...
</message>
5. Alphanumeric substrings of dynamic length can be defined with <n>-<m>, where <n> is the minimum length and <m> the maximum length.
Example:
<token pos="2">0-2,5n</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of an alpha-numeric substring of length 0, 1, or 2. The second subtoken will be a numeric value with 5 digits. A12345 will result in subtokens A and 12345. The start of the second subtoken will internally be determined by the first occurrence of a digit (here: 1).
6. Numeric substrings of dynamic length can be defined with <r>-<s>n, where <r> is the minimum length and <s> the maximum length.
Example:
<token pos="2">0-2n,5</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of a numeric value of length 0, 1, or 2. The second subtoken will be an alpha-numeric substring with a length of 5. 12ABCDE will result in subtokens 12 and ABCDE. The start of the second subtoken will internally be determined by the first occurrence of a non-digit character (here: A).
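The splitting rules 1 to 6 can be illustrated with a small Python sketch. This is not the Lobster_data parser, only a hypothetical re-implementation of the rules as described above. The function name split_token is made up for this illustration. The sketch assumes that a token shorter than a fixed length is an error and that a dynamic alpha-numeric subtoken ends at the first digit. The real parser may be more tolerant.

```python
import re

# One split part: optional "<min>-", a length, optional "n" for numeric.
_PART = re.compile(r"(?:(?P<min>\d+)-)?(?P<len>\d+)(?P<num>n?)")


def split_token(token: str, spec: str) -> list[str]:
    """Split one token into subtokens according to a split definition."""
    subtokens = []
    pos = 0
    for part in spec.split(","):
        rest = token[pos:]
        if part == "":
            # Rule 3: no number after the comma, take the remaining characters.
            subtokens.append(rest)
            pos = len(token)
            continue
        if part == "-":
            # Rule 4: the hyphen is a separator and is dropped from the result.
            if not rest.startswith("-"):
                raise ValueError(f"expected '-' at position {pos} in {token!r}")
            pos += 1
            continue
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported split part {part!r}")
        length = int(match["len"])
        numeric = match["num"] == "n"
        if match["min"] is None:
            # Rules 1 and 2: fixed length (strictness is an assumption).
            piece = rest[:length]
            if len(piece) < length:
                raise ValueError(f"token {token!r} too short for {part!r}")
            if numeric and piece and not piece.isdigit():
                raise ValueError(f"{piece!r} is not numeric")
        else:
            # Rules 5 and 6: dynamic length. A numeric part takes leading
            # digits, an alpha-numeric part takes leading non-digits.
            piece = ""
            for ch in rest[:length]:
                if ch.isdigit() != numeric:
                    break
                piece += ch
            if len(piece) < int(match["min"]):
                raise ValueError(f"token {token!r} too short for {part!r}")
        subtokens.append(piece)
        pos += len(piece)
    return subtokens
```

With this sketch, split_token("081-22210005", "3,-,") returns ['081', '22210005'], and split_token("AB12345", "0-2,5n") returns ['AB', '12345'].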
Important note (to 5. and 6.): Some Cargo-IMP message specifications contain a numeric or alpha-numeric data element of dynamic length that is followed by another value of the same type. In this case, the token containing those two data elements must not be split, since the parser cannot determine where the second data element starts. The corresponding source structure in Lobster_data has to be adjusted accordingly by merging those two fields. See also chapter Dealing with Dynamic Length Fields and Optional or Conditional Fields and Subsegments.
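The following Python sketch shows one way to spot such a problematic split definition before using it. It is only an illustration and not part of Lobster_data. The function name ambiguous_pairs is hypothetical. The sketch only compares neighbouring parts that carry an explicit length. Hyphen separators and "remaining characters" parts are not checked, which is a simplifying assumption.

```python
import re

# One split part: optional "<min>-", a length, optional "n" for numeric.
_PART = re.compile(r"(?:(\d+)-)?(\d+)(n?)")


def ambiguous_pairs(spec: str) -> list[tuple[str, str]]:
    """Return neighbouring parts where a dynamic-length part is followed
    by a part of the same type (both numeric or both alpha-numeric)."""
    pairs = []
    parts = spec.split(",")
    for current, following in zip(parts, parts[1:]):
        cur = _PART.fullmatch(current)
        nxt = _PART.fullmatch(following)
        if cur is None or nxt is None:
            # Hyphen separator or "remaining characters": not checked here.
            continue
        is_dynamic = cur.group(1) is not None
        same_type = cur.group(3) == nxt.group(3)
        if is_dynamic and same_type:
            pairs.append((current, following))
    return pairs
```

For the split definitions used in the examples above (0-2,5n and 0-2n,5) the function returns an empty list. For a definition such as 0-2n,5n it returns [('0-2n', '5n')], which indicates that the two data elements should stay in one unsplit token and be mapped to a single merged field.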