Splitting of Tokens

Each parsed token of a Cargo-IMP data file can be split into several subtokens. In the end, each token or subtoken corresponds to a field in a Lobster_data source structure. The following syntax is used for splitting:

1. A number (e.g. 2) defines an alpha-numeric substring with the length given by the number itself (here: a substring of length 2).

2. A number followed by an n (e.g. 2n) defines a numeric substring with the length given by the number itself (here: a number with 2 digits).

3. A comma (,) defines a split. If there is no number after a comma, the parser forms a single subtoken out of the remaining characters.

Example:

<token pos="2">2,</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of the first two characters of token 2 and the second subtoken of the remaining characters of token 2.

4. A hyphen (-) is sometimes used as an additional separator in the input data. It can be specified as a hyphen in the Cargo-IMP XML configuration file and will not be contained in the parsed data.

Example:

Data line ACD/081-22210005 (consisting of 2 tokens) will be parsed into 3 tokens/subtokens (ACD, 081, and 22210005) if the following configuration is used:

<message id="STR" version="2">
  <segment id="ACD">
    <token pos="2">3,-,</token>
  </segment>
  ...
  ...
</message>
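
The fixed-length rules 1 to 4 can be illustrated with a short Python sketch. It is not the parser used by Lobster_data: the function name split_fixed and the error handling are assumptions made for illustration, and the sketch covers only fixed lengths, the hyphen, and the trailing comma.

```python
from typing import List


def split_fixed(token: str, spec: str) -> List[str]:
    """Split a token with a fixed-length spec such as "2," or "3,-,".

    Hypothetical helper for illustration only, not Lobster_data code.
    """
    subtokens = []
    pos = 0
    for part in spec.split(","):
        if part == "-":
            # Rule 4: the hyphen is a separator; it is consumed but not emitted.
            if token[pos:pos + 1] != "-":
                raise ValueError(f"expected '-' at position {pos} in {token!r}")
            pos += 1
        elif part == "":
            # Rule 3: no length after the comma, so take all remaining characters.
            subtokens.append(token[pos:])
            pos = len(token)
        else:
            # Rules 1 and 2: "<length>" is alpha-numeric, "<length>n" is numeric.
            numeric = part.endswith("n")
            length = int(part[:-1] if numeric else part)
            piece = token[pos:pos + length]
            if len(piece) != length or (numeric and not piece.isdigit()):
                raise ValueError(f"{token!r} does not match part {part!r}")
            subtokens.append(piece)
            pos += length
    return subtokens


print(split_fixed("081-22210005", "3,-,"))
```

Applied to the second token of the data line above, split_fixed("081-22210005", "3,-,") returns the subtokens 081 and 22210005. The hyphen is consumed as a separator and does not appear in the result.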


5. Alphanumeric substrings of dynamic length can be defined with <n>-<m>, where <n> is the minimum length and <m> the maximum length.

Example:
<token pos="2">0-2,5n</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of an alpha-numeric substring of length 0, 1, or 2. The second subtoken will be a numeric value with 5 digits. A1234 will result in subtokens A and 1234. The start of the second subtoken is determined internally by the first occurrence of a digit (here: 1).

6. Numeric substrings of dynamic length can be defined with <r>-<s>n, where <r> is the minimum length and <s> the maximum length.

Example:
<token pos="2">0-2n,5</token> will split the second token of a data line into 2 subtokens. The first subtoken will consist of a numeric value of length 0, 1, or 2. The second subtoken will be an alpha-numeric substring with a length of 5. 12ABC will result in subtokens 12 and ABC. The start of the second subtoken is determined internally by the first occurrence of a non-numeric character (here: A).
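
The boundary search described in 5. and 6. can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Lobster_data implementation: the function name split_at_type_change and its parameters are hypothetical, and the sketch only checks the length of the first subtoken, not that of the second.

```python
from typing import Tuple


def split_at_type_change(
    token: str, min_len: int, max_len: int, first_is_numeric: bool
) -> Tuple[str, str]:
    """Split a token where the character type changes.

    Hypothetical helper for illustration only. The first subtoken ends at the
    first digit (if it is alpha-numeric) or at the first non-digit (if it is
    numeric).
    """
    # Default: no type change found, the first subtoken spans the whole token.
    boundary = len(token)
    for index, char in enumerate(token):
        if char.isdigit() != first_is_numeric:
            boundary = index
            break
    if not min_len <= boundary <= max_len:
        raise ValueError(
            f"first subtoken of {token!r} has length {boundary}, "
            f"expected {min_len} to {max_len}"
        )
    return token[:boundary], token[boundary:]


# Spec 0-2,5n: dynamic alpha-numeric part, then a numeric part.
print(split_at_type_change("A1234", 0, 2, first_is_numeric=False))
# Spec 0-2n,5: dynamic numeric part, then an alpha-numeric part.
print(split_at_type_change("12ABC", 0, 2, first_is_numeric=True))
```

With the two example values from above, the sketch yields A and 1234 for A1234, and 12 and ABC for 12ABC.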

Important hint (for 5. and 6.): Cargo-IMP message specifications sometimes contain a numeric or alpha-numeric data element of dynamic length that is followed by another data element of the same type. In this case, the token containing those two data elements must not be split, since the parser cannot determine where the second data element starts. The corresponding source structure in Lobster_data has to be adjusted accordingly by merging those two fields. See also chapter Dealing with Dynamic Length Fields and Optional or Conditional Fields and Subsegments.
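
The reason for this restriction can be shown with a small Python sketch. It assumes, as described above, that the split position is found by looking for a change between digits and non-digits. The function name first_type_change is hypothetical and not part of Lobster_data.

```python
from typing import Optional


def first_type_change(token: str) -> Optional[int]:
    """Return the index where the token switches between digits and non-digits.

    Hypothetical helper for illustration only. Returns None if every character
    has the same type as the first one, i.e. no split position can be found.
    """
    if not token:
        return None
    starts_numeric = token[0].isdigit()
    for index, char in enumerate(token):
        if char.isdigit() != starts_numeric:
            return index
    return None


# Alpha-numeric part followed by a numeric part: the boundary is found.
print(first_type_change("A1234"))
# Two numeric data elements in a row: no boundary, keep the token as one field.
print(first_type_change("1234567"))
```

For a token such as 1234567, which could hold a numeric element of dynamic length followed by a second numeric element, the function returns None. The token therefore has to stay unsplit and be mapped to one merged field.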