JSON
This document type is used together with the Input Agent "DataCockpit/Portal", but can also be used independently.
Settings
(1) Check input format: If this checkbox is set, the input fields are checked against their format templates during parsing. If a value violates the format template of the field or exceeds the field length, an error is created. If an error occurs, the profile job will not be aborted immediately in phase 2, but only at the end of phase 2 or after 50 errors. Attention: Format checking affects performance and should only be used if absolutely necessary.
(2) Check min/max settings: Specifies whether to check the number of repetitions (of fields and nodes) in the source structure.
(3) Execute semantic checks: Incoming files can be checked with semantic rules. See section Semantic check.
(4) Force single record: Indicates whether the data should be combined into a single record. Setting this checkbox prevents the parser from creating multiple records.
(5) JSON Lines format: See section below.
(6) Split files: See section below.
JSON Lines format
This option allows files in the JSON Lines format to be read in.
The chunk size determines how many lines end up in one record. With a value of 0 or 1, each record contains one line; with a value greater than 1, each record contains as many lines as the chunk size.
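For illustration, a JSON Lines file might look like this (a simplified sketch using the shipment fields from the example further below; each line is a complete JSON object):
{"consignorRef": "D83444679", "forwarderRef": "D83444679"}
{"consignorRef": "CCV231101734", "forwarderRef": "CCV231101734"}
{"consignorRef": "TY281403", "forwarderRef": "TY281403"}
With a chunk size of 2, the first two lines would end up in one record and the remaining line in a second record; with a chunk size of 0 or 1, each line would become its own record.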
A matching source structure can be created using option "Create structure from file analysis" in the source structure menu. It is important to note that the correct chunk size needs to be specified here, since a different structure is created for a chunk size >1 and parsing only works with the appropriate structure. See section Analyse file for structure.
Splitting files
If you receive a very large JSON file that contains, for example, a very high number of shipments, you have the option to split this file into smaller pieces. We assume the following file.
{
  "Envelope": {
    "header": {
      "senderId": "317"
    },
    "content": {
      "shipments": [
        {
          "consignorRef": "D83444679",
          "forwarderRef": "D83444679"
        },
        {
          "consignorRef": "CCV231101734",
          "forwarderRef": "CCV231101734"
        },
        {
          "consignorRef": "TY281403",
          "forwarderRef": "TY281403"
        }
      ]
    }
  }
}
In practice, this file will of course be much larger and more complex, but for the sake of illustration, let's use this simplified file. It contains three shipments. We now want to split the file into smaller files that each contain a maximum of two shipments. For this we can use the following settings.
Settings
(7) Splitting text: See following text. Example: {"consignorRef":
(8) Chunk size: See following text. Example: 2
(9) Remove last comma: See following text. Example: true
(10) Copy header to chunk: See following text. Example: true
(11) JSON-prefix to insert: See following text.
(12) JSON-suffix to append: See following text. Example: ]}}}
First we have to specify in (7) how a shipment (i.e. a chunk) is identified. In our case this is the string {"consignorRef":. Then we have to specify in (8) how many chunks (> 0) each split file may contain. Since we want split files with a maximum of two shipments, we enter 2. Note that these split files are only created internally for the parser.
With (9) you can remove the comma that appears, for example, between the second and third shipment in the file to be split. If this comma were not removed, the split files would not contain valid JSON. Important note: The JSON text is normalised, i.e. all spaces and line breaks are removed, which must also be taken into account in the splitting text (7).
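For our example file, the normalised text would look roughly like this (a sketch; note that the splitting text {"consignorRef": then matches the beginning of each shipment):
{"Envelope":{"header":{"senderId":"317"},"content":{"shipments":[{"consignorRef":"D83444679","forwarderRef":"D83444679"},{"consignorRef":"CCV231101734","forwarderRef":"CCV231101734"},{"consignorRef":"TY281403","forwarderRef":"TY281403"}]}}}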
If you want to copy the header of the original file into the split files, set (10) to "true". The header consists of all characters up to the first occurrence of the first chunk. This way, the created split files have exactly the same structure as the original file, but with a limited number of shipments.
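In our example, the header is therefore everything up to the first occurrence of the splitting text, i.e. roughly:
{"Envelope":{"header":{"senderId":"317"},"content":{"shipments":[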
With (11) you can specify a string that is placed in front of the cut chunks. This may be necessary to produce correct JSON. With (12) you can append such a string after the cut chunks. This is necessary in our example, because the header is inserted first and then the individual chunks; afterwards, all brackets that were opened in the header but not closed have to be closed again, in our case with ]}}}.
As a result, we would now get two internal split files (visible in the log) that structurally look like the original file, so you do not have to change the source structure in phase 3. The first split file contains two shipments and the second split file contains only one. The parser now generates a separate record from each split file. Important note: In our example, the top-level node (Envelope) in the source structure represents an object, which is why a record is generated for each occurrence of this Envelope data structure. If the top-level node represented an array, the splitting would still be successful, but it would be rendered ineffective again during parsing, because a record would be generated for each element within the array. See section When does the parser start a new record?
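To make this concrete: with the settings above, the first internal split file would look roughly as follows (a sketch, shown here with line breaks for readability; the second split file would contain only the third shipment):
{
  "Envelope": {
    "header": {
      "senderId": "317"
    },
    "content": {
      "shipments": [
        {
          "consignorRef": "D83444679",
          "forwarderRef": "D83444679"
        },
        {
          "consignorRef": "CCV231101734",
          "forwarderRef": "CCV231101734"
        }
      ]
    }
  }
}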
Example: Let us now assume the original file looks like this instead.
{
  "shipments": [
    {
      "consignorRef": "D83444679",
      "forwarderRef": "D83444679"
    },
    {
      "consignorRef": "CCV231101734",
      "forwarderRef": "CCV231101734"
    },
    {
      "consignorRef": "TY281403",
      "forwarderRef": "TY281403"
    }
  ]
}
You would also copy the header here, but you would additionally have to use the JSON-prefix to insert parameter with the value {"Envelope": to put an object around your split. You would then use the value ]}} in the JSON-suffix to append parameter, i.e. ]} to close the header and } to close the inserted prefix. In addition, the source structure must of course be adapted manually (i.e. a top-level node Envelope with a corresponding match code).
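With these settings, the first internal split file for this example would look roughly like this (a sketch, shown with line breaks for readability):
{
  "Envelope": {
    "shipments": [
      {
        "consignorRef": "D83444679",
        "forwarderRef": "D83444679"
      },
      {
        "consignorRef": "CCV231101734",
        "forwarderRef": "CCV231101734"
      }
    ]
  }
}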
Merging files
Files cannot be merged directly with this document type, but they can be merged with the add-on module Content Inspection. See section Merge JSON.