TokenStreamSplitter

Group

Preparsers

Class Name

com.ebd.hub.datawizard.parser.stream.TokenStreamSplitter

Function

This preparser is the stream version of the TokenFileSplitter.

Configuration File

sample_splitter.properties

Description

This preparser provides the same functionality as the TokenFileSplitter. The difference is that it is a stream preparser, so it can process data of any size without holding all of it in main memory.

A further difference: the parameter header may refer to a text file if its value follows the syntax read:<URL>. The URL can be a local file path (e.g. file:C:///directory/file.txt), an HTTP URL, or an FTP URL. The entire content of the file, which may span several lines, is read and inserted into the data as the separating line (or rather as a separating block).
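The read:<URL> handling described above can be illustrated with a minimal sketch. It is not the preparser's actual implementation: the class name HeaderResolver, the method resolveHeader, and the choice of UTF-8 as the file encoding are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class HeaderResolver {

    static final String PREFIX = "read:";

    // Returns the header text to insert. A plain value is returned unchanged;
    // a value of the form read:<URL> is replaced by the content of that URL.
    static String resolveHeader(String header) throws IOException {
        if (!header.startsWith(PREFIX)) {
            return header;
        }
        URL url = URI.create(header.substring(PREFIX.length())).toURL();
        try (InputStream in = url.openStream()) {
            // Assumption: the file is UTF-8 encoded; the source does not state the encoding.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In this sketch, a value such as header=new! is used literally, while a value starting with read: causes the whole referenced file to be loaded and used as the separating block.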

Parameters

rows

(mandatory) Number of lines after which the separating line is added.

header

(mandatory) Separating line to be added.

expression

Regular expression that delays adding the separating line until the current line matches it.

eol

Number defining the end-of-line characters. 0 is interpreted as \n, 1 as \r, and all other values as \r\n.

filter

Regular expression that filters the input lines. Lines that do not match the expression are ignored (not output and not counted).
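How the parameters interact can be shown with a rough sketch. It is not the code of the preparser itself, and it rests on several assumptions that the description above does not confirm: the separating line is written in front of the next line once rows lines have been counted, no separating line is written before the first line, and regular expressions are tested with find() rather than a full match. The names SplitterSketch, split and splitToString are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical illustration of the parameter semantics, not the product code.
class SplitterSketch {

    // eol parameter: 0 -> \n, 1 -> \r, any other value -> \r\n
    static String eolFor(int eol) {
        switch (eol) {
            case 0: return "\n";
            case 1: return "\r";
            default: return "\r\n";
        }
    }

    // Streams lines from in to out, one at a time, so memory use does not grow
    // with the input size. expression and filter may be null or empty (not set).
    static void split(BufferedReader in, Writer out, int rows, String header,
                      String expression, int eol, String filter) throws IOException {
        Pattern delay = (expression == null || expression.isEmpty()) ? null : Pattern.compile(expression);
        Pattern keep = (filter == null || filter.isEmpty()) ? null : Pattern.compile(filter);
        String nl = eolFor(eol);
        int counted = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (keep != null && !keep.matcher(line).find()) {
                continue; // filtered lines are neither output nor counted
            }
            // Assumption: once rows lines are counted, the separator is written
            // before the current line; with expression set, only if the line matches.
            if (counted >= rows && (delay == null || delay.matcher(line).find())) {
                out.write(header);
                out.write(nl);
                counted = 0;
            }
            out.write(line);
            out.write(nl);
            counted++;
        }
    }

    // Convenience wrapper for small in-memory experiments.
    static String splitToString(String input, int rows, String header,
                                String expression, int eol, String filter) {
        StringWriter out = new StringWriter();
        try {
            split(new BufferedReader(new StringReader(input)), out, rows, header, expression, eol, filter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // rows = 2, header = new!, eol = 0, no expression, no filter
        System.out.print(splitToString("a\nb\nc\nd\ne\n", 2, "new!", null, 0, null));
    }
}
```

Under these assumptions, the call in main writes the lines a and b, then new!, then c and d, then new! again, and finally e. With expression set to ^H, the separator would instead be held back until a line starting with H is read.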

Example File

sample_splitter.properties
#
# sample file for TokenFileSplitter
#
# Supported keys are: rows, header, expression, eol
#
# rows = number of rows that are combined into one record
# header = line that will be inserted to indicate a new record
# expression = empty or a regular expression that must match the currently read line to create a new record (in addition to rows)
# eol = end of line (0 = \n, 1 = \r, all other values are treated as \r\n)
#
rows = 10
header=new!