TokenStreamSplitter

Configuration file

./conf/samples/sample_splitter.properties

Class name

com.ebd.hub.datawizard.parser.stream.TokenStreamSplitter

Description


This preparser provides the same functionality as the TokenFileSplitter. The difference is that it is a stream preparser, so it can process data of any size without holding all of it in main memory.

A further difference: the parameter header may point to a text file if its value uses the syntax read:<URL>. The URL can be a local file path (file:C:///directory/file.txt), an HTTP URL, or an FTP URL. The entire content of the file (possibly several lines) is read and inserted as the separating line (in effect, a separating block).
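The read:<URL> convention can be illustrated with a minimal sketch. This is not the preparser's actual code: the class name HeaderResolver, the method resolve, and the explicit charset argument are hypothetical, and the sketch simply assumes that anything after the read: prefix is a URL that the Java runtime can open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
final class HeaderResolver {
    private static final String READ_PREFIX = "read:";

    /**
     * Returns the header unchanged if it is a literal value, or the whole
     * content behind the URL if the value has the form read:<URL>.
     */
    static String resolve(String header, Charset charset) {
        if (header == null || !header.startsWith(READ_PREFIX)) {
            return header; // literal separating line
        }
        String url = header.substring(READ_PREFIX.length());
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // The complete file content becomes the separating block.
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A literal value such as new! passes through unchanged, while a value starting with read: is replaced by the content of the referenced file, line breaks included.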

Parameters


Parameter

Description

rows

Number of lines after which the separating line is added.

header

Separating line to be added.

expression

(optional) Regular expression that delays adding the separating line until the current line matches it.

eol

(optional) Number defining the end-of-line characters. 0 is interpreted as \n, 1 as \r, and all other values as \r\n.

filter

(optional) Regular expression that filters the input lines. Lines that do not match the expression are ignored (not output and not counted).

check.BOM

(optional) If true, the BOM is observed and the file is recoded accordingly. Default: false. For details see EncodingByBomOrXmlPreParser.

check.XML

(optional) If true, the XML encoding is observed and the file is recoded accordingly. Default: false. For details see EncodingByBomOrXmlPreParser.
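How rows, header, expression, filter and eol interact can be sketched as follows. This is an illustration under stated assumptions, not the implementation of TokenStreamSplitter: the class SplitterSketch and its methods are hypothetical; the separator is assumed to be written before the first line of the next block; and a regular expression is assumed to "match" when it is found anywhere in the line. The check.BOM and check.XML parameters are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical sketch, for illustration only.
final class SplitterSketch {

    /** Maps the eol parameter: 0 is \n, 1 is \r, every other value is \r\n. */
    static String eol(int code) {
        switch (code) {
            case 0: return "\n";
            case 1: return "\r";
            default: return "\r\n";
        }
    }

    /** Copies in to out line by line, so the input is never held in memory as a whole. */
    static void split(Reader in, Writer out, int rows, String header,
                      Pattern expression, Pattern filter, int eolCode) {
        String eol = eol(eolCode);
        int counted = 0; // lines written since the last separator
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                // filter: non-matching lines are neither output nor counted
                if (filter != null && !filter.matcher(line).find()) {
                    continue;
                }
                // The separator is due after 'rows' lines; an expression
                // (assumption) postpones it until the current line matches.
                boolean due = counted >= rows;
                if (due && (expression == null || expression.matcher(line).find())) {
                    out.write(header);
                    out.write(eol);
                    counted = 0;
                }
                out.write(line);
                out.write(eol);
                counted++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper for small in-memory examples. */
    static String split(String input, int rows, String header,
                        Pattern expression, Pattern filter, int eolCode) {
        StringWriter out = new StringWriter();
        split(new StringReader(input), out, rows, header, expression, filter, eolCode);
        return out.toString();
    }

    public static void main(String[] args) {
        // rows=2, header=new!, no expression, no filter, eol=0
        System.out.print(split("a\nb\nc\nd\ne\n", 2, "new!", null, null, 0));
    }
}
```

Under these assumptions, an input of the five lines a to e with rows=2 and header=new! yields a, b, new!, c, d, new!, e. With an expression such as ^H, a separator that is due is held back until a line starting with H arrives, so a block never ends in the middle of a logical record.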

Example file


sample_splitter.properties
rows = 10
header=new!
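
The two settings above can be read with the standard java.util.Properties parser, as the following sketch shows. It assumes the file is handled as an ordinary Java properties file; the class SplitterConfigSketch and the method parse are hypothetical and not part of the product.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, for illustration only.
final class SplitterConfigSketch {

    /** Parses properties text; whitespace around '=' is not part of key or value. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("rows = 10\nheader=new!\n");
        int rows = Integer.parseInt(props.getProperty("rows"));
        String header = props.getProperty("header");
        System.out.println(rows + " / " + header); // prints "10 / new!"
    }
}
```

With this configuration, the separating line new! is added after every 10 lines. The spaces around the equals sign in rows = 10 do not change the value.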