TokenStreamSplitter
Group |
Class Name | com.ebd.hub.datawizard.parser.stream.TokenStreamSplitter
Function | This preparser is the stream version of the TokenFileSplitter.
Configuration File | sample_splitter.properties
Description
This preparser has the same functionality as the TokenFileSplitter, with the difference that it is a stream preparser, so it can process data of any size without holding all of the data in main memory.
Another difference: the parameter header may also be a path to a text file if it conforms to the syntax read:<URL>. The URL can be a local file path (e.g. file:C:///directory/file.txt), an HTTP URL, or an FTP URL. The whole content of that file (possibly several lines) is read and inserted into the data as the separating line (or rather, a separating block).
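For illustration, the header parameter could then look like this in the configuration file; the local file path is the one from above, while the HTTP and FTP addresses are merely placeholders:

header = read:file:C:///directory/file.txt
# or, hypothetically:
# header = read:http://example.com/separator.txt
# header = read:ftp://ftp.example.com/separator.txt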
Parameters
rows | (mandatory) Number of lines after which the separating line is added.
header | (mandatory) Separating line to be added.
expression | Regular expression that delays the adding of the separating line until it matches the current line.
eol | Number defining the end-of-line characters. 0 is interpreted as \n, 1 as \r, and all other values as \r\n.
filter | Regular expression that filters the input lines. Lines that do not match the expression are ignored (not output and not counted).
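The following Java sketch illustrates how these parameters could interact while streaming. It is only a conceptual illustration, not the actual code of the TokenStreamSplitter; in particular, the placement of the separating line relative to the matching line, the handling of the first record, and the partial regex matching are assumptions. The eol value is passed here as a ready-made string, whereas the real parameter is numeric (0 = \n, 1 = \r, other = \r\n).

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Conceptual sketch only, NOT the implementation of the TokenStreamSplitter.
public class StreamSplitterSketch {

    public static void split(Reader input, Appendable output, int rows, String header,
                             Pattern expression, Pattern filter, String eol) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        int count = 0;                 // lines emitted since the last separating line
        boolean pendingHeader = false; // 'rows' reached, header insertion possibly delayed
        String line;
        while ((line = reader.readLine()) != null) {
            // filter: lines that do not match are ignored (not output and not counted)
            if (filter != null && !filter.matcher(line).find()) {
                continue;
            }
            // insert the separating line once 'rows' lines have been emitted and, if an
            // expression is configured, the current line matches it (assumed placement:
            // the header precedes the line that starts the new record)
            if (pendingHeader && (expression == null || expression.matcher(line).find())) {
                output.append(header).append(eol);
                pendingHeader = false;
                count = 0;
            }
            output.append(line).append(eol);
            count++;
            if (count >= rows) {
                pendingHeader = true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder out = new StringBuilder();
        // rows = 2, header = "new!", no expression, no filter, eol = "\n"
        split(new StringReader("a\nb\nc\nd\ne"), out, 2, "new!", null, null, "\n");
        System.out.print(out); // a, b, new!, c, d, new!, e
    }
}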
Example File
#
# sample file for TokenFileSplitter
#
# Supported keys are: rows, header, expression, eol
#
# rows = amount of rows that are combined for one record
# header = line that will be pasted in to indicate a new record
# expression = empty or a reg. expression that must match the current read line to create a new record (besides rows)
# eol=end of line (0=\n, 1 = \r, all other settings will be used for \r\n)
#
rows = 10
header=new!
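With this sample configuration, the line new! is inserted after every 10 input lines; since no expression and no filter are configured, every line is counted and the separating line is added as soon as the limit is reached.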