EncodingPatcherWithRegexReplacement

Configuration file

sample_encoding_patcher_regex.properties

Class name

com.ebd.hub.datawizard.parser.stream.EncodingPatcherWithRegexReplacement

Description


This preparser works like the EncodingPatcher, but additionally allows regex replacements after the hex replacement statements have been executed. Between those two steps, the byte stream is interpreted as text, using the profile's encoding. The statements for the byte replacements and the text replacements are both defined in the same properties file. Byte replacement statements start with 0x at the beginning of a line, and regex replacement statements start with regex. (including the trailing dot) at the beginning of a line.

Parameters


Parameter

Description

regex.<regular expression>

The replacement string for lines matched by the regular expression. It is specified as the value of this property.

replaceparts

(optional) If true, parts of the line are also replaced. See below. Default: false.

use.crlf

(optional) If true, line breaks are forced to be encoded as CR LF (0x0d0a). See below. Default: false.

check.BOM

(optional) If true, the byte order mark (BOM) is observed and the file is recoded accordingly. Default: false. For details see EncodingByBomOrXmlPreParser.

check.XML

(optional) If true, the XML encoding is observed and the file is recoded accordingly. Default: false. For details see EncodingByBomOrXmlPreParser.

Example


regex.^$ = empty_line

This statement replaces every empty line, that is, every line that contains no characters other than the line break, with the string empty_line.

Unlike the byte replacement statements, the regex statements are processed in the order in which they appear in the configuration file. If the result of an earlier regex statement matches a later regex statement, it is therefore replaced again.
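
As a small illustration of this ordering, the following fragment chains two regex statements. The marker strings are made up for this example. Only the regex.<regular expression> = <replacement> syntax is taken from the description above.

```properties
# Step 1: every empty line becomes the text "empty_line"
regex.^$ = empty_line
# Step 2: processed after step 1, so it also matches the lines step 1 just produced
regex.^empty_line$ = EMPTY
```

With the statements in this order, originally empty lines should end up as EMPTY. If the two statements were swapped, the empty_line pattern would be processed before any empty line had been rewritten, so empty lines would stay as empty_line.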

By default, the pattern of a regex replacement statement must match the whole line. Parts of a line are only replaced if the properties file includes the parameter replaceparts=true. This setting affects all regex statements in the file. By additionally using ^ for the beginning of a line and $ for the end of a line, you can create statements that still match only the entire line, even if replaceparts=true is set.
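
The difference between anchored and unanchored patterns under replaceparts=true can be sketched as follows. The patterns and replacement strings are hypothetical and only serve as an illustration.

```properties
# Applies to all regex statements in this file
replaceparts = true
# Unanchored: also matches DEMO inside a longer line, e.g. "MODE DEMO 01"
regex.DEMO = LIVE
# Anchored with ^ and $: only a line consisting solely of "TEST" is changed
regex.^TEST$ = PROD
```

Under these assumptions, a line such as MODE DEMO 01 should become MODE LIVE 01, a line containing only TEST should become PROD, and a line such as TEST 01 should be left unchanged by the second statement. Without replaceparts=true, the first statement would only affect lines that consist of DEMO and nothing else.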

Note: In the second phase (the regex phase), this preparser works character by character (not byte by byte) and line by line. It is not possible to remove or add lines with the text replacements of this preparser. You can only alter the content of lines. The line break in the result file is always encoded UNIX-compliant as line feed (LF) (0x0a), regardless of how the line break was encoded in the source file. To force the line break to be encoded as CR LF (0x0d0a), the property use.crlf=true has to be set. The encoding of the line break might be relevant if this preparser is called from a postexecuter or a function, or if the backup file is overwritten by the preparser.
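
A minimal fragment that combines a regex statement with forced CR LF line breaks might look like this. It is a sketch based only on the parameters described above.

```properties
# Regex phase: turn empty lines into a marker string
regex.^$ = empty_line
# Write every line break of the result as CR LF (0x0d0a) instead of LF (0x0a)
use.crlf = true
```

Since the regex phase rewrites the line breaks of the result, CR LF sequences inserted by byte replacement statements would presumably also be written as LF unless use.crlf=true is set. This is an inference from the note above and should be verified with a test file.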

For regular expressions in Java see http://docs.oracle.com/javase/tutorial/essential/regex/.

Example file


sample_encoding_patcher_regex.properties
#! class = com.ebd.hub.datawizard.parser.stream.EncodingPatcher
# Properties File for formatting EDI and TradaComs
# Encoding: only ISO-8859_1 or ASCII. For UTF8 use with care.
# UNA:+.? '\r\n replace nothing
0x554E413A2B2E3F20270D0A = 0x554E413A2B2E3F20270D0A
# UNA:+.? '\n -> UNA:+.? '\r\n
0x554E413A2B2E3F20270A = 0x554E413A2B2E3F20270D0A
# UNA:+.? 'UN -> UNA:+.? '\r\nUN
0x554E413A2B2E3F2027554E = 0x554E413A2B2E3F20270D0A554E
# ?? replace nothing
0x3F3F = 0x3F3F
# ?' replace nothing
0x3F27 = 0x3F27
# '\r\n replace nothing
0x270D0A = 0x270D0A
# '\n -> '\r\n
0x270A = 0x270D0A
# ' -> '\r\n
0x27 = 0x270D0A
regex.^$ = empty_line
replaceparts = false