UnicodeToASCIIPreparser

Group

Preparsers

Class Name

com.ebd.hub.datawizard.parser.UnicodeToASCIIPreparser

Function

This preparser converts Unicode data into ASCII data by replacing or removing non-ASCII characters.

Configuration File

sample_UnicodeToASCIIPreparser.properties

Description

This preparser converts Unicode data into ASCII data by replacing or removing non-ASCII characters. It expects the path to a properties file as its configuration. The file supports the following two parameters.

conversiontype

(replace or remove) With replace, non-ASCII characters are converted into their corresponding lower ASCII characters (for example, ö becomes o and ß becomes ss). Characters without a corresponding lower ASCII character are removed. With remove, all non-ASCII characters are removed.

upperlimit

(optional) Decimal character value at which non-ASCII characters start (lower ASCII characters range from 0 to 127). Characters with a value below this limit are left unchanged. Default: 128

Example

sample_UnicodeToASCIIPreparser.properties
conversiontype=replace
upperlimit=256
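
To illustrate how such a file could be interpreted, here is a minimal sketch using the standard `java.util.Properties` class. The class name `PreparserConfigSketch`, the `parse` method, and the validation behavior are hypothetical and are not part of the product; only the two keys and the default of 128 come from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the two documented parameters (not the product's own class).
class PreparserConfigSketch {
    final String conversionType;
    final int upperLimit;

    PreparserConfigSketch(String conversionType, int upperLimit) {
        this.conversionType = conversionType;
        this.upperLimit = upperLimit;
    }

    // Parses properties text; a real setup would read the file at the configured path.
    static PreparserConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String type = props.getProperty("conversiontype", "").trim();
        // Assumption: anything other than the two documented values is rejected.
        if (!type.equals("replace") && !type.equals("remove")) {
            throw new IllegalArgumentException("conversiontype must be replace or remove");
        }
        // upperlimit is optional; 128 is the documented default.
        int limit = Integer.parseInt(props.getProperty("upperlimit", "128").trim());
        return new PreparserConfigSketch(type, limit);
    }

    public static void main(String[] args) {
        PreparserConfigSketch cfg = parse("conversiontype=replace\nupperlimit=256\n");
        System.out.println(cfg.conversionType + " / " + cfg.upperLimit); // prints "replace / 256"
    }
}
```
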

The following examples show the results for the configuration above and for other settings.

| Input Data | conversiontype | upperlimit | Result |
|---|---|---|---|
| Schönstraße costs 1 million € | replace | (not set) | Schonstrasse costs 1 million |
| Schönstraße costs 1 million € | replace | 128 | Schonstrasse costs 1 million |
| Schönstraße costs 1 million € | replace | 256 | Schönstraße costs 1 million |
| Schönstraße costs 1 million € | replace | 65536 | Schönstraße costs 1 million € |
| Schönstraße costs 1 million € | remove | 128 | Schnstrae costs 1 million |
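
The interplay of conversiontype and upperlimit can be approximated with the sketch below. It is not the preparser's actual implementation: the class `UnicodeToAsciiSketch` and the method `toAscii` are hypothetical, and the replacement strategy (Unicode NFD decomposition via `java.text.Normalizer` plus a hand-written mapping of ß to ss) is an assumption chosen only because it matches the sample sentence used in the examples.

```java
import java.text.Normalizer;

// Hypothetical approximation of the documented behavior; the real replacement table is not documented here.
class UnicodeToAsciiSketch {

    static String toAscii(String input, String conversionType, int upperLimit) {
        boolean replace = "replace".equals(conversionType);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < upperLimit) {
                out.appendCodePoint(cp);      // below the limit: keep unchanged
                continue;
            }
            if (!replace) {
                continue;                     // remove: drop the character
            }
            if (cp == 0x00DF) {               // sharp s has no decomposition, so map it by hand
                out.append("ss");
                continue;
            }
            // Assumption: decompose (e.g. o-umlaut -> o + combining diaeresis) and keep the ASCII part.
            String decomposed = Normalizer.normalize(
                    new String(Character.toChars(cp)), Normalizer.Form.NFD);
            decomposed.codePoints().filter(c -> c < 128).forEach(out::appendCodePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample sentence written with escapes: o-umlaut, sharp s, euro sign.
        String sample = "Sch\u00F6nstra\u00DFe costs 1 million \u20AC";
        System.out.println(toAscii(sample, "replace", 128));
        System.out.println(toAscii(sample, "remove", 128));
    }
}
```

Under these assumptions the sketch leaves a trailing space where the euro sign was dropped; whether the preparser itself trims that space is not stated in this description.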