TextPreParser
Configuration file | TextPreParser.xml
Class name | com.ebd.hub.datawizard.parser.TextPreParser
Description
Much like the PDFPreParser, this preparser extracts individual values from an unstructured text file that would be difficult or impossible to extract with a 'normal' profile. For technical details, see the PDFPreParser documentation and the example below. Note that the root element of the configuration file is TextPreParser instead of PDFPreParser.
Example
Assume the following, somewhat scattered input file.
************************************************************************************************
Note.............:
Load. Unit.......: S90 2 Swap bodies 7.45
Booked GM3.......: 80 Muster contact: Max Muster
Weight KG........: 24000 Tel.........:
Ord. LM..........: 0 Fax.........: max@muster.com
Loading: 180516 20:30 1 of 1 Unloading: 180517 04:00
------------------------------------------------------------------------------------------------
053-DT-1 147-STO-1
Max Muster AG
Muster-Straße 1 Büttnerstraße 21
30165 Hannover 30165 Hannover
DE GERMANY DE GERMANY
+4918049200 Fax:+491804926600 +4918044240 Fax:+491803424232
Sender Ref No:
Consignment GM3 Euro/Muster/Half Loading unit ID Pack Receiver
------------------------------------------------------------------------------------------------
053-DT-180514147546 46.6 0 0 0
------------------------------------------------------------------------------------------------
================================================================================================
B/L No...........: Car Ref:
Shipment ID......: 003-DSO-S3543110 (Shipment ID to be entered on the freight invoice)
INCOTERMS........: DDU CONSIGNEE
Transport Agreement Reference: 2125-COM-807-30
Agreed Price
FREI 560 EUR
--------------------------
Seller...........: Example GmbH Hauptstraße 21 GERMANY
30165 Hannover 4951167222202 Fax: 51167491222
Buyer on Invoice.: Muster AG Musterstraße 15 GERMANY
30165 Hannover +49421588535600 Fax: +41588535601
Invoice Receiver.: Muster AG Musterstraße 15 GERMANY
30165 Hannover +49147681000 Fax: +492347615732
*** END OF DOCUMENT ***
To get this data under control, we use the following configuration file for the preparser in a profile named TextPreParserProfile.
<?xml version="1.0" encoding="UTF-8"?>
<TextPreParser>
  <Profile>
    <Name>TextPreParserProfile</Name>
    <LineFrom>1</LineFrom>
    <LineTo>100</LineTo>
    <Tag>
      <Name>LoadUnit</Name>
      <BeginsAfter>Load. Unit.......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>SwapBodies</Name>
      <BeginsAfter>2 Swap bodies</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>ShipmentID</Name>
      <BeginsAfter>Shipment ID......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>Address</Name>
      <BeginsAfter>053-DT-1 </BeginsAfter>
      <EndsBefore>Sender Ref No:</EndsBefore>
    </Tag>
    <Tag>
      <Name>Name</Name>
      <LinesAfter Tag="Address">1</LinesAfter>
    </Tag>
    <Tag>
      <Name>Street</Name>
      <LinesAfter Tag="Address">2</LinesAfter>
    </Tag>
    <Tag>
      <Name>City</Name>
      <LinesAfter Tag="Address">3</LinesAfter>
    </Tag>
    <Tag>
      <Name>Country</Name>
      <LinesAfter Tag="Address">4</LinesAfter>
    </Tag>
    <Tag>
      <Name>AdressID</Name>
      <BeginsAfter>053-DT-</BeginsAfter>
      <Words>1</Words>
    </Tag>
  </Profile>
</TextPreParser>
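The same pattern can probably be applied to other values in the input file. The following fragment is only a sketch and is not part of the downloadable profile: the tag names WeightKG and Seller are hypothetical, and it assumes that BeginsAfter, Words and EndsBefore behave exactly as in the LoadUnit and Address entries of the configuration above. Under that assumption, WeightKG should pick up the value 24000, and Seller should cover the text between the "Seller" and "Buyer on Invoice" labels. For the authoritative semantics of these elements, see the PDFPreParser documentation.

```xml
<!-- Hypothetical additions; assumed to go next to the existing Tag elements inside Profile -->
<Tag>
  <!-- Assumption: takes the first word after the label, as in the LoadUnit tag -->
  <Name>WeightKG</Name>
  <BeginsAfter>Weight KG........:</BeginsAfter>
  <Words>1</Words>
</Tag>
<Tag>
  <!-- Assumption: takes everything between the two labels, as in the Address tag -->
  <Name>Seller</Name>
  <BeginsAfter>Seller...........:</BeginsAfter>
  <EndsBefore>Buyer on Invoice.:</EndsBefore>
</Tag>
```

The markers are copied character for character from the input file, including the dots and the colon, because the existing tags do the same.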
Download
You can download and import the profile TextPreParserProfile.pak as an example. Of course, the extracted data can be refined to your liking in the target structure.
TextPreParser.xml (already included in the profile, provided here separately as well).
input.txt (already included in the profile as test data, provided here separately for manual upload).