TextPreParser

Configuration file

TextPreParser.xml

Class name

com.ebd.hub.datawizard.parser.TextPreParser


Description


Much like the PDFPreParser, this preparser extracts selected details from an unstructured text file that would be difficult or impossible to extract with a 'normal' profile. For technical details, see the PDFPreParser documentation and the example below. Note that the root element of the configuration file is TextPreParser instead of PDFPreParser.

Example


Assume the following, loosely structured input file.


input.txt
************************************************************************************************
Note.............:
Load. Unit.......: S90 2 Swap bodies 7.45
Booked GM3.......: 80 Muster contact: Max Muster
Weight KG........: 24000 Tel.........:
Ord. LM..........: 0 Fax.........: max@muster.com
Loading: 180516 20:30 1 of 1 Unloading: 180517 04:00
------------------------------------------------------------------------------------------------
053-DT-1 147-STO-1
Max Muster AG
Muster-Straße 1 Büttnerstraße 21
30165 Hannover 30165 Hannover
DE GERMANY DE GERMANY
+4918049200 Fax:+491804926600 +4918044240 Fax:+491803424232
Sender Ref No:
Consignment GM3 Euro/Muster/Half Loading unit ID Pack Receiver
------------------------------------------------------------------------------------------------
053-DT-180514147546 46.6 0 0 0
------------------------------------------------------------------------------------------------
================================================================================================
B/L No...........: Car Ref:
Shipment ID......: 003-DSO-S3543110 (Shipment ID to be entered on the freight invoice)
INCOTERMS........: DDU CONSIGNEE
Transport Agreement Reference: 2125-COM-807-30
Agreed Price
FREI 560 EUR
--------------------------
Seller...........: Example GmbH Hauptstraße 21 GERMANY
30165 Hannover 4951167222202 Fax: 51167491222
Buyer on Invoice.: Muster AG Musterstraße 15 GERMANY
30165 Hannover +49421588535600 Fax: +41588535601
Invoice Receiver.: Muster AG Musterstraße 15 GERMANY
30165 Hannover +49147681000 Fax: +492347615732
*** END OF DOCUMENT ***


To extract the relevant values from this data, we use the following preparser configuration file in a profile named TextPreParserProfile.


TextPreParser.xml
<?xml version="1.0" encoding="UTF-8"?>
<TextPreParser>
  <Profile>
    <Name>TextPreParserProfile</Name>
    <LineFrom>1</LineFrom>
    <LineTo>100</LineTo>
    <Tag>
      <Name>LoadUnit</Name>
      <BeginsAfter>Load. Unit.......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>SwapBodies</Name>
      <BeginsAfter>2 Swap bodies</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>ShipmentID</Name>
      <BeginsAfter>Shipment ID......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>Address</Name>
      <BeginsAfter>053-DT-1 </BeginsAfter>
      <EndsBefore>Sender Ref No:</EndsBefore>
    </Tag>
    <Tag>
      <Name>Name</Name>
      <LinesAfter Tag="Address">1</LinesAfter>
    </Tag>
    <Tag>
      <Name>Street</Name>
      <LinesAfter Tag="Address">2</LinesAfter>
    </Tag>
    <Tag>
      <Name>City</Name>
      <LinesAfter Tag="Address">3</LinesAfter>
    </Tag>
    <Tag>
      <Name>Country</Name>
      <LinesAfter Tag="Address">4</LinesAfter>
    </Tag>
    <Tag>
      <Name>AdressID</Name>
      <BeginsAfter>053-DT-</BeginsAfter>
      <Words>1</Words>
    </Tag>
  </Profile>
</TextPreParser>
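
To make the combination of BeginsAfter and Words in the configuration above more tangible, the following Python sketch imitates what these two elements appear to do: find a marker text and return the next whitespace-separated word(s) after it. This is an illustration under assumptions only, not the preparser's actual implementation. The function name extract_words and the shortened sample text are hypothetical, and the authoritative semantics are described in the PDFPreParser documentation.

```python
# Hypothetical excerpt of input.txt (four lines taken from the example above).
SAMPLE = """\
Load. Unit.......: S90 2 Swap bodies 7.45
053-DT-1 147-STO-1
Shipment ID......: 003-DSO-S3543110 (Shipment ID to be entered on the freight invoice)
INCOTERMS........: DDU CONSIGNEE
"""


def extract_words(text, begins_after, words=1):
    """Return the first `words` whitespace-separated tokens after `begins_after`.

    Assumed semantics: mimics a <Tag> with <BeginsAfter> and <Words>.
    Returns None if the marker text does not occur at all.
    """
    pos = text.find(begins_after)
    if pos < 0:
        return None
    # Everything after the first occurrence of the marker.
    rest = text[pos + len(begins_after):]
    return " ".join(rest.split()[:words])


if __name__ == "__main__":
    # Tag names and markers are taken from TextPreParser.xml above.
    tags = {
        "LoadUnit": "Load. Unit.......:",
        "SwapBodies": "2 Swap bodies",
        "ShipmentID": "Shipment ID......:",
        "AdressID": "053-DT-",
    }
    for name, marker in tags.items():
        print(name, "=", extract_words(SAMPLE, marker))
```

Under these assumptions, the sketch yields S90 for LoadUnit, 7.45 for SwapBodies, 003-DSO-S3543110 for ShipmentID and 1 for AdressID. The values actually produced by the preparser should be checked by running the example profile with the test data.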

Download


You can download and import the profile TextPreParserProfile.pak as an example. The extracted data can, of course, be refined as needed in the target structure.


  • TextPreParser.xml (already included in the profile; provided here separately as well).

  • input.txt (already included in the profile as test data; provided here separately for manual upload).