TextPreParser
Configuration file | TextPreParser.xml
Class name | com.ebd.hub.datawizard.parser.TextPreParser
Description
The TextPreParser works much like the PDFPreParser: it extracts individual details from an unstructured text file that would be difficult or impossible to extract with a 'normal' profile. For technical details, see the PDFPreParser documentation and the example below. Note that the root element of the configuration file is TextPreParser instead of PDFPreParser.
Example
Assume the following, somewhat scattered input file.
************************************************************************************************
Note.............:
Load. Unit.......: S90 2 Swap bodies 7.45
Booked GM3.......: 80 Muster contact: Max Muster
Weight KG........: 24000 Tel.........:
Ord. LM..........: 0 Fax.........: max@muster.com
Loading: 180516 20:30 1 of 1 Unloading: 180517 04:00
------------------------------------------------------------------------------------------------
053-DT-1 147-STO-1
Max Muster AG
Muster-Straße 1 Büttnerstraße 21
30165 Hannover 30165 Hannover
DE GERMANY DE GERMANY
+4918049200 Fax:+491804926600 +4918044240 Fax:+491803424232
Sender Ref No:
Consignment GM3 Euro/Muster/Half Loading unit ID Pack Receiver
------------------------------------------------------------------------------------------------
053-DT-180514147546 46.6 0 0 0
------------------------------------------------------------------------------------------------
================================================================================================
B/L No...........: Car Ref:
Shipment ID......: 003-DSO-S3543110 (Shipment ID to be entered on the freight invoice)
INCOTERMS........: DDU CONSIGNEE
Transport Agreement Reference: 2125-COM-807-30
Agreed Price
FREI 560 EUR
--------------------------
Seller...........: Example GmbH Hauptstraße 21 GERMANY 30165 Hannover 4951167222202 Fax: 51167491222
Buyer on Invoice.: Muster AG Musterstraße 15 GERMANY 30165 Hannover +49421588535600 Fax: +41588535601
Invoice Receiver.: Muster AG Musterstraße 15 GERMANY 30165 Hannover +49147681000 Fax: +492347615732
*** END OF DOCUMENT ***

To get this data under control, we use the following configuration file for the preparser in a profile named TextPreParserProfile.
<?xml version="1.0" encoding="UTF-8"?>
<TextPreParser>
  <Profile>
    <Name>TextPreParserProfile</Name>
    <LineFrom>1</LineFrom>
    <LineTo>100</LineTo>
    <Tag>
      <Name>LoadUnit</Name>
      <BeginsAfter>Load. Unit.......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>SwapBodies</Name>
      <BeginsAfter>2 Swap bodies</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>ShipmentID</Name>
      <BeginsAfter>Shipment ID......:</BeginsAfter>
      <Words>1</Words>
    </Tag>
    <Tag>
      <Name>Address</Name>
      <BeginsAfter>053-DT-1 </BeginsAfter>
      <EndsBefore>Sender Ref No:</EndsBefore>
    </Tag>
    <Tag>
      <Name>Name</Name>
      <LinesAfter Tag="Address">1</LinesAfter>
    </Tag>
    <Tag>
      <Name>Street</Name>
      <LinesAfter Tag="Address">2</LinesAfter>
    </Tag>
    <Tag>
      <Name>City</Name>
      <LinesAfter Tag="Address">3</LinesAfter>
    </Tag>
    <Tag>
      <Name>Country</Name>
      <LinesAfter Tag="Address">4</LinesAfter>
    </Tag>
    <Tag>
      <Name>AdressID</Name>
      <BeginsAfter>053-DT-</BeginsAfter>
      <Words>1</Words>
    </Tag>
  </Profile>
</TextPreParser>

Download
You can download and import the profile TextPreParserProfile.pak as an example. Of course, the extracted data can be refined to your liking in the target structure.
TextPreParser.xml (already included in the profile, provided here separately as well).
input.txt (already included in the profile as test data, provided here separately for manual upload).
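
To make the tags of the example configuration more tangible, the following Python sketch imitates what BeginsAfter, Words, EndsBefore and LinesAfter appear to do, judging only by their names and the example. It is not the TextPreParser implementation, and the actual behaviour of the preparser may differ in details such as whitespace handling. The function names (`begins_after_words`, `between`, `line_after`) are hypothetical, the sample is a shortened excerpt of the input file with assumed line breaks, and LineFrom/LineTo are not modelled.

```python
# Shortened excerpt of the example input; the line breaks are an assumption.
sample = (
    "Load. Unit.......: S90 2 Swap bodies 7.45\n"
    "053-DT-1 147-STO-1\n"
    "Max Muster AG\n"
    "Muster-Straße 1 Büttnerstraße 21\n"
    "30165 Hannover 30165 Hannover\n"
    "DE GERMANY DE GERMANY\n"
    "Sender Ref No:\n"
    "Shipment ID......: 003-DSO-S3543110 (Shipment ID to be entered on the freight invoice)\n"
)


def begins_after_words(text, marker, words):
    """Assumed meaning of BeginsAfter + Words: take the first `words`
    whitespace-separated tokens that follow the first occurrence of `marker`."""
    pos = text.find(marker)
    if pos < 0:
        return None
    rest = text[pos + len(marker):]
    return " ".join(rest.split()[:words])


def between(text, begins_after, ends_before):
    """Assumed meaning of BeginsAfter + EndsBefore: take everything between
    the two markers, line breaks included."""
    start = text.find(begins_after)
    if start < 0:
        return None
    start += len(begins_after)
    end = text.find(ends_before, start)
    if end < 0:
        return None
    return text[start:end]


def line_after(block, n):
    """Assumed meaning of LinesAfter: take the n-th line after the line
    on which the referenced tag's content starts."""
    lines = block.splitlines()
    return lines[n].strip() if n < len(lines) else None


# Values corresponding to the tags of the example configuration.
address = between(sample, "053-DT-1 ", "Sender Ref No:")
result = {
    "LoadUnit": begins_after_words(sample, "Load. Unit.......:", 1),
    "SwapBodies": begins_after_words(sample, "2 Swap bodies", 1),
    "ShipmentID": begins_after_words(sample, "Shipment ID......:", 1),
    "Name": line_after(address, 1),
    "Street": line_after(address, 2),
    "City": line_after(address, 3),
    "Country": line_after(address, 4),
    "AdressID": begins_after_words(sample, "053-DT-", 1),
}

for tag_name, value in result.items():
    print(f"{tag_name}: {value}")
```

Under these assumptions, LoadUnit yields S90, SwapBodies yields 7.45 and ShipmentID yields 003-DSO-S3543110, while Name, Street, City and Country each hold one complete line of the address block, including both columns. Splitting those lines further is the kind of refinement that would be done in the target structure of the profile.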