Group	Preparsers
Function	This preparser is used to extract information from a PDF document.
Configuration file	PDF2Data.properties

Description

This preparser is used to extract information from a PDF document and create a JSON file.

The document is sent to the machine learning service of our partner contract.fit via HTTPS. Text is extracted by means of Optical Character Recognition (OCR).

Use of the partner service is subject to a fee. After purchase, we will configure the access for you. If you are interested, please contact our support or sales staff.

The preparser is configured in a properties file, in which the following parameters can be defined.

Parameters

Parameter	Description
Synchronous	Specifies whether the service is called synchronously (true) or asynchronously (false). Default: false. Important note: When using asynchronous calls, Lobster_data must be accessible via HTTPS from the outside.
ChannelID	Channel ID of an HTTPS channel with Basic Authentication (Preemptive Authentication).
useDMZ	Specifies whether the service is called via DMZ. Default: false.
URL	URL of the contract.fit system, including the Inbox ID (see example). Note: Each document type (e.g. invoice, order, etc.) is defined as a separate inbox on the contract.fit platform. The structure (fields) of the response JSON file depends on the Inbox ID.

Example File

PDF2Data.properties

Synchronous=true
ChannelID=1599728356339212
useDMZ=false
URL=https://lobster.contract-q.fit/admin/documents/5e7a390a3b08c6d23ab8b8c4

Note: The value 5e7a390a3b08c6d23ab8b8c4 is the Inbox ID.
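
The following Python sketch is not part of the product. It only illustrates how the example file can be read outside of Lobster_data, for instance to check a configuration before deploying it. The helper names parse_properties and inbox_id are hypothetical. Two things are assumptions: that ChannelID and URL are mandatory (the table above lists no default for them), and that the Inbox ID is always the last path segment of the URL, as in the example.

```python
from urllib.parse import urlparse

# Defaults taken from the parameter table; ChannelID and URL have no documented default.
DEFAULTS = {"Synchronous": "false", "useDMZ": "false"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def inbox_id(url):
    """Return the last path segment of the URL (assumed to be the Inbox ID)."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


EXAMPLE = """\
Synchronous=true
ChannelID=1599728356339212
useDMZ=false
URL=https://lobster.contract-q.fit/admin/documents/5e7a390a3b08c6d23ab8b8c4
"""

settings = parse_properties(EXAMPLE)
# Assumption: both keys are required because no default is documented for them.
missing = [key for key in ("ChannelID", "URL") if key not in settings]
print("missing keys:", missing)
print("inbox id:", inbox_id(settings["URL"]))
```

For the example file, the list of missing keys is empty and the extracted Inbox ID is 5e7a390a3b08c6d23ab8b8c4.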

Creating Source Structure

To create a matching source structure for the respective JSON file, the following procedure can be used.

Create a profile with the setting No mapping.
Configure the preparser.
Set the checkbox Result of preparser overrides backup file.

After a profile run, the backup file of the job contains the JSON result of the preparser. It can then be used to generate the source structure via the source structure menu entry Create structure from file analysis.
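
Before running the file analysis, it can help to preview which fields a JSON backup file contains. The Python sketch below is an illustration only and is not part of Lobster_data. The function name field_paths and the sample JSON are hypothetical; real responses depend on the inbox configured on the contract.fit platform.

```python
import json


def field_paths(node, prefix=""):
    """Yield a dotted path for every leaf value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from field_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # List items share one path, marked with [] to indicate repetition.
        for item in node:
            yield from field_paths(item, prefix + "[]")
    else:
        yield prefix


# Hypothetical sample; it does not reflect an actual contract.fit response.
sample = json.loads(
    '{"invoice": {"number": "A-1", "lines": [{"amount": 10}, {"amount": 20}]}}'
)
paths = sorted(set(field_paths(sample)))
for path in paths:
    print(path)
```

To inspect a real backup file, replace the sample with the parsed file content, for example via json.load on the opened file.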