ExtractFileFromPDF


Group

Preparsers

Class Name

com.ebd.hub.datawizard.parser.ExtractFileFromPDF

Function

Extracts a file embedded in a PDF/A file.

Configuration File

None necessary. The preparser is configured directly with a string in the field Configuration file.

Description

This preparser extracts any file embedded in a PDF/A file, with the exception of the file ZUGFeRD-invoice.xml.

The field Configuration file specifies the list of possible file names, separated by semicolons. Only the first file found is returned. See the following example.

Example

Assume the following value of the parameter string.

MyInvoice.txt;Orders.csv

If both files are contained in the PDF/A, MyInvoice.txt is extracted, provided it was added to the PDF first.
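
The selection rule can be illustrated with a minimal Python sketch. This is not the preparser's actual implementation. It assumes that the attachments are examined in the order in which they were added to the PDF, that names are compared exactly (case-sensitive), and that whitespace around names in the parameter string is ignored. The function names parse_configured_names and select_embedded_file are hypothetical.

```python
from typing import List, Optional, Sequence

# According to the description, this file is never extracted by the preparser.
EXCLUDED_NAME = "ZUGFeRD-invoice.xml"


def parse_configured_names(config: str) -> List[str]:
    """Split the parameter string on semicolons and drop empty entries.

    Stripping surrounding whitespace is an assumption of this sketch.
    """
    return [name.strip() for name in config.split(";") if name.strip()]


def select_embedded_file(config: str, embedded_names: Sequence[str]) -> Optional[str]:
    """Return the first embedded file name that appears in the configured list.

    embedded_names is assumed to be in the order the files were added to the PDF.
    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    wanted = set(parse_configured_names(config))
    for name in embedded_names:
        if name != EXCLUDED_NAME and name in wanted:
            return name
    return None  # nothing matched; the real preparser reports an error here
```

Under these assumptions, select_embedded_file("MyInvoice.txt;Orders.csv", ["Orders.csv", "MyInvoice.txt"]) returns Orders.csv, because that file was added to the PDF first.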


Important note: A PDF viewer does not always display the real names of the files attached to a PDF/A file. Assume you see the file name abadoc.xml in the viewer and specify it in the parameter string. If the actual file name is different, you will receive an error message like the following.

[unknown] No valid embedded file found but these are included: 'AbaDoc', 'ZUGFeRD-invoice.xml'
[ExtractFileFromPDF] Exception in PreParser: java.lang.Exception: Invalid PDF/A format - unable to extract file

In that case, use the file name AbaDoc, as listed in the error message, instead.
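
The real file names can also be read out of the error message programmatically. The following Python sketch is only an illustration. It assumes that the message always has the form shown above, with the names in single quotes after the text "these are included:". The helper name candidate_names is hypothetical.

```python
import re
from typing import List

# According to the description, this file cannot be extracted by the preparser.
EXCLUDED_NAME = "ZUGFeRD-invoice.xml"


def candidate_names(log_line: str) -> List[str]:
    """Return the embedded file names listed in the error message.

    Assumes the message format shown above; ZUGFeRD-invoice.xml is
    filtered out because the preparser cannot extract it.
    """
    _, marker, tail = log_line.partition("these are included:")
    if not marker:
        return []  # not an error message of the expected form
    names = re.findall(r"'([^']*)'", tail)
    return [name for name in names if name != EXCLUDED_NAME]
```

Applied to the first line of the error message above, the helper returns a list containing only AbaDoc, which is the value to enter in the parameter string.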