PPRuNe Forums - View Single Post - Converting PDF to Excel (XLS) data file
14th Feb 2013, 08:49, post #8
cattletruck
Join Date: Apr 1998
Location: Mesopotamos
Posts: 5
I did something similar for someone who received PDF data files from guvmint once a month and wanted them loaded into Excel for a mail merge. Initially he just wanted to extract the data from the PDF file; what he got after a few iterations was a menu system with 5 steps that did the whole shebang for him. He still calls me once or twice a year to thank me for turning such a frustrating and inefficient chore into a sequence of easy menu options.

This is what I did before I added the smarts, like file selection and auto-launching the mail merge:

1) Convert the PDF file into XML format.
pdftoxml.exe -noImage -noImageInline step1.pdf step1.xml

2) Clean up the data. I used an XSLT stylesheet to select just the XML elements I was interested in and filter out the rest.
xsltproc -o step2.xml step2script.xslt step1.xml

3) Run Excel with the XML file as an argument.
excel.exe step2.xml
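Since the post mentions wrapping these steps in a menu system, here is a minimal sketch in Python of scripting the three commands so they run in sequence. It reuses the file names and flags from the steps above and assumes pdftoxml.exe, xsltproc and excel.exe can all be found on the PATH (Excel often is not, so you may need its full path). The functions build_commands and run_pipeline are hypothetical names for illustration, not part of any of the tools.

```python
import subprocess
from pathlib import Path


def build_commands(pdf: Path, stylesheet: Path, workdir: Path) -> list[list[str]]:
    """Return the three steps as argument lists (no shell quoting needed)."""
    raw_xml = workdir / "step1.xml"    # output of step 1
    clean_xml = workdir / "step2.xml"  # output of step 2
    return [
        # 1) PDF -> XML, skipping embedded images
        ["pdftoxml.exe", "-noImage", "-noImageInline", str(pdf), str(raw_xml)],
        # 2) Keep only the wanted elements using the XSLT stylesheet
        ["xsltproc", "-o", str(clean_xml), str(stylesheet), str(raw_xml)],
        # 3) Open the cleaned XML in Excel
        ["excel.exe", str(clean_xml)],
    ]


def run_pipeline(pdf: Path, stylesheet: Path, workdir: Path,
                 dry_run: bool = False) -> list[list[str]]:
    """Run steps 1 and 2 to completion, then launch Excel without waiting for it."""
    commands = build_commands(pdf, stylesheet, workdir)
    if dry_run:
        return commands  # just report what would be run
    convert, transform, open_excel = commands
    subprocess.run(convert, check=True)    # raises CalledProcessError on a non-zero exit
    subprocess.run(transform, check=True)
    subprocess.Popen(open_excel)           # do not block while Excel is open
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(Path("step1.pdf"), Path("step2script.xslt"), Path("."), dry_run=True):
        print(" ".join(cmd))
```

Passing arguments as lists avoids quoting problems with file names that contain spaces, and dry_run=True lets you check the command lines before anything is executed.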

Both pdftoxml and xsltproc are freeware and easy to find on the internet.

HOWEVER: the hard bit was writing the XSLT stylesheet. In my case the PDF data was structured in a complicated way, so the resulting XML file was just as bad (and, this being guvmint, there were inconsistent data fields that needed attention). Google "XSLT stylesheets" and see whether you are comfortable learning it before embarking down this path.
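If XSLT proves too steep, one alternative (a different technique from the one used here, swapped in purely for illustration) is to do the clean-up step in a general-purpose language. The minimal Python sketch below assumes pdftoxml-style output in which each PAGE contains TOKEN elements (roughly one word each) carrying numeric x and y attributes; check that against your own step1.xml before relying on it. It groups tokens that share a line into rows, keeps only rows with the expected number of cells, and writes CSV, which Excel opens directly. The names tokens_to_rows, rows_to_csv, y_tolerance and n_columns are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tokens_to_rows(xml_text: str, y_tolerance: float = 2.0) -> list[list[str]]:
    """Group TOKEN elements into rows by y coordinate, page by page.

    Assumes PAGE elements containing TOKEN elements with numeric x/y attributes.
    """
    root = ET.fromstring(xml_text)
    rows: list[list[str]] = []
    for page in root.iter("PAGE"):
        tokens = []
        for tok in page.iter("TOKEN"):
            # Collapse stray whitespace, a common problem in extracted text
            text = " ".join((tok.text or "").split())
            x, y = tok.get("x"), tok.get("y")
            if text and x is not None and y is not None:
                tokens.append((float(y), float(x), text))

        current: list[tuple[float, str]] = []
        row_y = None
        for y, x, text in sorted(tokens):
            # Start a new row when a token sits more than y_tolerance from the row's first token
            if row_y is None or abs(y - row_y) > y_tolerance:
                if current:
                    rows.append([t for _, t in sorted(current)])  # left-to-right order
                current, row_y = [], y
            current.append((x, text))
        if current:
            rows.append([t for _, t in sorted(current)])
    return rows


def rows_to_csv(rows: list[list[str]], n_columns: int) -> str:
    """Keep only rows with the expected number of cells and render them as CSV."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in rows:
        if len(row) == n_columns:  # drops lines (e.g. page footers) that don't fit the table
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<DOCUMENT><PAGE>'
        '<TOKEN x="10" y="100">Name</TOKEN><TOKEN x="80" y="100">Postcode</TOKEN>'
        '<TOKEN x="80" y="120.5">2000</TOKEN><TOKEN x="10" y="121">Smith</TOKEN>'
        '</PAGE></DOCUMENT>'
    )
    print(rows_to_csv(tokens_to_rows(sample), n_columns=2))
```

Because tokens are single words, a multi-word field such as a full name comes out as several cells; merging neighbouring tokens by their horizontal gap would be the obvious next refinement, and is likely where much of the per-document fiddling would go.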