PPRuNe Forums - View Single Post - extracting text from a PDF document
Old 25th October 2013 | 10:01
#13
cattletruck
 
I've noticed that PDFs saved as images optimised for the internet rather than for printing produce very blurry text. I doubt any free OCR would have a chance of reading that. I also have a feeling that the same OCR tools get thrown off course if there are pictures mixed in with the text.

If that is the case, the text in the image needs to be sharpened and the pictures removed before the page is passed to the OCR. You can screen-capture a PDF page and edit it in GIMP or Photoshop, though sadly only one page at a time.
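
For anyone curious what "sharpening" does to the pixels, here is a rough sketch in plain Python. It is not what GIMP or Photoshop do internally; it only assumes the page has already been loaded as a grayscale grid (a list of rows with values 0-255), and the function names `sharpen` and `binarize` and the threshold of 128 are my own choices for illustration.

```python
def sharpen(gray):
    """Apply a basic 3x3 sharpening kernel to a grayscale image.

    `gray` is a list of rows, each a list of ints in 0-255.
    Border pixels are copied unchanged to keep the sketch short.
    """
    kernel = [[0, -1, 0],
              [-1, 5, -1],
              [0, -1, 0]]
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * gray[y + ky - 1][x + kx - 1]
            # Clamp back into the valid 0-255 range.
            out[y][x] = max(0, min(255, acc))
    return out


def binarize(gray, threshold=128):
    """Turn every pixel pure black or pure white (threshold is an assumed value)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


# A soft edge going from light (200) to dark (100), as in blurry text.
blurry = [[200, 200, 150, 100, 100] for _ in range(3)]
crisp = binarize(sharpen(blurry))
```

The kernel exaggerates the difference between a pixel and its neighbours, so soft grey edges get pushed towards black or white, and the threshold step finishes the job. Real pages would need the threshold tuned by eye.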

Once you have the PDF page saved as an image, there are more free OCR tools to choose from. Just be aware that all text formatting will be lost, and the OCR output is at best about 95% accurate.
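
To put that 95% figure in perspective, here is a back-of-envelope sketch. It assumes the 95% means per-character accuracy and that mistakes land independently of each other, neither of which is guaranteed; the 2000-character page is just an illustrative size.

```python
def expected_errors(num_chars, char_accuracy=0.95):
    """Expected number of wrong characters on a page of num_chars characters."""
    return num_chars * (1 - char_accuracy)


def word_accuracy(word_length, char_accuracy=0.95):
    """Chance that every character in a word is right, assuming independent errors."""
    return char_accuracy ** word_length


errors_per_page = expected_errors(2000)   # about 100 wrong characters
five_letter_ok = word_accuracy(5)         # about 0.77
```

So even at the best case, a typical page would still need a proofread: roughly one five-letter word in four would contain at least one mistake under these assumptions.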