Making a book.
I have a book the pages of which exist on my pc as jpegs of the original printed book pages. Would anybody be able to suggest something to convert the jpegs into an e-book?
|
It's going to be messy, and success depends on the quality of those images.
In a nutshell, you are going to have to OCR each page image and convert it into text. Each page will need correcting, as OCR technology is not 100% accurate, by a long way. You then need to paste the corrected text files into one large text file and then output it as a PDF, an Amazon format or some other format. So, three questions: 1) How good are the jpg files? 2) How many pages? 3) What format do you want the final output to be? PDF? EPUB? Kindle/Mobi? Something else? |
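The "paste the corrected text files into one large text file" step can be scripted rather than done by hand. A minimal Python sketch, assuming the corrected pages are saved as numbered .txt files (page_1.txt, page_2.txt and so on) in one folder; the file names and the function names natural_key and merge_pages are made up for illustration:

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that puts page_2.txt before page_10.txt."""
    # re.split with a capture group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_pages(text_dir, output_file):
    """Join every .txt file in text_dir into output_file, in page order.

    Returns the number of page files written.
    """
    output_file = Path(output_file)
    pages = sorted(Path(text_dir).glob("*.txt"), key=natural_key)
    written = 0
    with output_file.open("w", encoding="utf-8") as out:
        for page in pages:
            # Skip the merged file itself if it lives in the same folder.
            if page.resolve() == output_file.resolve():
                continue
            out.write(page.read_text(encoding="utf-8").strip())
            out.write("\n\n")  # blank line between pages
            written += 1
    return written
```

Something like merge_pages("corrected", "book.txt") (both names hypothetical) would then leave one text file ready to convert to whichever output format is chosen.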
You can turn them into a PDF file but you'll need to get a program to do it, I think Foxit do a reasonably priced one.
The other option is to use an OCR program to turn the pictures into text and create an e-book from that. edit: Milo beat me to it. |
If the object is simply to create a PDF file, then PDFCreator is a simple open-source program, available from SourceForge.net.
However, all that will do with the jpg files is convert the image format from jpg to pdf. Nothing will directly convert a jpg file to an editable / searchable text-based PDF. To do that you will have to OCR it first and then convert; PDFCreator will create a searchable PDF once the OCR has been done. If you want to use Amazon's Kindle, then start at Amazon's online publishing site, https://kdp.amazon.com/self-publishing/signin, or use their downloadable program. If you want another format, then look at Calibre, an e-book management program. But your first problem is extracting the text from those image files. If they are only small, it could be easier to retype them. |
There are 300 + images and they are two pages from the book per image. Can you recommend some OCR software?
|
ABBYY FineReader is easily the most accurate I've ever used, but that's a very limited sample! However, it does have a good reputation.
Old or cut-down versions are often supplied free with scanners. If you already have a scanner, you'll probably find you already have some bundled OCR software. Edit, PS: something I just remembered. The open-source "Tesseract" program has a good reputation too, though I've never used it. It is another Google project: https://code.google.com/p/tesseract-ocr/ |
I have an HP psc. Will I have to print the jpegs and then manually scan them, or can the task be performed within the software?
|
Primo PDF will do it.
Install it as a printer driver then print all of the JPG files into that 'printer' and job done. |
As far as I know, PrimoPDF does not have an OCR element, so all it would do is convert the jpg image to a PDF image, NOT a PDF with embedded text.
So while you would have a PDF output file, you would not be able to index or search it. If you just wanted to do that, then you may as well use PDFCreator or one of the other free PDF programs. Yes, it would create a PDF file (or series of files), but for use as an e-book those files would be functionally useless. |
Tubby
You can set the software to OCR the existing image file. You don't have to print and rescan. There will be OCR software with that HP scanner, though just what I don't know; it varied with age. It may even be possible to batch scan several files, though trying to do 300 pages at once will overload the RAM by a long way. |
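One way round the "300 pages at once" problem is to feed the files to the OCR step in small groups. A rough Python sketch of the grouping only; the batch size, the folder name and the helper names list_images and batches are assumptions for illustration, and how a group is handed to the OCR program depends on which program is used:

```python
from pathlib import Path


def list_images(folder, pattern="*.jpg"):
    """Return the image files in folder, sorted by name."""
    return sorted(Path(folder).glob(pattern))


def batches(items, size):
    """Yield consecutive groups of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # "scans" is a hypothetical folder holding the jpg files.
    for number, group in enumerate(batches(list_images("scans"), 25), start=1):
        print(f"batch {number}: {len(group)} files")
```

With 300 images and a batch size of 25 that gives 12 groups, each small enough to check before moving on to the next.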
One free option is the Tesseract OCR program, which is now maintained by Google. I just tried it on a scanned letter, and it works very well. It is a command line program, so it's not the easiest to use, but can be faster on a lot of pages.
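For a lot of pages the command line can be driven from a short script. A minimal Python sketch, assuming the tesseract program is installed and on the PATH, that the installed build reads JPEG files directly, and that the English language data is present; the folder names and the function names tesseract_command and ocr_folder are made up for illustration:

```python
import subprocess
from pathlib import Path


def tesseract_command(image, out_dir, lang="eng"):
    """Build the command: tesseract <image> <output base> -l <lang>."""
    image = Path(image)
    # Tesseract adds the .txt extension to the output base itself.
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_folder(image_dir, out_dir, lang="eng"):
    """Run Tesseract once per .jpg file in image_dir.

    Returns the number of images processed. Stops with an error if
    Tesseract fails on a file (check=True).
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    images = sorted(Path(image_dir).glob("*.jpg"))
    for image in images:
        subprocess.run(tesseract_command(image, out_dir, lang), check=True)
    return len(images)
```

Running ocr_folder("scans", "text") would leave one .txt file per image in the "text" folder, each of which still needs proofreading.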
|
bnt - see post #6 above ;)
SD |
There is a graphical front end for Tesseract: see FreeOCR.
There are other plugins for it listed at https://code.google.com/p/tesseract-ocr/wiki/AddOns#GUI One that looks interesting for this project is at https://code.google.com/p/ocrivist/ but it's for Linux only. |