
Making a book.

Old 5th May 2012, 17:31
  #1 (permalink)  
Thread Starter
 
Join Date: Mar 2007
Location: Here and there
Posts: 2,781
Likes: 0
Received 1 Like on 1 Post
Making a book.

I have a book the pages of which exist on my pc as jpegs of the original printed book pages. Would anybody be able to suggest something to convert the jpegs into an e-book?
tubby linton is offline  
Old 5th May 2012, 18:32
  #2 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
It's messy, and success depends on the quality of those images.


In a nutshell, you are going to have to OCR each page image and convert it into text. Each page will need correcting, as OCR technology is not 100% accurate - by a long way.
You then need to paste the corrected text files into one large text file, and then output it - as a PDF, Amazon or some other format.
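
The "paste the corrected text files into one large text file" step can be scripted rather than done by hand. A minimal Python sketch, assuming the corrected pages are saved one per file with names that sort into page order (the names page_001.txt etc. and the function combine_pages are made up for illustration):

```python
from pathlib import Path


def combine_pages(text_dir, out_path, page_break="\n\n"):
    """Join corrected per-page .txt files into one large text file.

    Assumption: the files are named so that plain sorting gives page
    order, e.g. page_001.txt, page_002.txt, ... (hypothetical names).
    """
    out = Path(out_path).resolve()
    # Skip the output file itself in case it lives in the same folder.
    pages = sorted(p for p in Path(text_dir).glob("*.txt") if p.resolve() != out)
    chunks = [p.read_text(encoding="utf-8").strip() for p in pages]
    out.write_text(page_break.join(chunks) + "\n", encoding="utf-8")
    return len(pages)
```

Calling combine_pages("corrected", "book.txt") would leave a single book.txt that can then be fed to whatever produces the final PDF, EPUB or Kindle file.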

So three questions:
1) How good are the jpg files?
2) How many pages?
3) What format do you want the final output to be? PDF? EPUB? Kindle/Mobi? Something else?
Milo Minderbinder is offline  
Old 5th May 2012, 18:36
  #3 (permalink)  
More bang for your buck
 
Join Date: Nov 2005
Location: land of the clanger
Age: 82
Posts: 3,512
Likes: 0
Received 0 Likes on 0 Posts
You can turn them into a PDF file, but you'll need a program to do it; I think Foxit do a reasonably priced one.
The other option is to use an OCR program to turn the pictures into text and create an e-book from that.

edit: Milo beat me to it.
green granite is offline  
Old 5th May 2012, 18:48
  #4 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
If the object is simply to create a PDF file, then PDFCreator is a simple open-source program:
PDFCreator | Free Business & Enterprise software downloads at SourceForge.net

However, all that will do with the jpg files is convert the image format from jpg to PDF. Nothing will directly convert the jpg file to an editable / searchable text-based PDF. To do that you will have to OCR it and then convert.

PDFCreator will create a searchable PDF once the OCR has been done
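
For anyone who would rather script the plain image-to-PDF conversion than use a PDF printer, here is a rough Python sketch. It assumes the third-party Pillow library is installed, and the helper names page_order_key and jpegs_to_pdf are invented for this example. The result is still only pictures of pages, not searchable text.

```python
import re


def page_order_key(filename):
    """Sort key that puts page2.jpg before page10.jpg."""
    parts = re.split(r"(\d+)", filename)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def jpegs_to_pdf(jpeg_paths, pdf_path):
    """Write the given JPEGs into one image-only (not searchable) PDF."""
    from PIL import Image  # third-party: pip install Pillow

    ordered = sorted(jpeg_paths, key=page_order_key)
    if not ordered:
        raise ValueError("no images given")
    # Every image is held in memory at once, so a few hundred large
    # scans may be better handled in smaller groups.
    pages = [Image.open(p).convert("RGB") for p in ordered]
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
```

The sort key is there because a plain alphabetical sort would put page10.jpg ahead of page2.jpg and scramble the book.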

If you want to use Amazon's Kindle, then start here at Amazon's online publishing site
https://kdp.amazon.com/self-publishing/signin
or find their downloadable program at
Amazon Amazon

If you want another format then look at Calibre
calibre - E-book management

But your first problem is extracting the text from those image files
If the pages are only small, it could be easier to retype them.

Last edited by Milo Minderbinder; 5th May 2012 at 19:05.
Milo Minderbinder is offline  
Old 5th May 2012, 19:06
  #5 (permalink)  
Thread Starter
 
Join Date: Mar 2007
Location: Here and there
Posts: 2,781
Likes: 0
Received 1 Like on 1 Post
There are 300+ images, and each image shows two pages from the book. Can you recommend some OCR software?
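
With two pages per image, it may help to cut each image down the middle before any OCR is attempted, so that each page is recognised separately. A rough Python sketch follows; it assumes the fold sits at the centre of the image and that the third-party Pillow library is installed, and the names spread_boxes, split_spread and the gutter parameter are made up for illustration.

```python
def spread_boxes(width, height, gutter=0):
    """Return (left, right) crop boxes for a two-page spread.

    Boxes are (left, upper, right, lower) tuples in pixels. `gutter` is
    a hypothetical number of pixels trimmed either side of the centre
    fold; the fold is assumed to be exactly in the middle.
    """
    mid = width // 2
    left = (0, 0, mid - gutter, height)
    right = (mid + gutter, 0, width, height)
    return left, right


def split_spread(jpeg_path, left_out, right_out, gutter=0):
    """Save the two halves of one scanned spread as separate images."""
    from PIL import Image  # third-party: pip install Pillow

    with Image.open(jpeg_path) as img:
        left, right = spread_boxes(img.width, img.height, gutter)
        img.crop(left).save(left_out)
        img.crop(right).save(right_out)
```

If the book was not centred when photographed, the halves will need checking by eye, since a fixed midpoint will cut into the text.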
tubby linton is offline  
Old 5th May 2012, 19:16
  #6 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
ABBYY FineReader is easily the most accurate I've ever used - but that's a very limited sample! However, it does have a good reputation.
Old or cut down versions are often supplied free with scanners
OCR software for text recognition OCR PDF features - ABBYY FineReader

If you already have a scanner, you'll probably find you already have some bundled OCR software




edit
PS - something I just remembered:
The open-source "Tesseract" program also had a good reputation, though I've never used it
https://code.google.com/p/tesseract-ocr/
Another Google project

Last edited by Milo Minderbinder; 5th May 2012 at 19:29.
Milo Minderbinder is offline  
Old 5th May 2012, 20:23
  #7 (permalink)  
Thread Starter
 
Join Date: Mar 2007
Location: Here and there
Posts: 2,781
Likes: 0
Received 1 Like on 1 Post
I have an HP PSC. Will I have to print the jpegs and then manually scan them, or can the task be performed within the software?
tubby linton is offline  
Old 5th May 2012, 20:39
  #8 (permalink)  
 
Join Date: Apr 2002
Posts: 1
Likes: 0
Received 0 Likes on 0 Posts
Primo PDF will do it.

Install it as a printer driver then print all of the JPG files into that 'printer' and job done.
PPRuNeUser0171 is offline  
Old 5th May 2012, 20:58
  #9 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
As far as I know, PrimoPDF does not have an OCR element, so all it would do is convert the jpg image to a PDF image - NOT a PDF with embedded text.
So while you have a PDF output file, you would not be able to index or search it.
If you just wanted to do that then you may as well use PDFCreator or one of the other free PDF programs
Yes it would create a PDF file (or series of files), but for use as an Ebook those files would be functionally useless
Milo Minderbinder is offline  
Old 5th May 2012, 21:02
  #10 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
Tubby

You can set the software to OCR the existing image file. You don't have to print and rescan
There will be OCR software with that HP scanner, though just which I don't know; it varied with the age of the model. It may even be possible to batch-process several files, though trying to do 300 pages at once will overload the RAM by a long way.
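
If memory does turn out to be a problem, one workaround is to feed the files through in small groups instead of all 300 at once. A minimal Python sketch of that grouping; the function name batches and the group size of 20 are arbitrary choices for illustration, not anything the HP software requires:

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical file names: 300 scans handled 20 at a time.
scans = [f"scan_{n:03d}.jpg" for n in range(1, 301)]
for group in batches(scans, 20):
    pass  # hand `group` to whatever OCR step is being used
```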
Milo Minderbinder is offline  
Old 5th May 2012, 22:59
  #11 (permalink)  
bnt
 
Join Date: Feb 2007
Location: Dublin, Ireland. (No, I just live here.)
Posts: 733
Received 6 Likes on 5 Posts
One free option is the Tesseract OCR program, which is now maintained by Google. I just tried it on a scanned letter, and it works very well. It is a command line program, so it's not the easiest to use, but can be faster on a lot of pages.
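
Because it is a command line program, it can be driven from a short script to work through a whole folder. A rough Python sketch, assuming tesseract is installed and on the PATH, that the build in use can read JPEGs directly, and that English is the language wanted; the function names are made up for this example:

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, lang="eng"):
    """Build the command line to OCR one image to a .txt file beside it.

    Uses the basic form `tesseract IMAGE OUTPUTBASE -l LANG`; Tesseract
    adds the .txt extension to OUTPUTBASE itself.
    """
    image = Path(image_path)
    output_base = image.with_suffix("")  # page_001.jpg -> page_001
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_folder(folder, lang="eng"):
    """Run Tesseract over every .jpg in `folder`, one page at a time."""
    for image in sorted(Path(folder).glob("*.jpg")):
        subprocess.run(tesseract_command(image, lang), check=True)
```

Each page ends up as its own text file next to the image, which still needs proofreading before the pages are joined together.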
bnt is offline  
Old 5th May 2012, 23:04
  #12 (permalink)  
Spoon PPRuNerist & Mad Inistrator
 
Join Date: Sep 2003
Location: Twickenham, home of rugby
Posts: 7,401
Received 274 Likes on 174 Posts
bnt - see post #6 above

SD
Saab Dastard is offline  
Old 6th May 2012, 10:07
  #13 (permalink)  
 
Join Date: Jan 2012
Location: .
Posts: 2,173
Likes: 0
Received 0 Likes on 0 Posts
There is a graphical front end for Tesseract - see Freeocr Scanning OCR Software - OCR PDF Document Scanner Software

There are other plugins for it listed at https://code.google.com/p/tesseract-ocr/wiki/AddOns#GUI
One that looks interesting for this project is at https://code.google.com/p/ocrivist/ - but it's for Linux only
Milo Minderbinder is offline  
