
Extracting info for html page

Old 8th Dec 2009, 08:35
  #1 (permalink)  
Per Ardua ad Astraeus
Thread Starter
 
Join Date: Mar 2000
Location: UK
Posts: 18,579
Likes: 0
Received 0 Likes on 0 Posts

I am supplied with a wide variety of inputs for news on a village website.

The first problem is extracting text from an MS Publisher (.pub) file. The HTML-formatted text then goes into tables on a prepared page that uses my own style sheets. I cannot get 'Save as HTML' to produce code that can be inserted into the tables, and each page generates over 2000 NEW lines of CSS in true MS fashion. At the moment I am transcribing longhand into HTML - any tricks?
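
One possible trick for the Publisher output, offered as a rough sketch rather than a tested recipe for any particular Publisher version: run the saved HTML through a small script that throws away the generated style block, the classes and the inline styles, and keeps only a short list of plain tags. The result can then be pasted into a table cell and picked up by the site's own style sheets. The tag list and the names `KEEP_TAGS`, `FragmentCleaner` and `clean_fragment` are made up for this illustration; only Python's standard `html.parser` module is used.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these are the only tags worth keeping; adjust to taste.
KEEP_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li", "a"}
KEEP_ATTRS = {"a": {"href"}}          # every other attribute is dropped
SKIP_CONTENT = {"head", "style", "script"}  # drop these tags and their contents
VOID_TAGS = {"br"}                    # tags that take no closing tag


class FragmentCleaner(HTMLParser):
    """Rebuilds the document using only whitelisted tags and plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return
        allowed = KEEP_ATTRS.get(tag, set())
        kept = []
        for name, value in attrs:
            if name in allowed:
                kept.append(' %s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so &, < and > stay valid in the output.
            self.out.append(escape(data, quote=False))


def clean_fragment(html_text):
    """Return a stripped-down HTML fragment suitable for pasting into a table cell."""
    parser = FragmentCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out).strip()
```

To try it, read the saved file into a string and print `clean_fragment(...)` of it. The file encoding is another assumption to check, since Microsoft exports are often not UTF-8.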

The second is extracting an image from a PDF file. I have tried various progs (I have Acrobat), but the resulting colour is not true. I am resorting to JPG screenshots for now.
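
For the PDF image, one thing that might be worth trying, sketched here under several assumptions: when a PDF embeds a photo as a JPEG, the JPEG bytes normally sit unmodified inside a stream whose dictionary names the `/DCTDecode` filter. Copying those bytes out avoids any re-rendering or re-compression. The sketch assumes the PDF is not encrypted, that the image really is stored as a plain JPEG, and that the binary data does not happen to contain the text `endobj`; the function name `extract_jpegs` is invented for this example. It cannot fix colours that are wrong because the image is CMYK or carries a colour profile the viewing program ignores.

```python
import re

# One match per PDF object: "<num> <gen> obj ... endobj".
_OBJECT = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)


def extract_jpegs(pdf_bytes):
    """Return the raw bytes of each JPEG stored as a /DCTDecode stream."""
    images = []
    for match in _OBJECT.finditer(pdf_bytes):
        # Split the object into its dictionary part and its stream part.
        header, keyword, rest = match.group(1).partition(b"stream")
        if not keyword or b"/DCTDecode" not in header:
            continue
        data = rest.rsplit(b"endstream", 1)[0]
        # The "stream" keyword is followed by one end-of-line marker.
        if data.startswith(b"\r\n"):
            data = data[2:]
        elif data.startswith(b"\n"):
            data = data[1:]
        # Keep only data that looks like a JPEG (starts with the SOI marker).
        if not data.startswith(b"\xff\xd8"):
            continue
        # Trim anything after the last end-of-image marker.
        end = data.rfind(b"\xff\xd9")
        if end != -1:
            data = data[: end + 2]
        images.append(data)
    return images
```

Each returned item can be written straight to a `.jpg` file opened in binary mode (`"wb"`).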
BOAC is offline  
Old 8th Dec 2009, 10:15
  #2 (permalink)  
 
Join Date: Jul 2003
Location: Brum
Posts: 852
Likes: 0
Received 1 Like on 1 Post
If you have access to a Mac, FileJuicer will extract all text/jpegs etc. from almost any file...

FileJuicer

Nige
Nige321 is offline  
Old 8th Dec 2009, 11:14
  #3 (permalink)  
Per Ardua ad Astraeus
Thread Starter
 
Join Date: Mar 2000
Location: UK
Posts: 18,579
Likes: 0
Received 0 Likes on 0 Posts
Ta, but 'Macless in Gaza'!
BOAC is offline  
Old 8th Dec 2009, 14:30
  #4 (permalink)  
Per Ardua ad Astraeus
Thread Starter
 
Join Date: Mar 2000
Location: UK
Posts: 18,579
Likes: 0
Received 0 Likes on 0 Posts
TSC- thanks for reply - para by para

1- See post #1 - colour wrong
2- Yes for that society, that's what they are used to
3- will try 'filtered'
4- see post #1
BOAC is offline  
Old 8th Dec 2009, 17:35
  #5 (permalink)  
Spoon PPRuNerist & Mad Inistrator
 
Join Date: Sep 2003
Location: Twickenham, home of rugby
Posts: 7,417
Received 281 Likes on 179 Posts
I use Notepad a lot to strip out almost everything except the plain ASCII text - works great for copying text from web pages for re-processing, as you lose all the crap.
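
Much the same stripping can be scripted when there are many pages to get through. A minimal sketch using only Python's standard `html.parser` module follows; the names `TextOnly` and `html_to_text` and the list of block tags are assumptions made for this example, and it behaves only roughly like a paste into Notepad.

```python
from html.parser import HTMLParser

# Assumption: these tags should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4"}


class TextOnly(HTMLParser):
    """Collects text content and ignores all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1      # contents are code, not readable text
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html_text):
    """Return the visible text, one line per block, with whitespace tidied."""
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Like the Notepad route, this loses all formatting and colours, so it only helps with getting the words out.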

Dunno if that helps.

SD
Saab Dastard is offline  
Old 8th Dec 2009, 17:51
  #6 (permalink)  
Per Ardua ad Astraeus
Thread Starter
 
Join Date: Mar 2000
Location: UK
Posts: 18,579
Likes: 0
Received 0 Likes on 0 Posts
Yes - getting the basic text out was fine - it was all the lovely text formatting/colours etc. that took all the effort.
BOAC is offline  
