Extracting info for html page [Archive]

BOAC

8th Dec 2009, 08:35

I am supplied with a wide variety of inputs for news on a village website.

First problem is extracting text from an MS Publisher file. The HTML-formatted text then goes into tables on a prepared page using my style sheets. I cannot get 'Save as html' to produce code that can be inserted into the tables, and each page generates over 2000 NEW lines of CSS in true MS fashion. At the moment I am transcribing longhand into HTML - any tricks?

Second one is extracting an image from a PDF file. I have tried various progs (I have Acrobat) but the resulting colour is not true. Resorting to JPG screenshots for the time being.
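
One possible approach to the first problem, offered as a rough sketch rather than a tested recipe for Publisher output: run the saved HTML through a small script that throws away the generated style blocks and every class/style attribute, and keeps only a short list of basic tags. The result can then be pasted into a table cell and styled by the site's own style sheet. The tag whitelist and the function name clean_html are assumptions made up for this illustration, and note that inline colours are discarded along with everything else, so they would have to be reapplied through CSS.

```python
from html import escape
from html.parser import HTMLParser

# Tags kept in the output (an assumed whitelist; adjust to taste).
KEEP = {"p", "br", "b", "i", "strong", "em", "u",
        "ul", "ol", "li", "a", "h1", "h2", "h3"}
# Elements whose whole content is thrown away.
DROP_CONTENT = {"style", "script", "head", "title"}
# Kept tags that never get a closing tag.
VOID = {"br"}


class Cleaner(HTMLParser):
    """Re-emit only whitelisted tags, with no attributes except href on links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP:
            return
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self.out.append(f'<a href="{escape(href, quote=True)}">')
        else:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the output is still valid HTML.
            self.out.append(escape(data, quote=False))


def clean_html(source: str) -> str:
    """Return source with generated styling removed and basic markup kept."""
    parser = Cleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Typical use would be to read the saved file, call clean_html on its contents, and paste the returned string into the prepared page. Text inside dropped wrappers such as span or div is kept; only the wrapper tags themselves go.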

Nige321

8th Dec 2009, 10:15

If you have access to a Mac, FileJuicer will extract all text/jpegs etc. from almost any file...

FileJuicer (http://echoone.com/filejuicer/)

Nige

BOAC

8th Dec 2009, 11:14

Ta, but 'Macless in Gaza'!

BOAC

8th Dec 2009, 14:30

TSC- thanks for reply - para by para

1- See post #1 - colour wrong
2- Yes for that society, that's what they are used to
3- will try 'filtered'
4- see post #1

Saab Dastard

8th Dec 2009, 17:35

I use Notepad a lot to strip out almost everything except the plain ASCII text - works great for copying text from web pages for re-processing, as you lose all the crap.

Dunno if that helps.

SD
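
For anyone who would rather script the Notepad trick than paste by hand, here is a minimal sketch of the same idea: drop every tag and keep only the text. The function name to_plain_text and the choice of which tags start a new line are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text content only, starting a new line at block-level tags."""

    # Tags treated as line breaks (an assumed list).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skip += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def to_plain_text(source: str) -> str:
    """Return the visible text of an HTML string, one block per line."""
    parser = TextOnly()
    parser.feed(source)
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

As with Notepad, all formatting is lost, so this only helps with getting the raw words out.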

BOAC

8th Dec 2009, 17:51

Yes - getting the basic text out was fine - it was all the lovely text formatting/colours etc. that took all the effort.