PPRuNe Forums - Extracting info for html page

BOAC	8th Dec 2009 08:35

Extracting info for html page

I am supplied with a wide variety of inputs for news on a village website.

The first problem is extracting text from an MS Publisher file. The HTML-formatted text then goes into tables on a prepared page that uses my style sheets. I cannot get 'Save as HTML' to produce code I can insert into the tables, and each page generates over 2000 new lines of CSS in true MS fashion. At the moment I am transcribing it longhand into HTML. Any tricks?
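One scripted alternative to transcribing by hand, offered as a suggestion and not something proposed in the thread, is to run the output of 'Save as HTML' through a whitelist filter. The filter drops the `<style>` blocks and every `style`/`class` attribute and keeps only basic tags that can be pasted into a table cell. The sketch below uses only Python's standard `html.parser`. The tag whitelist and the names `clean_html` and `PublisherHTMLCleaner` are hypothetical choices.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of tags worth keeping; adjust to suit the target page.
KEEP_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "ul", "ol", "li",
             "h1", "h2", "h3", "table", "tr", "td", "th", "a"}
# Only these attributes survive, so style=, class=, lang= etc. are dropped.
KEEP_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                          # tags that never get a closing tag
DROP_CONTENT = {"head", "style", "script"}  # discard these elements entirely


class PublisherHTMLCleaner(HTMLParser):
    """Hypothetical filter: keep whitelisted tags and text, drop all styling."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <head>/<style>/<script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # e.g. <span>, <font>, <o:p>: the tag goes, its text stays
        allowed = KEEP_ATTRS.get(tag, set())
        kept = "".join(f' {k}="{escape(v or "")}"'
                       for k, v in attrs if k in allowed)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))
    # Comments (including MS conditional comments) are ignored by default.


def clean_html(raw):
    parser = PublisherHTMLCleaner()
    parser.feed(raw)
    parser.close()
    return "".join(parser.out).strip()


raw = ('<html><head><style>p.MsoNormal {margin:0}</style></head><body>'
       '<p class="MsoNormal" style="color:red"><span style="font-size:12pt">'
       '<b>Village fete</b> on Saturday</span></p></body></html>')
print(clean_html(raw))  # <p><b>Village fete</b> on Saturday</p>
```

Because no inline formatting survives, the fragment picks up its appearance from the page's own style sheets. Colours and fonts that were set only through Publisher's CSS are lost and would have to be re-applied with your own classes.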

The second problem is extracting an image from a PDF file. I have tried various programs (I have Acrobat), but the resulting colour is not true. For now I am resorting to JPG screenshots.

Nige321	8th Dec 2009 10:15

If you have access to a Mac, FileJuicer will extract all the text, JPEGs, etc. from almost any file.

FileJuicer

Nige

BOAC	8th Dec 2009 11:14

Ta, but 'Macless in Gaza'!

BOAC	8th Dec 2009 14:30

TSC, thanks for the reply. Paragraph by paragraph:

1- See post #1: the colour is wrong.
2- Yes, for that society; that's what they are used to.
3- Will try 'filtered'.
4- See post #1.

Saab Dastard	8th Dec 2009 17:35

I use Notepad a lot to strip out everything except the plain ASCII text. It works great for copying text from web pages for re-processing, as you lose all the crap.

Dunno if that helps.

SD
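The same "paste it through Notepad" idea can also be scripted. The sketch below uses only Python's standard `html.parser`. It keeps the visible text, starts a new line at common block-level tags, and drops everything inside `<head>`, `<style>` and `<script>`. The function name `html_to_text` and the set of tags treated as line breaks are illustrative assumptions. The output is plain Unicode text, not strictly ASCII.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the visible text of an HTML page (hypothetical helper)."""

    SKIP = {"head", "style", "script"}  # content of these is never shown
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}  # assumed line breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK and not self.skip_depth:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(raw):
    parser = TextOnly()
    parser.feed(raw)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace (including &nbsp;) and drop empty lines.
    cleaned = (" ".join(line.split()) for line in lines)
    return "\n".join(line for line in cleaned if line)


print(html_to_text('<head><style>p{color:red}</style></head>'
                   '<p style="color:red">Village <b>fete</b></p>'
                   '<p>Saturday&nbsp;2pm</p>'))
```

As BOAC points out, this only solves the easy half: all formatting is thrown away, so bold text, colours and the like have to be re-applied afterwards.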

BOAC	8th Dec 2009 17:51

Yes, getting the basic text out was fine; it was all the lovely text formatting, colours, etc. that took the effort.
