comp.lang.ruby

Extract PDF content?

EdUarDo

4/7/2006 8:03:00 AM

Hi all,

Is there a gem or library that can extract text from a PDF file? Is there anything similar for Word or OpenOffice files?
4 Answers

akbarhome

4/7/2006 10:41:00 AM

Reading PDF with pure Ruby? No. The current libraries only create PDFs.

Reading Word? I don't know.

Jon Wood

4/7/2006 4:10:00 PM

EdUarDo wrote:
> Hi all,
>
> Is there a gem or library that can extract text from a PDF file? Is there anything similar for Word or OpenOffice files?

I don't know about PDFs, but there are several programs that can
convert a Word file to HTML. You'll probably lose some formatting,
but you should then be able to parse the result like any other
markup document to extract its text content.
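As a rough illustration of the text-extraction step, here is a minimal Ruby sketch. It assumes some converter has already produced the HTML string; `html_to_text` and `ENTITIES` are hypothetical names. The regex approach only copes with simple markup and a handful of entities, so a real HTML parser would be more robust for messy Word output.

```ruby
# Hypothetical lookup table for the few HTML entities this sketch decodes.
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

# Hypothetical helper: roughly extracts visible text from converted HTML.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, '') # drop script/style bodies
  text = text.gsub(%r{<br\s*/?>|</p>}i, "\n")            # keep line/paragraph breaks
  text = text.gsub(/<[^>]*>/m, '')                        # strip all remaining tags
  text = text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES) # decode known entities
  text.lines.map(&:strip).reject(&:empty?).join("\n")    # tidy whitespace
end
```

For example, `html_to_text('<p>Hello &amp; welcome</p><p>Line&nbsp;two</p>')` gives the two paragraphs as separate lines of plain text.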

Jon

Dave Burt

4/8/2006 12:47:00 PM

EdUarDo wrote:
> Is there a gem or library that can extract text from a PDF
> file? Is there anything similar for Word or OpenOffice files?

You can use Windows automation (the WIN32OLE library) to drive Microsoft
Word: open a Word document, then either use "Save As" to produce a
plain-text file or read its contents programmatically.
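A minimal sketch of that approach, assuming Windows, a Ruby build that ships the `win32ole` library, and an installed copy of Word. The helper name `word_to_text` is hypothetical; `2` is Word's `wdFormatText` file-format constant for `SaveAs`. It both saves a text copy and returns the text read through `Content.Text`, covering the two options mentioned above.

```ruby
# Word's wdFormatText constant: tells SaveAs to write a plain-text file.
WD_FORMAT_TEXT = 2

# Hypothetical helper: opens doc_path in Word via OLE automation, saves a
# plain-text copy to txt_path, and returns the document's text.
def word_to_text(doc_path, txt_path)
  require 'win32ole' # only usable on Windows with Word installed

  word = WIN32OLE.new('Word.Application')
  word.Visible = false
  # Pass native Windows paths to Word (backslashes instead of slashes).
  doc = word.Documents.Open(File.expand_path(doc_path).tr('/', '\\'))
  text = doc.Content.Text # read the contents programmatically
  doc.SaveAs(File.expand_path(txt_path).tr('/', '\\'), WD_FORMAT_TEXT)
  text
ensure
  doc.Close(false) if doc # close without saving further changes
  word.Quit if word       # always shut down the Word process
end
```

On a Windows machine, `puts word_to_text('report.doc', 'report.txt')` would print the text and leave `report.txt` next to it.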

Cheers,
Dave

Martin DeMello

4/9/2006 5:42:00 PM

Jon Wood <jellybob@gmail.com> wrote:
>
> EdUarDo wrote:
> > Hi all,
> >
> > Is there a gem or library that can extract text from a PDF file? Is there anything similar for Word or OpenOffice files?
>
> I don't know about PDFs, but there are several programs that can
> convert a Word file to HTML. You'll probably lose some formatting,
> but you should then be able to parse the result like any other
> markup document to extract its text content.

There are some command-line switches available for OpenOffice too:
http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openo...

You should be able to script it to open the file and save it as text.
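A rough Ruby sketch of the scripting idea. Note that it swaps in a different mechanism from the linked article: it assumes a LibreOffice (or compatible) `soffice` binary on the PATH that supports `--headless` and `--convert-to`, which older OpenOffice.org releases may lack. The helper names `soffice_convert_command`, `text_output_path`, and `convert_to_text` are hypothetical.

```ruby
# Hypothetical helper: argument list for a headless conversion to plain text
# (assumes LibreOffice's soffice and its "Text" export filter).
def soffice_convert_command(input, outdir, soffice: 'soffice')
  [soffice, '--headless', '--convert-to', 'txt:Text', '--outdir', outdir, input]
end

# soffice names the output after the input file, with a .txt extension.
def text_output_path(input, outdir)
  File.join(outdir, File.basename(input, '.*') + '.txt')
end

# Runs the conversion and returns the path of the resulting text file.
def convert_to_text(input, outdir = File.dirname(input))
  ok = system(*soffice_convert_command(input, outdir)) # nil/false on failure
  raise "soffice failed for #{input}" unless ok
  text_output_path(input, outdir)
end
```

Passing the arguments as an array to `system` avoids shell quoting problems with file names that contain spaces.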

martin