comp.lang.ruby

PDF to text converter?

dare ruby

8/11/2008 9:42:00 AM

Dear all,

Could anyone explain how to convert a PDF to text format?

Thanks in advance

Regards,
Jose Martin
--
Posted via http://www.ruby-....

4 Answers

Axel Etzold

8/11/2008 10:04:00 AM



-------- Original Message --------
> Date: Mon, 11 Aug 2008 18:41:51 +0900
> From: dare ruby <martin@angleritech.com>
> To: ruby-talk@ruby-lang.org
> Subject: PDF to text covertor?

> Dear all,
>
> Could anyone explain how to do convert PDF to text format.
>
> Thanks in advance
>
> Regards,
> Jose Martin
> --
> Posted via http://www.ruby-....

Dear Jose,

It depends on whether your PDF actually contains text or only images that a human can recognize as
text.
In the first case, you can try tools like pdftotext (http://en.wikipedia.org/wiki...), at least on Linux
and Mac. On Windows, some PDF viewers also offer a "Save as text" option.

In the second case, you'll have to use OCR (optical character recognition) software. There are some
good commercial packages available. I've liked ABBYY's FineReader (on Windows).
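
One rough way to tell the two cases apart from a script is to run a text extractor first and check whether anything meaningful came out. The helper name and the 20-character threshold below are my own assumptions for illustration, not part of any tool:

```ruby
# Heuristic sketch: treat a PDF as image-only when text extraction yields
# almost no non-whitespace characters (form feeds between pages count as
# whitespace). The min_chars threshold is an arbitrary assumption.
def probably_image_only?(extracted_text, min_chars: 20)
  extracted_text.gsub(/\s+/, "").length < min_chars
end
```

If this returns true for the output of your extractor, the file is likely a scan and needs OCR instead.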

Best regards,

Axel


Kouhei Sutou

8/11/2008 11:08:00 AM


Hi,

In <59a3f50dc89e69c5250b753986657c78@ruby-forum.com>
"PDF to text covertor?" on Mon, 11 Aug 2008 18:41:51 +0900,
dare ruby <martin@angleritech.com> wrote:

> Could anyone explain how to do convert PDF to text format.

It seems that Ruby/Poppler(*1), the Ruby bindings of
Poppler(*2), is what you're looking for.
http://ruby-gnome2.svn.sourceforge.net/viewvc/ruby-gnome2/ruby-gnome2/trunk/poppler/sample/pdf2text.rb?v...

(*1) http://ruby-gnome2.sourceforge.jp/hiki.cgi?Ruby...
(*2) http://poppler.freede...

pdftotext is an application bundled with Poppler.
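
For orientation, here is a minimal sketch of what using the bindings might look like. It assumes the poppler library from Ruby-GNOME2 is installed, that Poppler::Document.new accepts a file path, and that each page responds to get_text; newer releases may differ, so check the linked sample. pages_to_text is a hypothetical helper name.

```ruby
# Collect the text of every page. `pages` is anything that responds to
# `each` and yields objects with a get_text method (assumed to match
# Poppler::Page in the Ruby/Poppler bindings).
def pages_to_text(pages)
  texts = []
  pages.each { |page| texts << page.get_text }
  texts.join("\n")
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  require "poppler" # Ruby/Poppler bindings, installed separately
  document = Poppler::Document.new(ARGV.first) # assumed constructor form
  puts pages_to_text(document)
end
```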


Thanks,
--
kou

dare ruby

8/19/2008 6:11:00 AM


I have some study materials as PDF documents. I need to convert the
PDFs to a text format, such as Microsoft Word or TextPad, on Windows. I
need to do the conversion with a Ruby program. Could anyone suggest how?

Thanks in advance

Regards,
Jose Martin


Martin DeMello

8/19/2008 5:39:00 PM


On Mon, Aug 18, 2008 at 11:10 PM, dare ruby <martin@angleritech.com> wrote:
> I have some of the study materials as PDF documents. I need to parse the
> PDF to any text format like microsoft word or text pad in windows OS. I
> need to do parsing using a ruby program. Could any one suggesst on this?

Your best bet is a Ruby script that calls out to xpdf to do the actual
pdf->text conversion, then parses the text. There's a Windows port of
the xpdf command-line utilities.
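
A minimal sketch of that approach, assuming the pdftotext executable (from xpdf or Poppler) is on your PATH. The function names are made up for illustration, and the "parsing" step here only strips blank lines; replace it with whatever your documents need.

```ruby
require "open3"

# Build the pdftotext argument list. "-layout" asks pdftotext to preserve
# the physical page layout; "-" as the output file sends text to stdout.
def pdftotext_command(pdf_path, layout: false)
  cmd = ["pdftotext"]
  cmd << "-layout" if layout
  cmd + [pdf_path, "-"]
end

# Run pdftotext and return the extracted text, raising if the tool fails.
def pdf_to_text(pdf_path, layout: false)
  out, err, status = Open3.capture3(*pdftotext_command(pdf_path, layout: layout))
  raise "pdftotext failed: #{err.strip}" unless status.success?
  out
end

# Placeholder parsing step: return the non-empty lines, stripped.
def non_empty_lines(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  puts non_empty_lines(pdf_to_text(ARGV.first))
end
```

Passing the arguments as an array avoids shell quoting problems with file names that contain spaces, which is common on Windows.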

http://gnuwin32.sourceforge.net/package...
http://www.perlmonks.org/?node...
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_sear...
http://forjournalists.com/cookbook/index.php?...

martin