comp.lang.ruby
Re: Text extraction from PDF files (non-European languages)...?
Kouhei Sutou
11/22/2006 12:53:00 AM
Hi,
2006/11/22, Nuralanur@aol.com <Nuralanur@aol.com>:
> is there a way of extracting text from a PDF, if the latter
> is in some non-European language, such as Arabic or
> Chinese?
> Under Linux, I have been able to use Ruby in conjunction
> with pdftotext for English and other Latin1 encoded texts -
> with some problems sometimes for special characters,
> but it doesn't seem to work for Unicode ...
Which version of pdftotext did you use? Xpdf or poppler?
You need to install character map files for texts that are not
Latin1 encoded.
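
For reference, a minimal sketch of driving pdftotext from Ruby with UTF-8 output. It assumes a pdftotext binary (from Xpdf or poppler) is on the PATH and that the character map files for the target language are already installed. The helper names pdftotext_command and extract_text are made up for this example.

```ruby
require "open3"

# Build the argument list for pdftotext.
# "-enc UTF-8" requests UTF-8 output; "-" as the output file means stdout.
# (Hypothetical helper name, used only in this sketch.)
def pdftotext_command(pdf_path, encoding = "UTF-8")
  ["pdftotext", "-enc", encoding, pdf_path, "-"]
end

# Run pdftotext and return the extracted text tagged as UTF-8.
# Raises if pdftotext exits with a non-zero status.
def extract_text(pdf_path)
  output, status = Open3.capture2(*pdftotext_command(pdf_path))
  raise "pdftotext failed for #{pdf_path}" unless status.success?
  output.force_encoding("UTF-8")
end
```

Calling extract_text("document.pdf") would then return the text as a single string. Passing the arguments as an array avoids shell quoting problems with unusual file names.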
> Is there a Ruby way to do this ?
You can use Ruby/Poppler, as long as poppler itself can handle the PDF:
http://ruby-gnome2.cvs.sourceforge.net/ruby-gnome2/ruby-gnome2/poppler/sample/pdf2text.rb?revision=HEAD&v...
Thanks,
--
kou