comp.lang.ruby

Natural language detection library

Thomas Nitsche

5/7/2007 11:18:00 AM

Hi,

does anyone know a Ruby library/module that detects which natural language
a given input text is written in?

Cheers

Thomas


4 Answers

ThoML

5/7/2007 11:55:00 AM


You can use the zlib library for this, as deplate does:

from: http://deplate.sourceforge.net/Modules.html#h...
> The algorithm of this plugin is based on D Benedetto & E Caglioti
> & V Loreto "Language Trees and Zipping"[1]. It's a direct port of
> Dirk Holtwick's "Guess language of text using ZIP"[2].

[1] http://xxx.uni-augsburg.de/format/cond-m...
[2] http://aspn.activestate.com/ASPN/Cookbook/Python/Rec...

Ruby code:
http://deplate.cvs.sourceforge.net/deplate/deplate/lib/deplate/guesslanguage.rb?v...

Works quite well for me.

Thomas.


Thomas Nitsche

5/8/2007 2:47:00 PM


thx, I don't have a clue how it works, but it's great ;-)

ThoML

5/9/2007 8:09:00 AM


> thx, I don't have a clue how it works, but it's great ;-)

You need a base corpus/sample (I use the GPL license) for each
language and calculate an index number for it on initialization.
Then you compare a sample text's index with the base index.

An example of how to use this can be found at:
http://deplate.cvs.sourceforge.net/deplate/deplate/lib/deplate/mod/guesslanguage.rb?v...

First call #register(language_name, text) for each corpus, then call
#guess_with_diff(text) to guess a text's language.
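
For readers wondering how the zip trick works, here is a minimal sketch of the general idea. It is not deplate's code. The class name ZipLanguageGuesser and the methods guess and diffs are hypothetical; register only borrows the name mentioned above, and its behaviour here is an assumption. The "index" is taken to be the compressed size of each corpus. A sample is appended to each corpus, and the language whose compressed size grows the least is assumed to be the best match, because the compressor can reuse substrings it has already seen.

```ruby
require 'zlib'

# Hypothetical sketch of compression-based language guessing.
# Not deplate's implementation; all names here are illustrative.
class ZipLanguageGuesser
  def initialize
    # language name => [corpus text, compressed size of the corpus alone]
    @corpora = {}
  end

  # Store a reference corpus together with its compressed size,
  # which plays the role of the "base index".
  def register(language, corpus)
    @corpora[language] = [corpus, zipped_size(corpus)]
  end

  # For each language, measure how many extra compressed bytes the
  # sample costs when appended to that language's corpus.
  # Returns [language, diff] pairs, smallest diff first.
  def diffs(text)
    @corpora.map { |language, (corpus, base_size)|
      [language, zipped_size(corpus + text) - base_size]
    }.sort_by { |_, diff| diff }
  end

  # The language with the smallest size increase, or nil if no
  # corpus has been registered.
  def guess(text)
    diffs(text).first&.first
  end

  private

  def zipped_size(str)
    Zlib::Deflate.deflate(str, Zlib::BEST_COMPRESSION).bytesize
  end
end

# Toy usage; real corpora should be much longer than these.
guesser = ZipLanguageGuesser.new
guesser.register('english', 'the cat sat on the mat while the rain fell on the roof of the house')
guesser.register('german', 'die katze sitzt auf der matte und der regen tropft auf das dach von dem haus')
puts guesser.guess('the rain fell on the roof')
```

Deflate only looks back over a 32 KB window, so in a sketch like this a corpus much larger than that probably adds little. Very short samples can also produce near-identical diffs for related languages, so the result is best treated as a guess.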

Thomas Nitsche

5/9/2007 9:33:00 AM



Sorry, I meant I have no idea how the algorithm works. The code
itself works like a charm.
Actually I'm using the "Declaration of Human Rights" ;-)

Cheers,

Thomas
