comp.lang.ruby

Tokenizing text

Juan Alvarez

2/24/2009 11:07:00 PM

Hello,

I need to tokenize English text into sentences. I realize this is a very
complex task to get right all of the time (if it is possible at all), but for
the time being I'm only trying to implement a better solution than
string.split('.').

Browsing around, I found this snippet:

string.scan( /\w.+?[.!?]+(?=\s|\Z)/ )
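Here is the snippet run on a short sample string (invented for illustration). It handles ordinary sentence endings, but it also splits right after the ellipsis and right after the abbreviation "Dr.":

```ruby
# Sample text made up for illustration only.
text = "Hello there. How are you? Wait... what? Dr. Smith is here."

# The regex from the snippet: a word character, then as few characters as
# possible, then one or more of . ! ? followed by whitespace or end of string.
pieces = text.scan(/\w.+?[.!?]+(?=\s|\Z)/)
p pieces
# => ["Hello there.", "How are you?", "Wait...", "what?", "Dr.", "Smith is here."]
```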

which almost works for what I need, except in two cases: ellipses and
(at least the most common) abbreviations. Abbreviations are the hardest part,
and I've been tinkering with a couple of possible solutions. How would you
approach this?
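One possible approach, offered as a minimal sketch under stated assumptions rather than a definitive solution: keep the scan from the snippet, then merge each fragment into the previous one when the previous fragment ends in a known abbreviation, or ends in "..." and the next fragment starts with a lowercase letter. The abbreviation list and the helper names (`abbreviation_end?`, `ellipsis_continuation?`, `sentences`) are hypothetical and only illustrative.

```ruby
# Hypothetical, hand-picked abbreviation list (lowercase, without the final
# period). A real list would need to be much longer.
ABBREVIATIONS = %w[mr mrs ms dr prof st jr sr vs etc e.g i.e].freeze

# Same regex as the original snippet.
SENTENCE_RE = /\w.+?[.!?]+(?=\s|\Z)/

# True if the fragment's last word, minus its final period, is a known
# abbreviation (e.g. "Dr." -> "dr", "e.g." -> "e.g").
def abbreviation_end?(fragment)
  last_word = fragment.split.last.to_s
  ABBREVIATIONS.include?(last_word.chomp('.').downcase)
end

# Heuristic: "..." followed by a lowercase word is treated as mid-sentence.
def ellipsis_continuation?(fragment, next_fragment)
  fragment.end_with?('...') && next_fragment.match?(/\A[a-z]/)
end

# Scan candidate sentences, then glue back fragments that were split too early.
def sentences(text)
  text.scan(SENTENCE_RE).each_with_object([]) do |piece, out|
    if out.any? && (abbreviation_end?(out.last) || ellipsis_continuation?(out.last, piece))
      out[-1] = "#{out.last} #{piece}"
    else
      out << piece
    end
  end
end

p sentences("Wait... what happened? Dr. Smith left.")
# => ["Wait... what happened?", "Dr. Smith left."]
```

Merged fragments are rejoined with a single space, so original line breaks or double spaces are not preserved. A list-based check also over-merges when a sentence genuinely ends in an abbreviation (for example "...on Main St. It was late."), so this only narrows the problem rather than solving it.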

Thanks in advance
Juan
--
Posted via http://www.ruby-....