Gavin Kistner
10/18/2006 1:36:00 PM
Robert Klemme wrote:
> On 17.10.2006 22:37, Gavin Kistner wrote:
> > p string.scan( /\w[^.!?]+\S+/ )
> >
> > #=> ["This is an example string.", "The purpose is to save the delimiter
> > during split.", "Does this work.", "Great!!!."]
>
> >> string = "This is an example string. The purpose is to save the
> delimiter during split. Does this work. Great!!!."
> >> string = string + " It costs 0.1 dollars."
> => "This is an example string. The purpose is to save the\ndelimiter
> during split. Does this work. Great!!!. It costs 0.1 dollars."
> >> string.scan( /\w[^.!?]+\S+/ )
> => ["This is an example string.", "The purpose is to save the\ndelimiter
> during split.", "Does this work.", "Great!!!.", "It costs 0.1",
> "dollars."]
Down that path lies madness: trying to use a simple regexp to parse
something as complex as English grammar. That said, here's another
regexp that still works on the original string and also fixes that
particular case:
# Built from two adjacent literals so the string holds no embedded
# newline; without the m flag, "." in the regexp will not match one.
string = "This is an example string. The purpose is to save the " \
         "delimiter during split. Does this work. Great!!!."
string = string + " It costs 0.1 dollars."
p string.scan( /\w.+?[.!?]+(?=\s|\Z)/ )
#=> ["This is an example string.",
#    "The purpose is to save the delimiter during split.",
#    "Does this work.", "Great!!!.", "It costs 0.1 dollars."]
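To show why the lookahead fixes the "0.1" case, here is the same pattern written out with the x flag, as a rough sketch. The constant name SENTENCE and the short test string are assumptions made for illustration; only the pattern itself comes from the post.

```ruby
# SENTENCE is a hypothetical name; the pattern is the one used above,
# only spread out and commented.
SENTENCE = /
  \w          # start at a word character, which skips leading whitespace
  .+?         # then as few characters as possible
  [.!?]+      # up to a run of one or more terminators
  (?=\s|\Z)   # accepted only when whitespace or end of string follows
/x

# Hypothetical sample text, shorter than the string in the post.
text = "It costs 0.1 dollars. Great!!!."
p text.scan(SENTENCE)
#=> ["It costs 0.1 dollars.", "Great!!!."]
```

The "." in "0.1" is followed by "1", so the lookahead rejects it and the lazy ".+?" keeps extending until it reaches a terminator that really is followed by whitespace or the end of the string.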
It'll still fail on sentences with embedded quotes that have
sub-sentences within them.
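A small illustration of that failure mode, assuming a made-up sentence with a quoted sub-sentence (the sample text and the variable names are not from the thread):

```ruby
# Same regexp as above; sentence_re and quoted are hypothetical names.
sentence_re = /\w.+?[.!?]+(?=\s|\Z)/

# Hypothetical input: one quotation containing two short sentences.
quoted = 'He said "Stop. Go away." Then he left.'
p quoted.scan(sentence_re)
#=> ["He said \"Stop.", "Go away.\" Then he left."]
```

The first period inside the quotation is followed by a space, so the match ends there and the quotation is cut in half. The period before the closing quote is followed by a quote character rather than whitespace, so the regexp runs on into the next sentence.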