comp.lang.ruby

Re: Splitting a sentence with delimiter preserved

Gavin Kistner

10/17/2006 8:37:00 PM

From: Ajithkumar Warrier [mailto:a.varier@gmail.com]
> I am a newbie and the answer to this might be too simple.
> How do I improve the example below and reduce the number of passes
> over the string?
>
> string = "This is an example string. The purpose is to save the
> delimiter during split. Does this work. Great!!!."
>
> re = /(\.\s+)(\D)/
> string.gsub!(re,'\1'+'#'+'\2')
> b = string.split('#')
> puts b

p string.scan( /\w[^.!?]+\S+/ )

#=> ["This is an example string.",
#    "The purpose is to save the delimiter during split.",
#    "Does this work.", "Great!!!."]
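For comparison, here is a rough sketch of the two approaches side by side. The helper names `split_with_marker` and `split_with_scan` and the constant `SAMPLE` are made up for this illustration; only `gsub`, `split`, and `scan` come from the thread, and the sample string is assumed to sit on a single line.

```ruby
# Hypothetical helper names, used only for this comparison.
SAMPLE = "This is an example string. The purpose is to save the " \
         "delimiter during split. Does this work. Great!!!."

# Original approach: insert a '#' marker after each ". " and split on it.
# This walks the string twice and breaks if the text already contains '#'.
def split_with_marker(text)
  text.gsub(/(\.\s+)(\D)/, '\1#\2').split('#')
end

# Single pass: every match is one sentence, trailing punctuation included.
def split_with_scan(text)
  text.scan(/\w[^.!?]+\S+/)
end

p split_with_marker(SAMPLE)
p split_with_scan(SAMPLE)
```

Besides the extra pass, the marker version keeps the whitespace after each period at the end of every piece but the last, whereas `scan` returns the sentences without it.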

7 Answers

Ajithkumar Warrier

10/17/2006 9:29:00 PM

On 10/17/06, Gavin Kistner <gavin.kistner@anark.com> wrote:
That was very quick.

Thank you.
> From: Ajithkumar Warrier [mailto:a.varier@gmail.com]
> > I am a newbie and the answer to this might be too simple.
> > How do I improve the example below and reduce the number of passes
> > over the string?
> >
> > string = "This is an example string. The purpose is to save the
> > delimiter during split. Does this work. Great!!!."
> >
> > re = /(\.\s+)(\D)/
> > string.gsub!(re,'\1'+'#'+'\2')
> > b = string.split('#')
> > puts b
>
> p string.scan( /\w[^.!?]+\S+/ )
>
> #=> ["This is an example string.", "The purpose is to save the delimiter
> during split.", "Does this work.", "Great!!!."]
>
>

dblack

10/18/2006 12:12:00 AM

matt

10/18/2006 2:49:00 AM

<dblack@wobblini.net> wrote:

> Also, in 1.9, with oniguruma

Is the current (1.8.5) regex engine some (other) well-known engine? For
example it is very like PCRE, but I take it that it is not PCRE. Just
curious. m.

--
matt neuburg, phd = matt@tidbits.com, http://www.tidbits...
Tiger - http://www.takecontrolbooks.com/tiger-custom...
AppleScript - http://www.amazon.com/gp/product/...
Read TidBITS! It's free and smart. http://www.t...

Gavin Kistner

10/18/2006 3:02:00 AM

dblack@wobblini.net wrote:
> Also, in 1.9, with oniguruma, you can do:
>
> string.split(/(?<=\.)\s+/)
>
> (negative lookbehind).

Er, positive lookbehind, I believe you mean.

For completeness, if you wanted to use this form and also wanted to
allow exclamation points and question marks as sentence delimiters in
addition to periods, you could use:

string.split( /(?<=[.!?])\s+/ )
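A small sketch of that lookbehind split, assuming Ruby 1.9 or later (where lookbehind is available). The method name `split_sentences` and the sample text are invented for the example.

```ruby
# Hypothetical helper; assumes a regexp engine with lookbehind support.
def split_sentences(text)
  # Split on whitespace that directly follows '.', '!' or '?'.
  # The lookbehind consumes nothing, so the punctuation stays attached
  # to the sentence it ends.
  text.split(/(?<=[.!?])\s+/)
end

p split_sentences("Does this work? Great!!! It costs 0.1 dollars.")
#=> ["Does this work?", "Great!!!", "It costs 0.1 dollars."]
```

Since the split only happens where whitespace follows the punctuation, the period inside "0.1" is left alone. An abbreviation followed by a space (for instance "Dr. Smith") would still be split, so this is no more a real sentence parser than the other patterns here.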

Robert Klemme

10/18/2006 8:34:00 AM

On 17.10.2006 22:37, Gavin Kistner wrote:
> From: Ajithkumar Warrier [mailto:a.varier@gmail.com]
>> I am a newbie and the answer to this might be too simple.
>> How do I improve the example below and reduce the number of passes
>> over the string?
>>
>> string = "This is an example string. The purpose is to save the
>> delimiter during split. Does this work. Great!!!."
>>
>> re = /(\.\s+)(\D)/
>> string.gsub!(re,'\1'+'#'+'\2')
>> b = string.split('#')
>> puts b
>
> p string.scan( /\w[^.!?]+\S+/ )
>
> #=> ["This is an example string.", "The purpose is to save the delimiter
> during split.", "Does this work.", "Great!!!."]

>> string = "This is an example string. The purpose is to save the
delimiter during split. Does this work. Great!!!."
>> string = string + " It costs 0.1 dollars."
=> "This is an example string. The purpose is to save the\ndelimiter
during split. Does this work. Great!!!. It costs 0.1 dollars."
>> string.scan( /\w[^.!?]+\S+/ )
=> ["This is an example string.", "The purpose is to save the\ndelimiter
during split.", "Does this work.", "Great!!!.", "It costs 0.1",
"dollars."]

Hm...

robert
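To see why the decimal breaks things, here is a quick sketch that adds capture groups to the same pattern so each half of a match is visible. The groups, the constant `TRACE`, and the method `trace_parts` are additions for illustration only.

```ruby
# Same pattern as in the thread, with two capture groups added purely
# to show which part of the regexp matched which text.
TRACE = /(\w[^.!?]+)(\S+)/

def trace_parts(text)
  # With groups present, scan returns one [group1, group2] pair per match.
  text.scan(TRACE)
end

p trace_parts("It costs 0.1 dollars.")
#=> [["It costs 0", ".1"], ["dollars", "."]]
```

The character class `[^.!?]+` stops at the first period it meets, and `\S+` then runs to the next whitespace, so ".1" is taken as the end of a sentence.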

James Gray

10/18/2006 12:54:00 PM

On Oct 17, 2006, at 9:55 PM, matt neuburg wrote:

> <dblack@wobblini.net> wrote:
>
>> Also, in 1.9, with oniguruma
>
> Is the current (1.8.5) regex engine some (other) well-known engine? For
> example it is very like PCRE, but I take it that it is not PCRE.

Ruby's current regex engine is pretty limited compared to PCRE or
Oniguruma. I'm not aware of the name for the current engine.

James Edward Gray II


Gavin Kistner

10/18/2006 1:36:00 PM

Robert Klemme wrote:
> On 17.10.2006 22:37, Gavin Kistner wrote:
> > p string.scan( /\w[^.!?]+\S+/ )
> >
> > #=> ["This is an example string.", "The purpose is to save the delimiter
> > during split.", "Does this work.", "Great!!!."]
>
> >> string = "This is an example string. The purpose is to save the
> delimiter during split. Does this work. Great!!!."
> >> string = string + " It costs 0.1 dollars."
> => "This is an example string. The purpose is to save the\ndelimiter
> during split. Does this work. Great!!!. It costs 0.1 dollars."
> >> string.scan( /\w[^.!?]+\S+/ )
> => ["This is an example string.", "The purpose is to save the\ndelimiter
> during split.", "Does this work.", "Great!!!.", "It costs 0.1",
> "dollars."]

Down this path lies the madness of trying to use a simple regexp to
parse something as complex as English grammar. That said, here's
another regexp that still works and fixes that particular case:

# The literal is continued with a backslash so the string has no embedded
# newline; without the /m flag, '.' would not match across one.
string = "This is an example string. The purpose is to save the " \
         "delimiter during split. Does this work. Great!!!."
string = string + " It costs 0.1 dollars."
p string.scan( /\w.+?[.!?]+(?=\s|\Z)/ )
#=> ["This is an example string.",
#    "The purpose is to save the delimiter during split.",
#    "Does this work.", "Great!!!.", "It costs 0.1 dollars."]

It'll still fail on sentences with embedded quotes that have
sub-sentences within them.
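A short sketch of that failure mode, using the same pattern on an invented sample with a quoted sub-sentence. The constant `SENTENCE` and the method `sentences` are hypothetical names for this example.

```ruby
# The pattern from the post above, wrapped in a hypothetical helper.
SENTENCE = /\w.+?[.!?]+(?=\s|\Z)/

def sentences(text)
  text.scan(SENTENCE)
end

# The period inside the quotation is followed by whitespace, so the
# pattern treats it as the end of a sentence.
p sentences('He said "Stop. Now." and left.')
#=> ["He said \"Stop.", "Now.\" and left."]
```

The lookahead `(?=\s|\Z)` only checks what follows the punctuation, so it has no way of knowing that "Stop." sits inside a quotation.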