Mark Thomas
4/7/2009 7:43:00 PM
On Apr 7, 2:23 pm, Raimon Fs <co...@montx.com> wrote:
> Mark Thomas wrote:
> >> With 1.9's Oniguruma (is it available for 1.8?) it's quite easy
>
> > This shorter one works in 1.8
>
> > scan(/EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/).flatten
>
> > I'm curious as to what Oniguruma-specific feature you used in yours.
>
> > -- Mark.
>
> thanks to all, at this moment I have enough with Ruby 1.8.7, so I'm with
> this one, that works perfectly.
>
> Can you explain why this works ?
>
> :-)
>
> /EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/
>
> EMISOR:\s is clear to me, but why it doesn't appear later in the array,
> because it hasn't () ?
>
> The * is also clear
>
> ([\w\s]+?) means select all uppercase words/letters ?
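[\w\s] is a character class that matches a single "word character" (a
letter, digit, or underscore) or a whitespace character, so it is not
limited to uppercase. The + makes it one or more of those. The ? after
the + makes it non-greedy: it matches as few characters as it can while
still letting the rest of the pattern match.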
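
To see what the ? changes, here is a small sketch comparing the
non-greedy group with a greedy one. The sample line is made up for
illustration (I don't have your real input), and the constant names are
mine:

```ruby
# Made-up sample line; the real input will look different.
SAMPLE_TEXT = "EMISOR: ACME CORP Fecha Emision: 2009"

# Non-greedy: the group stops at the first spot where the lookahead succeeds.
LAZY   = /EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/
# Greedy: the group grabs as much as it can, then backs off to the LAST
# spot where the lookahead still succeeds.
GREEDY = /EMISOR:\s*([\w\s]+)(?=\s*[A-Z][a-z])/

p SAMPLE_TEXT.scan(LAZY).flatten    # prints ["ACME CORP"]
p SAMPLE_TEXT.scan(GREEDY).flatten  # prints ["ACME CORP Fecha "]
```

With the greedy version the capture runs on through "Fecha " because
"Emision" also starts with an uppercase letter followed by a lowercase
one.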
> (?=\s*[A-Z][a-z]) until you reach a space between uppercase and
> uppercase with lowercase later?
The (?= ) is a lookahead assertion. It checks that the text ahead
matches, without consuming it or including it in the result. So as soon
as what follows is optional whitespace, then an uppercase letter, then a
lowercase letter, the preceding group stops matching there.
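
A short sketch of the difference between looking ahead and actually
matching the same text, again on a made-up sample line with constant
names of my own. The last two lines also show why "EMISOR:" is missing
from your array: when the pattern contains a group, scan returns only
what the groups captured.

```ruby
# Made-up sample line; the real input will look different.
LINE = "EMISOR: ACME CORP Fecha: 2009"

# With the lookahead, " Fe" is checked but stays outside the match.
AHEAD = LINE.match(/EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/)
p AHEAD[0]          # prints "EMISOR: ACME CORP"
p AHEAD.post_match  # prints " Fecha: 2009"

# Without the lookahead, the same text becomes part of the match.
CONSUMED = LINE.match(/EMISOR:\s*([\w\s]+?)\s*[A-Z][a-z]/)
p CONSUMED[0]          # prints "EMISOR: ACME CORP Fe"
p CONSUMED.post_match  # prints "cha: 2009"

# scan returns only the groups when there are any, else the whole match.
p LINE.scan(/EMISOR:\s*([\w\s]+?)(?=\s*[A-Z][a-z])/)  # prints [["ACME CORP"]]
p LINE.scan(/EMISOR:\s*[\w\s]+?(?=\s*[A-Z][a-z])/)    # prints ["EMISOR: ACME CORP"]
```

The captured group is "ACME CORP" in both match calls; only the overall
match and what is left over after it differ.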
-- Mark.