comp.lang.ruby

Regex - Exclude Multiple Characters and Global Scanning

Ben Woodcroft

6/21/2008 4:49:00 AM

Hihi,

I have 2 problems.

--------------Question 1-----------------------
Firstly, a Ruby question. I'm confused about how to match a single
regular expression multiple times in a single string. For instance,

'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
'llgllallo'.match(/(ll.)/)[2] #-> nil

How do I access all 3 matches? String#scan will work, but that gives me

'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]

But I need the offsets, and this info isn't given to me.



--------------Question 2-----------------------
Now an old gap in my regex understanding. How do I exclude a sequence of
consecutive characters? I want something like [^abc], except aba or bbc
is ok, just not 'abc'. Summing this up:

reg = /something/
'abc'.match(reg) #-> no match
'cba'.match(reg) #-> match

And then I want to be able to do OR operations too, like not 'abc' and
not 'bbc', but that is probably another step of complexity.

I don't suppose there is any way to pass a block to the regex to use in
a specific place? That would be cool, though maybe not possible given
optimisations in regex?



Thanks in advance,
ben
--
Posted via http://www.ruby-....

2 Answers

David A. Black

6/21/2008 8:00:00 AM


Hi --

On Sat, 21 Jun 2008, Ben Woodcroft wrote:

> Hihi,
>
> I have 2 problems.
>
> --------------Question 1-----------------------
> Firstly, a Ruby question. I'm confused about how to match a single
> regular expression multiple times in a single string. For instance,
>
> 'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
> 'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
> 'llgllallo'.match(/(ll.)/)[2] #-> nil
>
> How do I access all 3 matches? String#scan will work, but that gives me
>
> 'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]
>
> But I need the offsets, and this info isn't given to me.

You could do:

irb(main):029:0> offsets = []
=> []
irb(main):030:0> 'llgllallo'.scan(/ll./) { offsets << $~.offset(0)[1] }
=> "llgllallo"
irb(main):031:0> offsets
=> [3, 6, 9]

(Pending someone coming up with something slicker. I don't like the
temp variable particularly, but anyway.)
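
Note that offset(0)[1] in the session above is the END offset of each match; offset(0)[0] would be the start. A minimal sketch of the same scan-with-a-block idea that keeps both, wrapped in a helper named match_offsets (a hypothetical name, not something from this thread or from Ruby itself):

```ruby
# Collect [start, end] character offsets for every match of pattern in str.
# match_offsets is an illustrative helper name, not a built-in method.
def match_offsets(str, pattern)
  offsets = []
  # Inside the block, Regexp.last_match (the same object as $~) is the
  # MatchData for the match currently being yielded by scan.
  str.scan(pattern) { offsets << Regexp.last_match.offset(0) }
  offsets
end

spans  = match_offsets('llgllallo', /ll./)  # [[0, 3], [3, 6], [6, 9]]
starts = spans.map(&:first)                 # [0, 3, 6]
ends   = spans.map(&:last)                  # [3, 6, 9]
```

The temp array is still there, but it is hidden inside the method, so the caller only sees the result.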

> --------------Question 2-----------------------
> Now an old gap in my regex understanding. How do I exclude a sequence of
> consecutive characters? I want something like [^abc], except aba or bbc
> is ok, just not 'abc'. Summing this up:

[^abc] means: match one character that is not 'a', not 'b', and not
'c'. I don't think that's what you mean.
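
A small illustration of that point, as a sketch: the negated class consumes exactly one character, so it says nothing about the sequence "abc". The variable names here are only for illustration.

```ruby
negated = /[^abc]/

# Every character of 'abc' is in the excluded set, so nothing matches.
no_match = negated.match('abc')        # nil

# 'aba' also fails, even though it is not the sequence 'abc'.
also_none = negated.match('aba')       # nil

# A single character outside the set is enough for a match.
one_char = negated.match('abd')[0]     # "d"
```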

> reg = /something/
> 'abc'.match(reg) #-> no match
> 'cba'.match(reg) #-> match
>
> And then I want to be able to do OR operations too, like not 'abc' and
> not 'bbc', but that is probably another step of complexity.

You can use (?!), which is negative lookahead.

irb(main):033:0> reg = /(?!abc)[abc]{3}/
=> /(?!abc)[abc]{3}/

So that means: three of a, b, c, as long as we're not looking at
"abc" when we start looking for those three characters.

irb(main):034:0> reg.match("abc")
=> nil
irb(main):035:0> reg.match("abb")
=> #<MatchData:0x69de8>
irb(main):036:0> reg.match("cba")
=> #<MatchData:0x63de4>
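
For the OR case (not 'abc' and not 'bbc'), one possible sketch is to put an alternation inside the lookahead. The \A and \z anchors are an addition for this example, assuming you want to test the whole string; without them the regex is free to find a qualifying run further along in a longer string. match? assumes Ruby 2.4 or later.

```ruby
# Three characters from a/b/c, but the whole string may not be 'abc' or 'bbc'.
reg = /\A(?!abc|bbc)[abc]{3}\z/

results = %w[abc bbc cba abb aba].map { |s| [s, reg.match?(s)] }.to_h
# {"abc"=>false, "bbc"=>false, "cba"=>true, "abb"=>true, "aba"=>true}

# Unanchored, the lookahead only guards the position where a match starts:
# 'aabc' contains 'abc', yet 'aab' at position 0 is an acceptable match.
unanchored = /(?!abc|bbc)[abc]{3}/.match('aabc')[0]  # "aab"
```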

> I don't suppose there is any way to pass a block to the regex to use in
> a specific place? That would be cool, though maybe not possible given
> optimisations in regex?

Blocks get passed to methods, not objects, and regexes are objects.
Some of the methods that use regexes also take blocks, like scan, sub,
and gsub. I'm not sure what you mean about the specific place, though.


David

--
Rails training from David A. Black and Ruby Power and Light:
ADVANCING WITH RAILS June 16-19 Berlin
ADVANCING WITH RAILS July 21-24 Edison, NJ
See http://www.r... for details and updates!

Ben Woodcroft

6/22/2008 1:52:00 AM


David A. Black wrote:
> You could do:
>
> irb(main):029:0> offsets = []
> => []
> irb(main):030:0> 'llgllallo'.scan(/ll./) { offsets << $~.offset(0)[1] }
> => "llgllallo"
> irb(main):031:0> offsets
> => [3, 6, 9]
>
> (Pending someone coming up with something slicker. I don't like the
> temp variable particularly, but anyway.)
>

That will work, thanks. It would seem intuitive to me that scan (or a
method like it) would iterate over MatchData objects, but anyway. Thanks.
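
For anyone who does want MatchData objects for every match, here is a rough sketch. It assumes Ruby 1.9 or later, where Regexp#match accepts a start position as its second argument; the helper name all_matches is made up for this example.

```ruby
# Return an array with one MatchData per non-overlapping match of pattern.
# all_matches is an illustrative helper, not part of Ruby's core library.
def all_matches(str, pattern)
  matches = []
  pos = 0
  while pos <= str.length && (m = pattern.match(str, pos))
    matches << m
    # Continue after this match; step one extra character on an empty
    # match so the loop cannot get stuck at the same position.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  matches
end

matches = all_matches('llgllallo', /(ll.)/)
summary = matches.map { |m| [m[1], m.begin(0), m.end(0)] }
# [["llg", 0, 3], ["lla", 3, 6], ["llo", 6, 9]]
```

Each element is a full MatchData, so capture groups, begin/end and pre_match/post_match are all available per match.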

>> --------------Question 2-----------------------
>> Now an old gap in my regex understanding. How do I exclude a sequence of
>> consecutive characters? I want something like [^abc], except aba or bbc
>> is ok, just not 'abc'. Summing this up:
>
> [^abc] means: match one character that is not 'a', not 'b', and not
> 'c'. I don't think that's what you mean.
>
>> reg = /something/
>> 'abc'.match(reg) #-> no match
>> 'cba'.match(reg) #-> match
>>
>> And then I want to be able to do OR operations too, like not 'abc' and
>> not 'bbc', but that is probably another step of complexity.
>
> You can use (?!), which is negative lookahead.
>
> irb(main):033:0> reg = /(?!abc)[abc]{3}/
> => /(?!abc)[abc]{3}/
>
> So that means: three of a, b, c, as long as we're not looking at
> "abc" when we start looking for those three characters.
>
> irb(main):034:0> reg.match("abc")
> => nil
> irb(main):035:0> reg.match("abb")
> => #<MatchData:0x69de8>
> irb(main):036:0> reg.match("cba")
> => #<MatchData:0x63de4>

That is exactly what I meant. I was unaware of the negative lookahead
operator. Thanks!

>
>> I don't suppose there is any way to pass a block to the regex to use in
>> a specific place? That would be cool, though maybe not possible given
>> optimisations in regex?
>
> Blocks get passed to methods, not objects, and regexes are objects.
> Some of the methods that use regexes also take blocks, like scan, sub,
> and gsub. I'm not sure what you mean about the specific place, though.
>

My question was not explained very well, sorry. I meant it would be cool
if you could pass a block that became part of the regex itself. For
instance instead of /(?!abc)/ you could somehow tell it
{|s| s != 'abc'}

Just an idea, doesn't really matter now you've fixed my problem.
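
As far as I know Ruby's regex syntax has no way to embed a block, but something close can be approximated by filtering candidates after the scan. A minimal sketch, with scan_where as a made-up helper name; it assumes a pattern without capture groups, so that scan yields plain strings:

```ruby
# Scan for candidates with the regex, then keep only those the block accepts.
# scan_where is a hypothetical helper, not a String or Regexp method.
def scan_where(str, pattern)
  str.scan(pattern).select { |candidate| yield candidate }
end

kept = scan_where('abccbaabb', /[abc]{3}/) { |s| s != 'abc' }
# ["cba", "abb"]
```

This is not equivalent to /(?!abc)[abc]{3}/: here the rejected 'abc' is still consumed by the scan before the block sees it, whereas the lookahead makes the regex engine retry from the next character, so the two can return different substrings for the same input.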

Thanks,
ben

>
> David

--
Posted via http://www.ruby-....