comp.lang.ruby
Regex - Exclude Multiple Characters and Global Scanning
Ben Woodcroft
6/21/2008 4:49:00 AM
Hihi,
I have 2 problems.
--------------Question 1-----------------------
Firstly, a Ruby question. I'm confused about how to match a single
regular expression multiple times in a single string. For instance,
'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
'llgllallo'.match(/(ll.)/)[2] #-> nil
How do I access all 3 matches? String#scan will work, but that gives me
'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]
But I need the offsets, and this info isn't given to me.
--------------Question 2-----------------------
Now an old gap in my regex understanding. How do I exclude on
consecutive characters? I want something like [^abc], except aba or bbc
is ok, just not 'abc'. Summing this up:
reg = /something/
'abc'.match(reg) #-> no match
'cba'.match(reg) #-> match
And then I want to be able to do OR operations too, like not 'abc' and
not 'bbc', but that is probably another step of complexity.
I don't suppose there is any way to pass a block to the regex to use in
a specific place? That would be cool, though maybe not possible given
optimisations in regex?
Thanks in advance,
ben
--
Posted via
http://www.ruby-...
.
David A. Black
6/21/2008 8:00:00 AM
Hi --
On Sat, 21 Jun 2008, Ben Woodcroft wrote:
> Hihi,
>
> I have 2 problems.
>
> --------------Question 1-----------------------
> Firstly, a Ruby question. I'm confused about how to match a single
> regular expression multiple times in a single string. For instance,
>
> 'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
> 'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
> 'llgllallo'.match(/(ll.)/)[2] #-> nil
>
> How do I access all 3 matches? String#scan will work, but that gives me
>
> 'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]
>
> But I need the offsets, and this info isn't given to me.
You could do (with str = 'llgllallo'):
irb(main):029:0> offsets = []
=> []
irb(main):030:0> str.scan(/ll./) { offsets << $~.offset(0)[1] }
=> "llgllallo"
irb(main):031:0> offsets
=> [3, 6, 9]
(Pending someone coming up with something slicker. I don't like the
temp variable particularly, but anyway.)
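A hedged sketch of a variant without the temporary array, using the common idiom of wrapping String#scan in an enumerator via to_enum so that Regexp.last_match inside the map block is the MatchData for the current match. Note that offset(0) returns [start, end], so offset(0)[1] above gives end positions; begin(0) gives the start positions, which may be what the original question was after.

```ruby
str = "llgllallo"

# to_enum(:scan, ...) turns scan's per-match block calls into an Enumerator;
# inside each map block, Regexp.last_match holds the current MatchData.
starts = str.to_enum(:scan, /ll./).map { Regexp.last_match.begin(0) }
ends   = str.to_enum(:scan, /ll./).map { Regexp.last_match.end(0) }
pairs  = str.to_enum(:scan, /ll./).map { Regexp.last_match.offset(0) }

p starts  # => [0, 3, 6]
p ends    # => [3, 6, 9]
p pairs   # => [[0, 3], [3, 6], [6, 9]]
```

This relies on map driving the enumerator internally; stepping it externally with next would not see the same Regexp.last_match.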
> --------------Question 2-----------------------
> Now an old gap in my regex understanding. How do I exclude on
> consecutive characters? I want something like [^abc], except aba or bbc
> is ok, just not 'abc'. Summing this up:
[^abc] means: match one character that is not 'a', not 'b', and not
'c'. I don't think that's what you mean.
> reg = /something/
> 'abc'.match(reg) #-> no match
> 'cba'.match(reg) #-> match
>
> And then I want to be able to do OR operations too, like not 'abc' and
> not 'bbc', but that is probably another step of complexity.
You can use (?!), which is negative lookahead.
irb(main):033:0> reg = /(?!abc)[abc]{3}/
=> /(?!abc)[abc]{3}/
So that means: three of a, b, c, as long as we're not looking at
"abc" when we start looking for those three characters.
irb(main):034:0> reg.match("abc")
=> nil
irb(main):035:0> reg.match("abb")
=> #<MatchData:0x69de8>
irb(main):036:0> reg.match("cba")
=> #<MatchData:0x63de4>
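For the "not 'abc' and not 'bbc'" case, one approach is to put an alternation inside the same negative lookahead. The sketch below also adds \A and \z anchors so the whole string is tested; that anchoring is an assumption about the intent, since without it the pattern can match part of a longer string that contains "abc". Regexp#match? needs Ruby 2.4 or later.

```ruby
# (?!abc|bbc) fails if the text at this position starts with "abc" or "bbc".
# \A and \z anchor the pattern to the whole string.
NOT_ABC_OR_BBC = /\A(?!abc|bbc)[abc]{3}\z/

%w[abc bbc cba abb aba].each do |s|
  puts "#{s}: #{NOT_ABC_OR_BBC.match?(s) ? 'match' : 'no match'}"
end

# The unanchored pattern still matches the prefix "aab" of "aabc":
p(/(?!abc)[abc]{3}/.match("aabc")[0])  # => "aab"
```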
> I don't suppose there is any way to pass a block to the regex to use in
> a specific place? That would be cool, though maybe not possible given
> optimisations in regex?
Blocks get passed to methods, not objects, and regexes are objects.
Some of the methods that use regexes also take blocks, like scan, sub,
and gsub. I'm not sure what you mean about the specific place, though.
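A short illustration of the point that the block goes to the method, not the regex: scan calls its block once per match, while sub and gsub use the block's return value as the replacement text (sub for the first match only, gsub for all of them).

```ruby
str = "llgllallo"

# scan with a block: called once for each match.
str.scan(/ll./) { |m| puts "found #{m}" }

# sub replaces the first match, gsub replaces every match;
# the block's return value is the replacement.
first_only = str.sub(/ll./) { |m| m.upcase }
every      = str.gsub(/ll./) { |m| m.upcase }

p first_only  # => "LLGllallo"
p every       # => "LLGLLALLO"
```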
David
--
Rails training from David A. Black and Ruby Power and Light:
ADVANCING WITH RAILS June 16-19 Berlin
ADVANCING WITH RAILS July 21-24 Edison, NJ
See http://www.r... for details and updates!
Ben Woodcroft
6/22/2008 1:52:00 AM
David A. Black wrote:
> You could do:
>
> irb(main):029:0> offsets = []
> => []
> irb(main):030:0> str.scan(/ll./) { offsets << $~.offset(0)[1] }
> => "llgllallo"
> irb(main):031:0> offsets
> => [3, 6, 9]
>
> (Pending someone coming up with something slicker. I don't like the
> temp variable particularly, but anyway.)
>
That will work, thanks. It would seem intuitive to me that scan (or a
method like it) would iterate over MatchData objects, but anyway. Thanks.
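A minimal sketch of a method that collects MatchData objects, as wished for here. The name match_all is hypothetical and not part of Ruby's core API; it simply loops with Regexp#match(str, pos), restarting each search where the previous match ended.

```ruby
# Hypothetical helper, not part of Ruby's core API: return a MatchData object
# for every non-overlapping match of re in str.
def match_all(str, re)
  matches = []
  pos = 0
  # Regexp#match(str, pos) starts searching at character offset pos
  # and returns nil when nothing more is found.
  while (m = re.match(str, pos))
    matches << m
    # Continue after this match; step one character past an empty match
    # so the loop always makes progress.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  matches
end

match_all("llgllallo", /(ll.)/).each do |m|
  puts "#{m[1]} at #{m.offset(0).inspect}"
end
```

Each element is a full MatchData, so captures, begin/end, pre_match and post_match are all available.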
>> --------------Question 2-----------------------
>> Now an old gap in my regex understanding. How do I exclude on
>> consecutive characters? I want something like [^abc], except aba or bbc
>> is ok, just not 'abc'. Summing this up:
>
> [^abc] means: match one character that is not 'a', not 'b', and not
> 'c'. I don't think that's what you mean.
>
>> reg = /something/
>> 'abc'.match(reg) #-> no match
>> 'cba'.match(reg) #-> match
>>
>> And then I want to be able to do OR operations too, like not 'abc' and
>> not 'bbc', but that is probably another step of complexity.
>
> You can use (?!), which is negative lookahead.
>
> irb(main):033:0> reg = /(?!abc)[abc]{3}/
> => /(?!abc)[abc]{3}/
>
> So that means: three of a, b, c, as long as we're not looking at
> "abc" when we start looking for those three characters.
>
> irb(main):034:0> reg.match("abc")
> => nil
> irb(main):035:0> reg.match("abb")
> => #<MatchData:0x69de8>
> irb(main):036:0> reg.match("cba")
> => #<MatchData:0x63de4>
That is exactly what I meant. I was unaware of the negative lookahead
operator. Thanks!
>
>> I don't suppose there is any way to pass a block to the regex to use in
>> a specific place? That would be cool, though maybe not possible given
>> optimisations in regex?
>
> Blocks get passed to methods, not objects, and regexes are objects.
> Some of the methods that use regexes also take blocks, like scan, sub,
> and gsub. I'm not sure what you mean about the specific place, though.
>
My question was not explained very well, sorry. I meant it would be cool
if you could pass a block that became part of the regex itself. For
instance, instead of /(?!abc)/ you could somehow tell it
{|s| s != 'abc'}
Just an idea; it doesn't really matter now that you've fixed my problem.
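Ruby regexes don't accept blocks, but the idea can be approximated by letting the regex propose candidates and a block accept or reject them. The helper below, filtered_matches, is hypothetical (not a core method): it tries every start position, so overlapping candidates are considered, and keeps only the matches the block accepts.

```ruby
# Hypothetical helper, not part of Ruby's core API: the regex proposes
# candidates, the block decides which ones to keep.
def filtered_matches(str, pattern)
  results = []
  (0...str.length).each do |pos|
    m = pattern.match(str, pos)
    # Only keep a match that begins exactly at pos; later matches are
    # handled when the loop reaches their own start position.
    next unless m && m.begin(0) == pos
    results << m[0] if yield(m[0])
  end
  results
end

kept = filtered_matches("aabcba", /[abc]{3}/) { |s| s != "abc" }
p kept  # => ["aab", "bcb", "cba"]
```

Because it restarts the search at every position, this is quadratic on long strings; a lookahead like /(?!abc)/ stays the better choice when one can express the condition.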
Thanks,
ben
>
> David