comp.lang.ruby

Regexp gotcha

Pistos Christou

3/28/2006 3:24:00 PM

Hi, all. I was fixing a bug last night, and discovered some
"gotcha"-like behaviour in the process. Consider:

irb(main):173:0> s = "my string"
=> "my string"
irb(main):174:0> r1 = /my/
=> /my/
irb(main):175:0> r2 = /your/
=> /your/
irb(main):176:0> r3 = nil
=> nil
irb(main):177:0> s =~ r1
=> 0
irb(main):178:0> s =~ r2
=> nil
irb(main):179:0> s =~ r3
=> false

s =~ r1 .... That's cool, it gives me the index of the match.
s =~ r2 .... That's cool, it tells me there was no match.
s =~ r3 .... Whoa.

The reason this "got me" is that I had this code:

match_result = ( some_string =~ some_regexp )
if match_result != nil
  # Assume there was a match
end

But the problem is... I had an s =~ r3 case because some_regexp was nil,
and so it was entering my if block when I semantically did not want that
to occur. :(

So now my code is

if match_result != nil and match_result != false
  ...
end
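A minimal sketch of a guard for this situation, assuming the goal is an Integer index on a match and nil in every other case, including a nil pattern. The helper name match_index is hypothetical, not part of Ruby; with it, the original "!= nil" test behaves as intended:

```ruby
# Hypothetical helper: return the match index, or nil when there is no match
# or when the pattern is not a Regexp (e.g. nil), so callers never see false.
def match_index(str, pattern)
  return nil unless pattern.is_a?(Regexp)
  str =~ pattern
end

s = "my string"
p match_index(s, /my/)    # prints 0
p match_index(s, /your/)  # prints nil
p match_index(s, nil)     # prints nil

# With the guard in place, the original "!= nil" test is enough.
match_result = match_index(s, nil)
puts "matched" if match_result != nil  # prints nothing
```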

Note also that I can't even use Regexp.last_match != nil in my if block:

irb(main):224:0> s =~ r2
=> nil
irb(main):225:0> Regexp.last_match
=> nil
irb(main):226:0> s =~ r3
=> false
irb(main):227:0> Regexp.last_match
=> nil
irb(main):228:0> s =~ r1
=> 0
irb(main):229:0> Regexp.last_match
=> #<MatchData:0x406c40b4>
irb(main):230:0> s =~ r3
=> false
irb(main):231:0> Regexp.last_match
=> #<MatchData:0x406c40b4>

To be clear: Note how a "nil type" of non-match overwrites last_match,
but a "false type" of non-match doesn't.
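One way to avoid relying on Regexp.last_match, which the session above shows is left untouched when the right-hand side is not a Regexp, is to keep the MatchData returned by Regexp#match from the same call. A sketch, with a hypothetical helper name match_data_for; MatchData#begin(0) yields the same index that =~ would have returned:

```ruby
# Hypothetical helper: a MatchData for this call only, or nil.
# Nothing is read from the global Regexp.last_match, so a stale
# result from an earlier match cannot leak in.
def match_data_for(str, pattern)
  pattern.is_a?(Regexp) ? pattern.match(str) : nil
end

s = "my string"
m = match_data_for(s, /str/)
p m.begin(0) if m         # prints 3
p match_data_for(s, nil)  # prints nil, even after the successful match above
```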

So the question is... why are BOTH nil and false possible return values
of =~ ? Is there some benefit to this? Why not just one or the other?

I see that this behaviour is [documented][1], but I still feel that it
is unintuitive when people assume =~ only accepts a Regexp on the
right-hand side.
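As I read the String#=~ documentation of that era, a non-Regexp argument is not rejected: String#=~ calls the argument's own =~ with the string, and the default Object#=~ returned false, which is where the second "no match" value comes from. (As far as I know, later Ruby versions changed this: nil =~ str now returns nil, and Object#=~ has since been removed, so the exact result for r3 depends on the version.) The delegation itself can be seen with a hypothetical pattern class that defines its own =~:

```ruby
# Hypothetical pattern-like class: String#=~ hands non-Regexp arguments
# to the argument's own =~ method, passing the string along.
class PrefixPattern
  def initialize(prefix)
    @prefix = prefix
  end

  # Mirror Regexp#=~ conventions: index 0 on a match, nil otherwise.
  def =~(str)
    str.start_with?(@prefix) ? 0 : nil
  end
end

s = "my string"
p(s =~ PrefixPattern.new("my"))    # prints 0
p(s =~ PrefixPattern.new("your"))  # prints nil
```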

[1]: http://www.ruby-doc.org/core/classes/String.ht...

Thanks in advance for any and all clarifications and explanations.

Pistos

--
Posted via http://www.ruby-....


18 Answers

Victor 'Zverok' Shepelev

3/28/2006 3:29:00 PM

> So now my code is
>
> if match_result != nil and match_result != false
> ...
> end

AFAIK, it's equivalent to the simpler

if match_result
  ...
end

because in Ruby only nil and false are "false", whereas any other value
(including 0, '' and []) is "true".

Victor.
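Victor's point can be checked directly. In this sketch the truthy? helper is only for illustration; it shows that just nil and false fail an if test, so "if match_result" skips both kinds of non-match while still accepting a match at index 0:

```ruby
# Illustration only: how an `if` condition treats each value.
def truthy?(value)
  value ? true : false
end

[0, "", [], nil, false].each do |v|
  puts "#{v.inspect} -> #{truthy?(v)}"
end
# 0 -> true
# "" -> true
# [] -> true
# nil -> false
# false -> false

match_result = ("my string" =~ /my/)  # 0: a match at the very start
puts "matched" if match_result        # prints "matched", because 0 is true
```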



dblack

3/28/2006 3:33:00 PM

Stefan Lang

3/28/2006 3:40:00 PM

You could just do this...

if string =~ /(\w)/
  # do something with $1
end

anytime you try to match a string to a non-regexp object you get a
false, I think.

_Kevin
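Kevin's suggestion as a self-contained sketch (the method name first_word_char is hypothetical). Because the condition is only tested for truthiness, a nil or false result skips the branch, and $1 is read only after a successful match:

```ruby
# Hypothetical example: return the first word character in str, or nil.
def first_word_char(str)
  if str =~ /(\w)/
    $1  # capture group 1 from the match just performed
  end
end

p first_word_char("  hello")  # prints "h"
p first_word_char("!!!")      # prints nil
```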


Pistos Christou

3/28/2006 3:49:00 PM

Victor Shepelev wrote:
> if match_result
> ...
> end
>
> because in Ruby only nil and false are "false", where any other value
> (including 0, '' and []) are "true".

Yep, thank you to you and David. I forgot that I could rewrite it like
that.

Kevin wrote:
> You could just do this...
> if string =~ /(\w)/
> #do something with $1
> end

Well, in this particular case, I am using the Fixnum returned, which is
why I am making the assignment. I normally otherwise do as you say,
using "if string =~ /regexp/".

Pistos
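When the index itself is what's needed, String#index with a Regexp argument is another option worth considering: it returns the Integer position or nil, and, as far as I know, a nil pattern makes it raise TypeError rather than quietly return false. A short sketch:

```ruby
s = "my string"

p s.index(/str/)   # prints 3
p s.index(/your/)  # prints nil

begin
  s.index(nil)
rescue TypeError => e
  # A nil pattern fails loudly here instead of producing false.
  puts "TypeError: #{e.message}"
end
```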


Robert Klemme

3/28/2006 6:52:00 PM

Pistos Christou wrote:
> Well, in this particular case, I am using the Fixnum returned, which is
> why I am making the assignment. I normally otherwise do as you say,
> using "if string =~ /regexp/".

Personally I prefer to use /rx/ =~ str over str =~ /rx/ - to me this
makes it clearer that the RX is the one that does the matching. Maybe it's
just personal taste, but I also seem to remember that that variant is a
tad faster.

Kind regards

robert
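Both operand orders return the same result for a Regexp and a String, so switching styles is safe. One later difference, which only applies to Ruby 1.9 and newer and so was not available when this thread was written: when a regexp literal with named groups appears on the left of =~, the captures are also assigned to local variables. A sketch, with parse_pair as a hypothetical method name:

```ruby
s = "my string"
p(/str/ =~ s)  # prints 3
p(s =~ /str/)  # prints 3

# Ruby 1.9+: named groups in a regexp literal on the LEFT of =~
# become local variables (key, value) after a successful match.
def parse_pair(line)
  if /(?<key>\w+)=(?<value>\w+)/ =~ line
    [key, value]
  end
end

p parse_pair("color=red")  # prints ["color", "red"]
p parse_pair("no pair")    # prints nil
```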

James Herdman

3/28/2006 8:50:00 PM

Are you asking why we can write

if match_result

as equivalent to

if match_result != nil and match_result != false

?

James H.

Pistos Christou

3/29/2006 3:45:00 PM

James H. wrote:
> Are you asking why we can write
>
> if match_result
>
> as equivalent to
>
> if match_result != nil and match_result != false
>
> ?

No, not at all. :) I'm just whining a bit that =~ can return both nil
and false. It's not that big a deal, but this is something that could
catch unaware people who would follow down the same tracks as I did and
assume match_result != nil would cover all the bases, when it doesn't.

Robert Klemme wrote:
> Personally I prefer to use /rx/ =~ str over str =~ /rx/ - to me this
> makes it clearer that the RX is the one that does the matching. Just
> personal taste maybe but I think I also remember that that variant is a
> tad faster.

I didn't even realize this could be done :) (though I see now that it is
documented as Regexp#=~, the index-returning counterpart of Regexp#match).
If I had lived in a vacuum for the last 15 years, and Ruby were the first
and only programming language I ever learned, then I would have done it
that way, too, from the very start. :) Alas, I came from [other languages
and then] Perl, so continuing to use str =~ /regexp/ was just a
carry-over.

FWIW, we still have the same problem:

irb(main):255:0> r1 =~ s
=> 0
irb(main):256:0> r2 =~ s
=> nil
irb(main):257:0> r3 =~ s
=> false

I've taken note that you say that r =~ s is faster. I (or someone else)
will have to do some benchmarking to see whether that's really true, and
how much speed gain can be had. Diakonos suffers when you use large and
many regexps for syntax highlighting, so I'd be interested in anything
that can speed that up.

Pistos
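A possible starting point for that benchmarking, using Ruby's standard benchmark library. The iteration count and sample inputs are arbitrary assumptions and no timings are claimed here; results will vary with Ruby version and machine:

```ruby
require "benchmark"

# Time both operand orders on the same data.
# n and the sample inputs are arbitrary choices for illustration.
def compare_match_order(str, rx, n = 200_000)
  {
    "rx =~ str" => Benchmark.realtime { n.times { rx =~ str } },
    "str =~ rx" => Benchmark.realtime { n.times { str =~ rx } },
  }
end

compare_match_order("my string", /str/).each do |label, seconds|
  puts format("%-10s %.4f s", label, seconds)
end
```

For syntax highlighting, the fairer test is probably to pass a real file's lines and the actual highlighting regexps in place of the sample string.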


Robert Dober

3/29/2006 4:26:00 PM

Pistos Christou

3/29/2006 4:41:00 PM

Robert Dober wrote:
> class String
>   alias_method :__old_match, :=~
>   def =~(obj)
>     raise RuntimeError,
>       "#{obj.nil? ? "nil" : obj.to_s} is not a Regexp but #{obj.class}" unless Regexp === obj
>     __old_match obj
>   end # def =~(obj)
> end # class String

Thanks for the suggestion, Robert.

While I have no qualms about extending core classes, this sort of
adjustment feels like too... "brash"? of a modification for me. :)

In this particular case, rewriting my if line is an acceptable solution
to the problem.

Pistos


dblack

3/29/2006 5:06:00 PM
