comp.lang.ruby

How can I monitor a Regexp?

Daniel DeLorme

2/28/2007 5:55:00 AM

Because a regular expression can behave differently depending on its kcode
(e.g. the behavior of \w), I decided that all my code should specify the kcode
explicitly (e.g. /\w+/n instead of /\w+/). So I tried to set up some hooks to
monitor the creation of each Regexp and raise an exception if the kcode is
missing, like this:

class Regexp
  alias old_initialize initialize
  def initialize(*args)
    old_initialize(*args)
    raise "NO KCODE!" if kcode.nil?
  end
end

This works fine if I use Regexp.new, but in the majority of cases the regexp
is written as a literal, and then initialize is NOT EXECUTED:
> Regexp.new("foobar")
RuntimeError: NO KCODE!
> /foobar/
=> /foobar/

So I tried an alternative approach and put the hook into the =~ operator, but I
hit the same problem: the method override is completely ignored.
class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end
"bar" =~ /bar/ #=> 0
/foo/ =~ "foo" #=> 0

So... does anyone have any idea how I can tackle this problem?
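
For readers on Ruby 1.9 or later, where Regexp#kcode no longer exists, the same "was an encoding given explicitly?" question can be approximated as below. This is only a sketch: the helper name explicit_encoding? is hypothetical, and treating fixed_encoding? plus the NOENCODING option bit as the equivalent of a non-nil kcode is an assumption, not something stated in this thread.

```ruby
# Hypothetical helper: true when the regexp was given an explicit encoding
# flag (/.../n, /.../e, /.../s or /.../u). Assumes Ruby 1.9+, where the
# old Regexp#kcode method is gone.
def explicit_encoding?(re)
  # /.../e, /.../s and /.../u pin the encoding; /.../n sets the NOENCODING bit.
  re.fixed_encoding? || (re.options & Regexp::NOENCODING) != 0
end

p explicit_encoding?(/\w+/n)
p explicit_encoding?(/\w+/u)
p explicit_encoding?(/\w+/)
```

Like the original hook, this only helps where you already hold the Regexp object; it does nothing about literals never passing through initialize.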

4 Answers

Robert Dober

2/28/2007 10:26:00 AM

Yes. Well, no: I had one, but the prospects look bleak now. Look at this:

robert@swserver:/home/svn 11:49:44
555/56 > ruby -r profile -e 'puts /a/'
(?-mix:a)
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.00      0.00        2     0.00     0.00  IO#write
  0.00     0.00      0.00        1     0.00     0.00  Regexp#to_s
  0.00     0.00      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.01      0.00        1     0.00    10.00  #toplevel
robert@swserver:/home/svn 11:49:50
556/57 > ruby -r profile -e 'puts Regexp.new("a")'
(?-mix:a)
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.00      0.00        2     0.00     0.00  IO#write
  0.00     0.00      0.00        1     0.00     0.00  Kernel.puts
  0.00     0.00      0.00        1     0.00     0.00  Regexp#initialize
  0.00     0.00      0.00        1     0.00     0.00  Class#new
  0.00     0.00      0.00        1     0.00     0.00  Regexp#to_s
  0.00     0.01      0.00        1     0.00    10.00  #toplevel

I just do not see any way to intercept this at the Ruby level; you would need
to hack Ruby itself.
Maybe someone cleverer than me has an idea?
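
If runtime interception is out, one alternative is to check the source text instead of the running program. The sketch below does that with Ripper, which ships with Ruby 1.9 and later (so not with the 1.8 interpreter discussed here). The method name regexps_missing_kcode and the sample input are made up for illustration, and it assumes that the lexer's :on_regexp_end token carries the closing delimiter followed by the flag letters.

```ruby
require 'ripper'

# Flags that pin a regexp's encoding: n (none), e (EUC-JP), s (Shift_JIS), u (UTF-8).
ENCODING_FLAGS = /[nesu]/

# Hypothetical helper: returns the line numbers on which a regexp literal
# ends without any encoding flag.
def regexps_missing_kcode(source)
  Ripper.lex(source).each_with_object([]) do |(pos, type, text), lines|
    next unless type == :on_regexp_end
    flags = text[1..-1] # drop the closing delimiter, keep the flag letters
    lines << pos[0] unless flags =~ ENCODING_FLAGS
  end
end

sample = <<'RUBY'
a = /\w+/n
b = /\w+/
c = %r{x}iu
d = /y/mi
RUBY

p regexps_missing_kcode(sample)
```

Run over each source file (for example from a test or a pre-commit script), this catches literals that never reach Regexp#initialize, at the cost of not seeing regexps built at runtime with Regexp.new.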

Cheers
Robert

--
We have not succeeded in answering all of our questions.
In fact, in some ways, we are more confused than ever.
But we feel we are confused on a higher level and about more important things.
-Anonymous

Guest

2/28/2007 10:37:00 AM

ruby -v
# ==> ruby 1.8.4 (2005-12-24) [i486-linux]

class String; def =~(o); raise "S"; end; end
class Regexp; def =~(o); raise "R"; end; end

r = /x/
r =~ 'a'
# ==> RuntimeError: R
        from (irb):2:in `=~'
        from (irb):4
'a' =~ r
# ==> RuntimeError: S
        from (irb):1:in `=~'
        from (irb):5

--
Posted via http://www.ruby-....

Daniel DeLorme

2/28/2007 11:57:00 AM

Jan Friedrich wrote:
> ruby -v
> # ==> ruby 1.8.4 (2005-12-24) [i486-linux]
>
> class String; def =~(o); raise "S"; end; end
> class Regexp; def =~(o); raise "R"; end; end
>
> r = /x/
> r =~ 'a'
> # ==> RuntimeError: R
> from (irb):2:in `=~'
> from (irb):4
> 'a' =~ r
> # ==> RuntimeError: S
> from (irb):1:in `=~'
> from (irb):5

Very interesting. If you assign the regexp to a variable you get
the overridden methods. I guess there's some voodoo optimization
at work when you use =~ on a regexp literal?

Daniel

Daniel DeLorme

2/28/2007 1:16:00 PM

Daniel DeLorme wrote:
> Because a regular expression can have different behaviors depending on
> its kcode (e.g. behavior of \w) I decided that all my code should
> specify the kcode explicitly (e.g. /\w+/n instead /\w+/).

As an addendum, I was wondering why \w matches extended characters in UTF-8.
If extended characters are considered "word" characters, does that mean they
are valid in identifiers? So I tried:

> $KCODE='u'
=> "u"
> def 日本語
> "nihongo"
> end
=> nil
> 日本語
=> "nihongo"

wow. O_O

Daniel
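
A note for readers on later Rubies: from 1.9 on, \w is documented as matching only [a-zA-Z0-9_] and $KCODE has no effect, so the experiment above comes out differently there. The sketch below assumes Ruby 1.9+; the sample word is an illustrative choice, written with \u escapes so the snippet does not depend on the source file's encoding.

```ruby
# Three non-ASCII letters, written as escapes (illustrative sample only).
word = "\u65E5\u672C\u8A9E"

# \w is ASCII-only on Ruby 1.9+, so it finds nothing in this string.
p(word =~ /\w/)

# POSIX bracket expressions are Unicode-aware and match at index 0.
p(word =~ /[[:alpha:]]/)
```

So on those versions the "is this a word character?" question no longer depends on a global kcode setting, only on the pattern itself.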