
comp.lang.ruby

Curious regexp behavior

lewisd

2/16/2005 1:17:00 AM

On a whim, I just decided to try an experiment with regexps, to see how
they perform in two slightly different cases. I wanted to see how using
a single regexp object for many many evaluations performed compared to
using the regexp within the loop.

The scripts I wrote searched through a words file that is 234937 lines
long.

Here are the scripts I wrote, to clarify:
First one:

total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ /[a-df-h][aeiou]{2}/
  }
}
puts total

Second one:

rexp = /[a-df-h][aeiou]{2}/
total = 0
File.open( 'words', 'r' ) { |file|
  file.each_line { |line|
    word = line.chomp
    total += 1 if word =~ rexp
  }
}
puts total


I expected the second one to be slightly faster, but was surprised to
see that it was actually slightly slower. I ran each one about 10-15
times, and eyeballed an average. The results from each run after the
first were pretty consistent.

It's just a curiosity, but does anyone know what might cause them to be
'backwards' like that? :)
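
For anyone who wants to measure this rather than eyeball it, here is a minimal sketch using the standard Benchmark module. The WORDS list is a hypothetical in-memory stand-in for the words file, and the method names count_inline and count_variable are made up for illustration. Timings will vary by machine and Ruby version, so none are shown.

```ruby
require 'benchmark'

# Hypothetical stand-in for the 'words' file, so the sketch runs anywhere.
WORDS = %w[beautiful rhythm fear goat sky header queue] * 20_000

# Regexp literal written inline, inside the loop.
def count_inline(words)
  total = 0
  words.each { |word| total += 1 if word =~ /[a-df-h][aeiou]{2}/ }
  total
end

# Same regexp, stored in a local variable before the loop.
def count_variable(words)
  rexp = /[a-df-h][aeiou]{2}/
  total = 0
  words.each { |word| total += 1 if word =~ rexp }
  total
end

inline_total = variable_total = nil
inline_time   = Benchmark.realtime { inline_total   = count_inline(WORDS) }
variable_time = Benchmark.realtime { variable_total = count_variable(WORDS) }

puts "inline:   #{inline_total} matches in #{inline_time.round(3)}s"
puts "variable: #{variable_total} matches in #{variable_time.round(3)}s"
```

Both counts should agree; only the elapsed times are of interest.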

--
Derek Lewis

===================================================================
Java Web-Application Developer

Email : email@lewisd.com
Cellular : 778.898.5825
Website : http://www....

"If you've got a 5000-line JSP page that has "all in one" support
for three input forms and four follow-up screens, all controlled
by "if" statements in scriptlets, well ... please don't show it
to me :-). Its almost dinner time, and I don't want to lose my
appetite :-)."
- Craig R. McClanahan


6 Answers

Charles Mills

2/16/2005 2:38:00 AM


Derek Lewis wrote:
> On a whim, I just decided to try an experiment with regexps, to see how
> they perform in two slightly different cases. I wanted to see how using
> a single regexp object for many many evaluations performed compared to
> using the regexp within the loop.
>
> The scripts I wrote searched through a words file that is 234937 lines
> long.
>
> Here are the scripts I wrote, to clarify:
> First one:
>
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ /[a-df-h][aeiou]{2}/
>   }
> }
> puts total
>
> Second one:
>
> rexp = /[a-df-h][aeiou]{2}/
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ rexp
>   }
> }
> puts total
>
>
> I expected the second one to be slightly faster, but was surprised to
> see that it was actually slightly slower. I ran each one about 10-15
> times, and eyeballed an average. The results from each run after the
> first were pretty consistent.
>
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that? :)
>
I'll hazard a guess. In the first version Ruby knows that
'/[a-df-h][aeiou]{2}/' is a regexp. In the second one Ruby doesn't
know if 'rexp' is a variable or method, so it has to do one, maybe two,
lookups on every iteration before it dispatches String#=~.
Also, regexps are immutable, so Ruby doesn't allocate a new regexp on
every iteration, and storing the regexp has no effect in that regard.
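
The point about allocation can be checked with a small sketch like the one below. It compares object identity across repeated evaluations of a literal. The interpolated variants and the variable name vowels are my own additions for contrast, not something from the scripts above. As far as I know, a literal without interpolation is built once, an interpolated one is rebuilt on each evaluation, and the /o flag makes Ruby interpolate only once.

```ruby
# A regexp literal without interpolation: every evaluation of the
# expression should hand back the same Regexp object.
literal_ids = 3.times.map { /[a-df-h][aeiou]{2}/.object_id }

# With interpolation the regexp is rebuilt on each evaluation ...
vowels = 'aeiou' # hypothetical variable, for illustration only
dynamic = 3.times.map { /[a-df-h][#{vowels}]{2}/ }

# ... unless the /o flag asks Ruby to interpolate only once.
once = 3.times.map { /[a-df-h][#{vowels}]{2}/o }

puts "distinct literal objects:      #{literal_ids.uniq.size}"
puts "distinct interpolated objects: #{dynamic.map(&:object_id).uniq.size}"
puts "distinct /o objects:           #{once.map(&:object_id).uniq.size}"
```

So hoisting a plain literal out of the loop saves no allocations; hoisting only helps when the pattern is interpolated.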

-Charlie

Eric Hodel

2/16/2005 5:30:00 AM


On 15 Feb 2005, at 17:16, Derek Lewis wrote:

> First one:
>
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ /[a-df-h][aeiou]{2}/
                            ^^^^ inline regexp (part of the AST)
>   }
> }
> puts total
>
> Second one:
>
> rexp = /[a-df-h][aeiou]{2}/
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ rexp
                            ^^^^ variable lookup
>   }
> }
> puts total
>
>
> I expected the second one to be slightly faster, but was surprised to
> see that it was actually slightly slower. I ran each one about 10-15
> times, and eyeballed an average. The results from each run after the
> first were pretty consistent.
>
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that? :)

Inline regexps are much faster than a variable lookup followed by a
method call on the Regexp object.

--
Eric Hodel - drbrain@segment7.net - http://se...
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Ryan Davis

2/16/2005 8:52:00 AM



On Feb 15, 2005, at 5:16 PM, Derek Lewis wrote:

> I expected the second one to be slightly faster, but was surprised to
> see that it was actually slightly slower. I ran each one about 10-15
> times, and eyeballed an average. The results from each run after the
> first were pretty consistent.
>
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that? :)

Use ParseTree and you can see why!!!

<576> echo "a=/blah/; 's' =~ a" | parse_tree_show -f
(cut for readability)
[:lasgn, :a, [:lit, /blah/]],
[:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]]]]]
<577> echo "'s' =~ /blah/" | parse_tree_show -f
(cut for readability)
[:match3, [:lit, /blah/], [:str, "s"]]]]]]]

Basically, the inline regex avoids the lvar lookup and the call and
shoots straight into a match3 node. The lvar is probably not _that_
expensive, but method dispatch is not terribly cheap.
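
ParseTree was written against the 1.8 interpreter and, as far as I know, does not run on 1.9 and later. A rough analogue on newer MRI versions is to dump the compiled bytecode instead. This is only a sketch: it assumes MRI's RubyVM::InstructionSequence is available, and the instructions it prints differ between Ruby versions and need not mirror the node types shown above, so no output is reproduced here.

```ruby
# RubyVM::InstructionSequence is specific to MRI (Ruby 1.9 and later);
# other implementations may not provide it.
inline_src   = "'s' =~ /blah/"
variable_src = "a = /blah/; 's' =~ a"

inline_asm   = RubyVM::InstructionSequence.compile(inline_src).disasm
variable_asm = RubyVM::InstructionSequence.compile(variable_src).disasm

puts "--- inline literal ---"
puts inline_asm
puts "--- via local variable ---"
puts variable_asm
```

Comparing the two listings side by side shows whatever extra work the variable form does on the interpreter being tested.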

--
ryand-ruby@zenspider.com - http://blog.zens...
http://rubyforge.org/projec...
http://rubyforge.org/projects/...
http://www.zenspider.com/...



Robert Klemme

2/16/2005 9:12:00 AM



"Derek Lewis" <lewisd@f00f.net> wrote in message
news:20050216012200.GP23232@f00f.net...
> On a whim, I just decided to try an experiment with regexps, to see how
> they perform in two slightly different cases. I wanted to see how using
> a single regexp object for many many evaluations performed compared to
> using the regexp within the loop.
>
> The scripts I wrote searched through a words file that is 234937 lines
> long.
>
> Here are the scripts I wrote, to clarify:
> First one:
>
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ /[a-df-h][aeiou]{2}/
>   }
> }
> puts total
>
> Second one:
>
> rexp = /[a-df-h][aeiou]{2}/
> total = 0
> File.open( 'words', 'r' ) { |file|
>   file.each_line { |line|
>     word = line.chomp
>     total += 1 if word =~ rexp
>   }
> }
> puts total
>
>
> I expected the second one to be slightly faster, but was surprised to
> see that it was actually slightly slower. I ran each one about 10-15
> times, and eyeballed an average. The results from each run after the
> first were pretty consistent.
>
> It's just a curiosity, but does anyone know what might cause them to be
> 'backwards' like that? :)

Did you try the same with the matching reversed, i.e., "rexp =~ word"
instead of "word =~ rexp"? Did it make a difference?

Kind regards

robert

William Morgan

2/16/2005 1:48:00 PM


Excerpts from Ryan Davis's mail of 16 Feb 2005 (EST):
> Use ParseTree and you can see why!!!
>
> <576> echo "a=/blah/; 's' =~ a" | parse_tree_show -f
> (cut for readability)
> [:lasgn, :a, [:lit, /blah/]],
> [:call, [:str, "s"], :=~, [:array, [:lvar, :a]]]]]]]]
> <577> echo "'s' =~ /blah/" | parse_tree_show -f
> (cut for readability)
> [:match3, [:lit, /blah/], [:str, "s"]]]]]]]

Very nice answer.

Like the original poster, I found the behavior counterintuitive. Perhaps
this is because our assumptions come from the C model of the universe,
where using more local variables is typically faster, and method
dispatch is not a problem.

I wonder what the merits of collecting equivalences like these to form
some kind of post-hoc parse-tree optimization would be. Probably not
great, but it might be fun.

--
William <wmorgan-ruby-talk@masanjin.net>


lewisd

2/16/2005 5:35:00 PM


On Wed, Feb 16, 2005 at 06:14:52PM +0900, Robert Klemme wrote:
>
>
> Did you try the same with the matching reversed, i.e., "rexp =~ word"
> instead of "word =~ rexp"? Did it make a difference?
>
> Kind regards
>
> robert
>

I did, actually, and it was very slightly faster. Still slower than an
inline regexp, however.
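
For reference, the two operand orders give the same answer and differ only in which class receives the call. A minimal sketch, with two sample words picked purely for illustration:

```ruby
rexp = /[a-df-h][aeiou]{2}/

# Each pair is [word =~ rexp, rexp =~ word].
results = %w[beautiful rhythm].map do |word|
  forward  = (word =~ rexp) # dispatches to String#=~
  reversed = (rexp =~ word) # dispatches to Regexp#=~
  [forward, reversed]
end

p results # prints [[0, 0], [nil, nil]]
```

Both forms return the index of the first match, or nil when there is none.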

Thanks for the insightful answers, everyone. It's quite interesting to
find out how your favorite programming language works inside.

--
Derek Lewis
