comp.lang.ruby

Question on speed

Jesse Brown

8/18/2006 2:04:00 PM

Based on the following snippet:

File.open(name).each { |line|
  case line
  when /not found/, /^[gBbTi]/
    next
  when /^S/
    # start or stop condition
  when /^I/
    # iteration indicator
  else
    # actual data
  end
}


In the first when, (/not found/, /^[gBbTi]/) is there any benefit in
setting up the statement in either of the following ways?

when /^[gBbTi]/, /not found/
Put the short circuit one first (expected to happen more often than
'not found')

OR

when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/
Separate each operation into its own portion of the test (ordered
by expected frequency)

I guess this is really a question of 'how does ruby handle the multiple
arguments to a case statement?'.
Are they taken in order or all compiled to a single expression before
ever being evaluated?

Thanks for any insight.

3 Answers

Paul Battley

8/18/2006 2:43:00 PM

On 18/08/06, L7 <jesse.r.brown@gmail.com> wrote:
> In the first when, (/not found/, /^[gBbTi]/) is there any benefit in
> setting up the statement in either of the following ways?

Ruby doesn't compile the separate expressions: it sends ===(subject)
to each argument in turn. They don't have to be regular expressions,
after all. Putting the more frequent match first is indeed quicker, as
expected - but that's an optimisation that can only be made with
foreknowledge of the data set.
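
To make the "each argument in turn" behaviour visible, here is a minimal sketch. LoggedPattern and classify are hypothetical names introduced only for this illustration; the wrapper records every === call it receives, so the log shows which patterns were actually tried and in what order.

```ruby
# Hypothetical helper (not part of Ruby): wraps a Regexp and records
# each === call so the evaluation order of `when` arguments is visible.
class LoggedPattern
  def initialize(pattern, log)
    @pattern = pattern
    @log = log
  end

  # `case` calls this with the subject, exactly as it would on a Regexp.
  def ===(subject)
    @log << @pattern.source
    @pattern === subject
  end
end

# Hypothetical classifier mirroring the first `when` in the question.
def classify(line, log)
  case line
  when LoggedPattern.new(/^[gBbTi]/, log), LoggedPattern.new(/not found/, log)
    :skip
  else
    :data
  end
end

log = []
hit_result = classify("goat", log)
log_hit = log.dup            # only the first pattern was consulted

log.clear
miss_result = classify("Data goes here", log)
log_miss = log.dup           # both patterns were consulted, left to right
```

For "goat" the first pattern matches and the second is never asked; for "Data goes here" both are asked in the order written, which is why putting the likelier match first can save work.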

Separating each operation into its own argument is significantly
slower due to the Ruby method call overhead. Knowing this, however, we
can derive a further optimisation by combining the two regular
expressions together: /^[gBbTi]|not found/
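
As a rough check that the combined expression skips the same lines as the two separate ones, the sketch below compares three forms on the sample lines from this thread. Regexp.union is used here as one assumed way to build the alternation from existing patterns; the hand-written literal is the form suggested above. The variable names are illustrative only.

```ruby
prefix  = /^[gBbTi]/
missing = /not found/

# Regexp.union builds a single alternation from the given patterns.
combined = Regexp.union(prefix, missing)

lines = ["goat", "Badger", "bear", "Tiger", "ibis",
         "not found", "Start", "Data goes here"]

# Two separate tests, as in `when /^[gBbTi]/, /not found/`.
skipped_by_pair = lines.select { |l| prefix === l || missing === l }

# One test against the programmatically combined pattern.
skipped_by_union = lines.grep(combined)

# One test against the hand-written combined literal.
skipped_by_literal = lines.grep(/^[gBbTi]|not found/)
```

All three select the same six lines, so the combined pattern can stand in for the pair in the `when` clause while costing one === call per line instead of up to two.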

Paul.

PS: I did a bit of quick unscientific profiling to check. Here's the code:

data = [
  "goat",
  "Badger",
  "bear",
  "Tiger",
  "ibis",
  "not found",
  "Start",
  "Data goes here"
] * 10000

# Rarer pattern first
t0 = Time.now
data.each do |line|
  case line
  when /not found/, /^[gBbTi]/
    next
  end
end
p Time.now - t0 # 0.236658

# More frequent pattern first
t0 = Time.now
data.each do |line|
  case line
  when /^[gBbTi]/, /not found/
    next
  end
end
p Time.now - t0 # 0.176375

# Both patterns combined into one regular expression
t0 = Time.now
data.each do |line|
  case line
  when /^[gBbTi]|not found/
    next
  end
end
p Time.now - t0 # 0.145403

# One argument per character
t0 = Time.now
data.each do |line|
  case line
  when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/
    next
  end
end
p Time.now - t0 # 0.299182

Jesse Brown

8/18/2006 3:10:00 PM


Thank you for the explanation and example (it's plenty 'un'scientific
for my needs).

David Vallner

8/18/2006 7:54:00 PM

Paul Battley wrote:
> p Time.now - t0 # 0.236658

Shh. Don't tell anyone, but there's the Benchmark module and its 'bm'
method. Pickaxe has a good introduction on how to use it, and it prints
pwetty result tables. It also comes with a convenient method, 'bmbm',
that does a dry run and then a "real" run, to let GC have a chance to
kick in.
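
For reference, a minimal sketch of how the timings from earlier in the thread might be rerun with Benchmark.bmbm from the standard library. The labels, the variants hash and the counts hash are assumptions made for this illustration. Wrapping each case in a lambda adds call overhead that the original loops did not have, so the absolute numbers will differ from those posted above.

```ruby
require 'benchmark'

data = ["goat", "Badger", "bear", "Tiger", "ibis",
        "not found", "Start", "Data goes here"] * 10_000

# Each variant returns true when the line would be skipped.
variants = {
  "rare first" => lambda { |line|
    case line
    when /not found/, /^[gBbTi]/ then true
    else false
    end
  },
  "frequent first" => lambda { |line|
    case line
    when /^[gBbTi]/, /not found/ then true
    else false
    end
  },
  "combined" => lambda { |line|
    case line
    when /^[gBbTi]|not found/ then true
    else false
    end
  },
  "one per char" => lambda { |line|
    case line
    when /^g/, /^B/, /^b/, /^T/, /^i/, /not found/ then true
    else false
    end
  }
}

counts = {}

# bmbm runs every report twice: a rehearsal pass, then the measured pass.
results = Benchmark.bmbm(16) do |bm|
  variants.each do |label, skip|
    bm.report(label) do
      counts[label] = data.count { |line| skip.call(line) }
    end
  end
end
```

Recording the skip count per variant is a cheap sanity check that all four forms do the same work before their timings are compared.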

David Vallner