Asp Forum - Super-scalar Optimizations

Gavin Kistner

6/16/2005 10:58:00 PM

I was looking over the shoulder of a C++ coworker yesterday, when he
was writing a hack to only run certain code once. The C++ code was the
equivalent of the following Ruby code:

  already_run = false
  while true
    foo = bar if !already_run
    already_run = true
    do_something( )
  end

I asked him: "Wouldn't it be slightly faster to nest the boolean
assignment inside the if statement?" I was suggesting the equivalent
of:

  already_run = false
  while true
    if !already_run
      foo = bar
      already_run = true
    end
    do_something( )
  end

His answer was "no", and involved discussion of super-scalar
architectures and the fact that the original way ran the assignment in
parallel to the condition evaluation, and so was in fact faster.

The reason I'm posting is - are there any such considerations in Ruby
(when writing Ruby code, not C/C++ components)?

Or am I correct in assuming that the current state of
compilation/interpretation is such that there is no parallel branching
of statements to worry about?

13 Answers

ptkwt

6/16/2005 11:31:00 PM

In article <1118962660.509611.111140@f14g2000cwb.googlegroups.com>,
Phrogz <gavin@refinery.com> wrote:
>I was looking over the shoulder of a C++ coworker yesterday, when he
>was writing a hack to only run certain code once. The C++ code was the
>equivalent of the following Ruby code:
>
> already_run = false
> while true
> foo = bar if !already_run
> already_run = true
> do_something( )
> end
>
>I asked him: "Wouldn't it be slightly faster to nest the boolean
>assignment inside the if statement?" I was suggesting the equivalent
>of:
>
> already_run = false
> while true
> if !already_run
> foo = bar
> already_run = true
> end
> do_something( )
> end
>
>His answer was "no", and involved discussion of super-scalar
>architectures and the fact that the original way ran the assignment in
>parallel to the condition evaluation, and so was in fact faster.
>
>
>The reason I'm posting is - are there any such considerations in Ruby
>(when writing Ruby code, not C/C++ components)?
>
>Or am I correct in assuming that the current state of
>compilation/interpretation is such that there is no parallel branching
>of statements to worry about?
>

I'll take a stab at this, but I'm no expert....

Your cow-worker _may_ be right about the differences between the two code
snippets. I would think that you'd have to know what kind of assembly
code got generated by the two different snippets, though. Perhaps he's
taken a look at the compiled code. It also seems like it could differ a
lot between compilers (g++ vs. VC++).

As far as Ruby code goes, I don't think you would see any difference
because Ruby doesn't get compiled to native code (yet ;-). Though there
may be differences in how the interpreter handles the two different
snippets which could possibly affect speed, it would have nothing to do
with anything deep down in the actual hardware processor.
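
If anyone wants to check this on their own interpreter rather than guess, here is a minimal sketch using the standard Benchmark library. The method names, the iteration count, and the :bar placeholder are made up for illustration, and the loop is bounded so it terminates. Whatever numbers come out will depend on the Ruby version and machine.

```ruby
require 'benchmark'

# Variant A: the flag is reassigned on every iteration.
def flag_outside(iterations)
  already_run = false
  foo = nil
  iterations.times do
    foo = :bar if !already_run   # :bar is a placeholder value
    already_run = true
  end
  foo
end

# Variant B: the flag is assigned only inside the conditional.
def flag_inside(iterations)
  already_run = false
  foo = nil
  iterations.times do
    if !already_run
      foo = :bar
      already_run = true
    end
  end
  foo
end

if __FILE__ == $0
  n = 1_000_000   # arbitrary; raise it if the timings are too noisy
  Benchmark.bm(14) do |x|
    x.report('flag outside:') { flag_outside(n) }
    x.report('flag inside:')  { flag_inside(n) }
  end
end
```

Both variants compute the same thing, so any gap in the timings comes from how the interpreter handles the extra assignment.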

Phil

Robert Klemme

6/17/2005 7:08:00 AM

Phil Tomson wrote:
> In article <1118962660.509611.111140@f14g2000cwb.googlegroups.com>,
> Phrogz <gavin@refinery.com> wrote:
>> I was looking over the shoulder of a C++ coworker yesterday, when he
>> was writing a hack to only run certain code once. The C++ code was
>> the equivalent of the following Ruby code:
>>
>> already_run = false
>> while true
>> foo = bar if !already_run
>> already_run = true
>> do_something( )
>> end
>>
>> I asked him: "Wouldn't it be slightly faster to nest the boolean
>> assignment inside the if statement?" I was suggesting the equivalent
>> of:
>>
>> already_run = false
>> while true
>> if !already_run
>> foo = bar
>> already_run = true
>> end
>> do_something( )
>> end
>>
>> His answer was "no", and involved discussion of super-scalar
>> architectures and the fact that the original way ran the assignment
>> in parallel to the condition evaluation, and so was in fact faster.
>>
>>
>> The reason I'm posting is - are there any such considerations in Ruby
>> (when writing Ruby code, not C/C++ components)?
>>
>> Or am I correct in assuming that the current state of
>> compilation/interpretation is such that there is no parallel
>> branching of statements to worry about?
>>
>
> I'll take a stab at this, but I'm no expert....
>
> Your cow-worker _may_ be right about the differences between the two
> code snippets. I would think that you'd have to know what kind of
> assembly code got generated by the two different snippets, though.
> Perhaps he's taken a look at the compiled code. It also seems like
> it could differ a lot between compilers (g++ vs. VC++).
>
> As far as Ruby code goes, I don't think you would see any difference
> because Ruby doesn't get compiled to native code (yet ;-). Though
> there may be differences in how the interpreter handles the two
> different snippets which could possibly affect speed, it would have
> nothing to do with anything deep down in the actual hardware
> processor.

.... especially as Ruby does no parallelism internally (no native threads).

My 0.02EUR...

robert

Devin Mullins

6/17/2005 12:11:00 PM

Remembering, vaguely, my comp arch class, I'm pretty sure the co-worker
(or cow-worker, if you really don't like him) was talking not about
threading, but about how some VLIW (very long instruction word, i.e. not
32-bit) machines include a 'predicate' in addition to the instruction
(e.g. MOV FOO, BAR IF EQL ALREADY_RUN, 0 is one instruction). I presume,
then, you guys are not running on x86.

And no, Ruby's too high level, and yeah, not compiled.

And wouldn't it be faster still just to pull foo = bar out of the while
loop? :)

Devin

Robert Klemme

6/17/2005 12:37:00 PM

Devin Mullins wrote:
> Remembering, vaguely, my comp arch class, I'm pretty sure the
> co-worker (or cow-worker, if you really don't like him) was talking
> not about threading, but about how some VLIW (very long instruction
> word, i.e. not 32-bit) machines include a 'predicate' in addition to
> the instruction (i.e. MOV FOO, BAR IF EQL ALREADY_RUN, 0 is one
> instruction). I presume, then, you guys are not running on x86.
>
> And no, Ruby's too high level, and yeah, not compiled.
>
> And wouldn't it be faster still just to pull foo = bar out of the
> while loop? :)

Even more so: what's the point of a loop that is always run only once?

robert

Gavin Kistner

6/17/2005 2:27:00 PM

On Jun 17, 2005, at 6:40 AM, Robert Klemme wrote:
> Even more so: what's the point of a loop that is always run only once?

Er, the loop doesn't run once, only the initialization code. The loop
runs forever.

On Jun 17, 2005, at 6:11 AM, Devin Mullins wrote:
> And wouldn't it be faster still just to pull foo = bar out of the
> while loop? :)

I actually flubbed the example slightly. It should have been:

  already_run = false
  while true
    do_something( )
    foo = bar if !already_run
    already_run = true
  end

Where "do_something()" was actually about 15 lines of code. The
alternative would have been:

  do_something( )
  foo = bar
  while true
    do_something( )
  end

which is not very DRY when do_something( ) is a large block of code.

But again, even the programmer himself called it a hack while writing
it; at the time we weren't even sure if setting foo=bar after the
first iteration was the right fix to the problem.
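
For what it's worth, in Ruby the repetition can be avoided by passing the large body as a block. This is only a sketch: run_with_setup, body, and iterations are names invented here, not anything from the C++ code, and the loop is bounded so the example terminates (the real one ran forever).

```ruby
# Run the shared body once, do the one-time setup, then loop.
# No already_run flag is needed because the setup sits between
# the first call and the loop.
def run_with_setup(bar, iterations, &body)
  body.call                        # first pass, before foo is set
  foo = bar                        # one-time initialization
  iterations.times { body.call }   # stands in for `while true`
  foo
end

calls = 0
result = run_with_setup(:bar_value, 3) { calls += 1 }
puts "foo=#{result.inspect}, body ran #{calls} times"
```

The body is written once at the call site, and the flag test disappears from the loop entirely.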

Ara.T.Howard

6/17/2005 3:03:00 PM

Ben Giddings

6/17/2005 4:18:00 PM

On Thursday 16 June 2005 19:00, Phrogz wrote:
> His answer was "no", and involved discussion of super-scalar
> architectures and the fact that the original way ran the assignment in
> parallel to the condition evaluation, and so was in fact faster.

Yay! Trying to outsmart a compiler!

This sure seems like premature optimization to me. Was it really slowing
things down to do it the more obvious way? Had that been proven using a
profiler?

Computer code is a language that is meant to be read by both humans and
computers. These days, computers are really smart and their compilers can
look at the code and know what you're trying to do. In Ruby, Matz does
this by looking at context when something could be interpreted in
different ways. C/C++ compilers can often spot common control structures
and use an optimized version in the machine code they produce.

Since computers are so smart, these days it makes more sense to write code
that a human can understand. Unless you truly need to clarify things for
the computer (i.e. things run too slow when they're written in the
human-obvious way) don't write for the computer!

Ben

Gavin Kistner

6/18/2005 12:26:00 AM

On Jun 17, 2005, at 9:03 AM, Ara.T.Howard wrote:
>> do_something( )
>> foo = bar
>> while true
>> do_something( )
>> end
>>
>> which is not very DRY when do_something( ) is a large block of code.
>>
>
> sure it is - block being the key word here:

Er, we've wandered far from the original content - the above Ruby
code was simply an illustration of the C++ code in question, because
I can't be bothered to know how to write proper C++ syntax.

Yes, Ruby makes life far cooler than C++.

Gavin Kistner

6/18/2005 12:42:00 AM

On Jun 17, 2005, at 10:18 AM, Ben Giddings wrote:
> On Thursday 16 June 2005 19:00, Phrogz wrote:
>> His answer was "no", and involved discussion of super-scalar
>> architectures and the fact that the original way ran the
>> assignment in
>> parallel to the condition evaluation, and so was in fact faster.
>>
>
> Yay! Trying to outsmart a compiler!
>
> This sure seems like premature optimization to me. Was it really
> slowing
> things down to do it the more obvious way? Had that been proven
> using a
> profiler?

I appreciate your comments, but in the defense of my coworker:
1) As I've stated, using the boolean flag to run the code once was
only a hack to test whether the solution would fix the problem, and

2) No, I doubt that the placement of a single boolean assignment made
any measurable difference either way. My point with this thread
(which has been answered) was simply to find out if Ruby had any
similar things to keep in mind that would flow down to the
instruction pipeline architecture. The placement of that assignment
in the C++ code

> Since computers are so smart, these days it makes more sense to
> write code
> that a human can understand. Unless you truly need to clarify
> things for
> the computer (i.e. things run too slow when they're written in the
> human-obvious way) don't write for the computer!

FWIW, I don't think that the difference between:

  if ( !foo )
  {
      bar( );
      foo = true;
  }

versus

  if ( !foo )
  {
      bar( );
  }
  foo = true;

makes a difference either way in terms of legibility. Being against
premature optimization is fine to a point, but in any programming
project there are numerous basic choices one can make which will
affect performance.

Matthias Georgi

6/18/2005 9:50:00 AM

Gavin Kistner schrieb:
> 2) No, I doubt that the placement of a single boolean assignment made
> any measurable difference either way. My point with this thread
> (which has been answered) was simply to find out if Ruby had any
> similar things to keep in mind that would flow down to the
> instruction pipeline architecture. The placement of that assignment
> in the C++ code
>

I just searched Google for super-scalar optimizations regarding
interpreters and found a paper about Java bytecode interpreters:
http://www.csc.uvic.ca/~csc586a/papers/p58...

It seems that the frequent memory access of interpreters prevents
super-scalar processors from predicting branches and executing
instructions in parallel.

So you may assume that almost no parallel execution happens when a Ruby
script is executed.
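
To see what an interpreter actually gets handed, one option on a current MRI (which compiles to YARV bytecode, something that postdates this thread) is to dump the instruction sequence for both snippets. A rough sketch, assuming MRI, since RubyVM::InstructionSequence is not available on other implementations. The constant and method names are invented for illustration. The source is only compiled, never run, so the undefined bar and do_something are harmless.

```ruby
# Source of the two variants from the original question.
FLAG_OUTSIDE = <<~RUBY
  already_run = false
  while true
    foo = bar if !already_run
    already_run = true
    do_something( )
  end
RUBY

FLAG_INSIDE = <<~RUBY
  already_run = false
  while true
    if !already_run
      foo = bar
      already_run = true
    end
    do_something( )
  end
RUBY

# Compile the source (without executing it) and return the
# human-readable bytecode listing as a String.
def bytecode_for(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

puts bytecode_for(FLAG_OUTSIDE)
puts bytecode_for(FLAG_INSIDE)
```

Each line of the listing is one VM instruction that the interpreter loop dispatches in turn, so the hardware sees the interpreter's own code rather than the two Ruby statements side by side.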

Besides that, I have always wondered whether there are performance
issues with procs. Given that they hold a reference to the C stack,
maybe a proc call results in some kind of stack restoring. This is
also the reason for the enormous memory consumption of continuations,
which store the whole C stack (about 60 KB).
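
The proc question is easy enough to measure rather than wonder about. A minimal sketch with the standard Benchmark library follows. The method names, the ADD_ONE_PROC constant, and the iteration count are assumptions for illustration, and the result will vary between Ruby versions.

```ruby
require 'benchmark'

# Plain method used as the baseline.
def add_one(x)
  x + 1
end

# The same operation wrapped in a proc.
ADD_ONE_PROC = proc { |x| x + 1 }

# Call the method n times, threading the result through.
def sum_via_method(n)
  total = 0
  n.times { total = add_one(total) }
  total
end

# Call the proc n times, threading the result through.
def sum_via_proc(n)
  total = 0
  n.times { total = ADD_ONE_PROC.call(total) }
  total
end

if __FILE__ == $0
  n = 1_000_000   # arbitrary iteration count
  Benchmark.bm(8) do |x|
    x.report('method:') { sum_via_method(n) }
    x.report('proc:')   { sum_via_proc(n) }
  end
end
```

Both loops do the same arithmetic, so the difference between the two rows is the cost of going through Proc#call instead of a method call.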

comp.lang.ruby
