Asp Forum - Re: sorting by rand?

seebs

4/30/2007 7:11:00 AM

In message <4635948C.5050803@blackkettle.org>, Alex Young writes:
>Peter Seebach wrote:
>> In message <46358854.30705@cesmail.net>, "M. Edward (Ed) Borasky" writes:
>>> Welll ... I do this in Excel all the time -- add a column of random
>>> numbers to the right of the (CSV) file I want to shuffle, sort it about
>>> five times, hitting F9 before each sort to change the random numbers,
>>> and then deleting the random column and saving the file. Translate that
>>> to Ruby and I think you've got it.

>> It seems expensive.

>That's almost exactly what sort_by{rand} does...

Right.

And so far as I can tell, sort_by{rand} is inefficient compared to
a canonical shuffle algorithm. That'd make sense; it's going to do a lot
of compares, and some fair number of them (I'd guess half, or close to
it?) result in swaps. A shuffle does N swaps, or 2N assignments. A
sort might do more. Sorting a list of 10,000 items, sorted by rand,
performed about 128,000 comparisons. If half of them result in swaps,
that's 64k swaps.

I can't think of any way in which sort_by{rand} is going to be "more random"
than a correct shuffle algorithm (yes, I know the one I posted the first time
is not that correct algorithm; I didn't do swaps, just assignment).

I also can't see an immediate reason to do the random numbers sort more
than once; I guess to reduce the chances of collisions preserving order.

-s

1 Answer

Dan Zwell

4/30/2007 7:24:00 AM

Peter Seebach wrote:
>>> It seems expensive.
>
>> That's almost exactly what sort_by{rand} does...
>
> Right.
>
> And so far as I can tell, sort_by{rand} is inefficient compared to
> a canonical shuffle algorithm.
That's true, but keep in mind that sort_by{rand} is code that's built
into the interpreter, so it's still gonna be pretty fast. I tested this
on my system a few weeks ago, and rerunning the benchmark program, the
traditional sort only becomes faster when the arrays in question have
over 5,000 elements. After all, it's a better algorithm, but it's
written in ruby and not C.

> I also can't see an immediate reason to do the random numbers sort more
> than once; I guess to reduce the chances of collisions preserving order.
If the numbers that rand() generates are good enough, the chance of
collisions preserving order should be the same as the chance of the next
iteration of randomizing putting them back in the same order, shouldn't it?

Dan

comp.lang.ruby

Re: sorting by rand?

seebs

Dan Zwell

x Login to ForumsZone