seebs
4/30/2007 7:11:00 AM
In message <4635948C.5050803@blackkettle.org>, Alex Young writes:
>Peter Seebach wrote:
>> In message <46358854.30705@cesmail.net>, "M. Edward (Ed) Borasky" writes:
>>> Welll ... I do this in Excel all the time -- add a column of random
>>> numbers to the right of the (CSV) file I want to shuffle, sort it about
>>> five times, hitting F9 before each sort to change the random numbers,
>>> and then deleting the random column and saving the file. Translate that
>>> to Ruby and I think you've got it.
>> It seems expensive.
>That's almost exactly what sort_by{rand} does...
Right.
And so far as I can tell, sort_by{rand} is inefficient compared to
a canonical shuffle algorithm. That'd make sense; it's going to do a lot
of compares, and some fair number of them (I'd guess half, or close to
it?) result in swaps. A shuffle does N swaps, or 2N assignments. A
sort might do more. Sorting a list of 10,000 items, sorted by rand,
performed about 128,000 comparisons. If half of them result in swaps,
that's 64k swaps.
I can't think of any way in which sort_by{rand} is going to be "more random"
than a correct shuffle algorithm (yes, I know the one I posted the first time
is not that correct algorithm; I didn't do swaps, just assignment).
I also can't see an immediate reason to do the random numbers sort more
than once; I guess to reduce the chances of collisions preserving order.
-s