Asp Forum - #sort_by and #sort_obj

Eric Hodel

10/7/2007 9:22:00 AM

I haven't seen this technique in the wild before.

If you have custom sorting using the spaceship:

class Something
def <=>(other)
[@name, @version] <=> [other.name, other.version]
end
end

You can't easily use the more-efficient #sort_by because you'll need
to know how to sort Somethings, which involves duplication:

somethings.sort_by { |s| [s.name, s.version] }

Instead you can add a method that returns the thing to sort by:

class Something
def sort_obj
[@name, @version]
end

def <=>(other)
sort_obj <=> other.sort_obj
end
end

So now using #sort_by is easy:

somethings.sort_by { |s| s.sort_obj }

I don't know if the name #sort_obj is appropriate, but I'm not coming
up with something better.

See also:

http://blog.segment7.net/articles/2007/10/07/sort_by-an...

--
Poor workers blame their tools. Good workers build better tools. The
best workers get their tools to do the work for them. -- Syndicate Wars

17 Answers

Trans

10/7/2007 10:35:00 AM

On Oct 7, 2:21 am, Eric Hodel <drbr...@segment7.net> wrote:
> I haven't seen this technique in the wild before.
>
> If you have custom sorting using the spaceship:
>
> class Something
> def <=>(other)
> [@name, @version] <=> [other.name, other.version]
> end
> end
>
> You can't easily use the more-efficient #sort_by because you'll need
> to know how to sort Somethings, which involves duplication:
>
> somethings.sort_by { |s| [s.name, s.version] }
>
> Instead you can add a method that returns the thing to sort by:
>
> class Something
> def sort_obj
> [@name, @version]
> end
>
> def <=>(other)
> sort_obj <=> other.sort_obj
> end
> end
>
> So now using #sort_by is easy:
>
> somethings.sort_by { |s| s.sort_obj }
>
> I don't know if the name #sort_obj is appropriate, but I'm not coming
> up with something better.
>
> See also:
>
> http://blog.segment7.net/articles/2007/10/07/sort_by-an...

Nice. In fact, it seems this could dethrone <=> as the primary method
Sortable depends on.

Perhaps a better name is #indice or #sort_index, or something like
that.

T.

David A. Black

10/7/2007 11:29:00 AM

David A. Black

10/7/2007 2:43:00 PM

Stefan Rusterholz

10/7/2007 4:41:00 PM

David A. Black wrote:
> Hi --
>
> On Sun, 7 Oct 2007, Eric Hodel wrote:
>
>> You can't easily use the more-efficient #sort_by because you'll need to know
>>
>> something better.
> Maybe #sort_key.
>
>
> David

I'd support that. I wrote a nat_cmp for String and added
String#nat_cmp_key to use sort_by.
It would be nice if the object returned allowed #-@ to be called, to
enable reverse sorting by easily doing: enum.sort_by { |o| -o.sort_key }

Regards
Stefan
--
Posted via http://www.ruby-....

Trans

10/7/2007 5:12:00 PM

On Oct 7, 7:42 am, "David A. Black" <dbl...@rubypal.com> wrote:
> I think Eric intends it to return a kind of sort key, not to actually
> specify the sorting algorithm.

I realize. But clearly a standard definition of <=> would be based on
it. So the "coupling" method of Sortable could be #sort_key instead of
#<=>, though of course one can still override #<=> if need be.

> > Perhaps a better name is #indice or #sort_index, or something like
> > that.
>
> I'm afraid "indice" isn't a word :-) The singular of indices is index.

Ah, well. I'm an American; dropping 's' is good enough for me ;)
#sort_key is a perfectly good name.

T.

ara.t.howard

10/7/2007 6:02:00 PM

On Oct 7, 2007, at 3:21 AM, Eric Hodel wrote:

> I don't know if the name #sort_obj is appropriate, but I'm not
> coming up with something better.

i prefer #order, #order_by since the concept is extends to much more
that sorting, for instance iterating or returning elements from a
collection such as a priority queue. in any case the idea is a good
one.

cheers.

a @ http://codeforp...
--
share your knowledge. it's a way to achieve immortality.
h.h. the 14th dalai lama

Rudi Cilibrasi

10/7/2007 6:23:00 PM

On 10/7/07, ara.t.howard <ara.t.howard@gmail.com> wrote:
>
> On Oct 7, 2007, at 3:21 AM, Eric Hodel wrote:
>
> > I don't know if the name #sort_obj is appropriate, but I'm not
> > coming up with something better.
>
> i prefer #order, #order_by since the concept is extends to much more
> that sorting, for instance iterating or returning elements from a
> collection such as a priority queue. in any case the idea is a good
> one.

its a great idea and i have been using it for years in sql by the
name "sortkey" although after reading ara's point i think i prefer the
term "orderkey" now.
another name in collation docs is "weight string". i think i like
weight_string the best because it is less verby and confrontational
than order_by and order. both of these are sort of "in your face"
sounding. cheers,
-r.

Joel VanderWerf

10/7/2007 9:00:00 PM

Eric Hodel wrote:
> somethings.sort_by { |s| s.sort_obj }

Since this is more efficient than #sort, maybe the implementation of
#sort should first try to do this, and only if that fails (#sort_obj not
defined, or two sort_obj not comparable), revert to the normal <=>
algorithm.

This would make a difference when some library does a #sort on some
collection, and you can't tell it to #sort_by instead (and you can't
redefine #sort in the collection, either).

In the cases where you have a collection of objects that don't respond
to #sort_obj, #sort will quickly hit a failure and use the fall-back.
(The worst case would be if, say, all but the last object in the
collection had #sort_obj.)

It should also fall back if <=> fails on the sort objs. I'm not sure how
to detect that reliably though.

Anyway, here's a half-baked attempt:

class NoSortObjMethod < StandardError; end

class Object
def sort_obj
raise NoSortObjMethod
end
end

class Array
alias old_sort sort
def sort
if block_given?
old_sort {|*args| yield(*args)}
else
begin
rslt = sort_by {|s| s.sort_obj}
puts "DEBUG: used sort_by with sort_obj"
rslt
rescue NoSortObjMethod
puts "DEBUG: NoSortObjMethod, using old_sort"
old_sort
rescue TypeError => ex # ?
puts "DEBUG: Comparison failed (#{ex}), using old_sort"
old_sort
end
end
end
end

a = [1, 3, 2]
p a
p a.sort

class C
def initialize n
@n = n
end

def sort_obj
@n
end
end

b = [C.new(1), C.new(3), C.new(2)]
p b
p b.sort

__END__

Output:

[1, 3, 2]
DEBUG: NoSortObjMethod, using old_sort
[1, 2, 3]
[#<C:0xb7c3d248 @n=1>, #<C:0xb7c3d234 @n=3>, #<C:0xb7c3d220 @n=2>]
DEBUG: used sort_by with sort_obj
[#<C:0xb7c3d248 @n=1>, #<C:0xb7c3d220 @n=2>, #<C:0xb7c3d234 @n=3>]

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Rick DeNatale

10/7/2007 9:21:00 PM

On 10/7/07, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:
> Eric Hodel wrote:
> > somethings.sort_by { |s| s.sort_obj }
>
> Since this is more efficient than #sort, maybe the implementation of
> #sort should first try to do this, and only if that fails (#sort_obj not
> defined, or two sort_obj not comparable), revert to the normal <=>
> algorithm.

Yes, but sort_by is not always more efficient than sort:

$ qri sort_by
----------------------------------------------------- Enumerable#sort_by
enum.sort_by {| obj | block } => array
------------------------------------------------------------------------
Sorts enum using a set of keys generated by mapping the values in
enum through the given block.

%w{ apple pear fig }.sort_by {|word| word.length}
#=> ["fig", "pear", "apple"]

The current implementation of sort_by generates an array of tuples
containing the original collection element and the mapped value.
This makes sort_by fairly expensive when the keysets are simple

require 'benchmark'
include Benchmark

a = (1..100000).map {rand(100000)}

bm(10) do |b|
b.report("Sort") { a.sort }
b.report("Sort by") { a.sort_by {|a| a} }
end

produces:

user system total real
Sort 0.180000 0.000000 0.180000 ( 0.175469)
Sort by 1.980000 0.040000 2.020000 ( 2.013586)

Also adding the startup overhead to check for an implementation of
sort_obj is an expense.

There's also the issue that you really need to check if ALL of the
elements being sorted respond to sort_obj.

It seems better to me to either reserve this technique as a pattern,
or at most add a new sort_by_sort_obj method or the like, and leave
sort as it is.

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denh...

Joel VanderWerf

10/7/2007 9:35:00 PM

Rick DeNatale wrote:
> On 10/7/07, Joel VanderWerf <vjoel@path.berkeley.edu> wrote:
>> Eric Hodel wrote:
>>> somethings.sort_by { |s| s.sort_obj }
>> Since this is more efficient than #sort, maybe the implementation of
>> #sort should first try to do this, and only if that fails (#sort_obj not
>> defined, or two sort_obj not comparable), revert to the normal <=>
>> algorithm.
>
> Yes, but sort_by is not always more efficient than sort:

Quite right, #sort should stay as it is, and #sort_by should only be
called explicitly.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

comp.lang.ruby

#sort_by and #sort_obj

Eric Hodel

Trans

David A. Black

David A. Black

Stefan Rusterholz

Trans

ara.t.howard

Rudi Cilibrasi

Joel VanderWerf

Rick DeNatale

Joel VanderWerf

x Login to ForumsZone