Asp Forum - How to write a good hash method for my struct-like class?

Jeremy Henty

4/27/2006 2:04:00 PM

I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
I have defined #eql? to be true if two objects have the same r, g and
b values. This means I have to define #hash as well, right? So
what's a good algorithm? XOR the r, g and b values? Or concatenate
them into a 24-bit number? Or something else?

Thanks in advance,

Jeremy Henty

7 Answers

coachhilton

4/28/2006 5:57:00 PM

Jeremy - this may not be too much help to you but I recall reading a
paper in the Communications of the ACM about the creation of an optimal
hash relative to a given input space. I can't recall the issue but it
was in the last 5 or so years. The paper was very interesting and
accessible so if you can find it perhaps it can guide you (although you
may have to join the ACM to ge a copy ;) Otherwise, I'd google the
topic.

Ken

Dale Martenson

4/28/2006 9:33:00 PM

You don't have to define #hash. It depends on the behavior you wish to
have. I like to think of it as:

#eql? and #== are used to check equivalence.
#equal? is used to check sameness as it relates to object space.
#hash is used to check sameness as it relates to Hash only.

That means that if you wish to treat equivalent objects as the same key
values for a Hash you must define #hash. Otherwise, they will be seen
as unique key values for the Hash.

Please correct me if I am wrong, but this is my understanding.

I think generating a 24-bit number is a fine idea. I even use that
method in my example below.

Here is a long boring example that attempts to illustrate some of the
differences:

class RGB1
attr_accessor :r, :g, :b
def initialize(r,g,b)
@r = r
@g = g
@b = b
end
end

class RGB2 < RGB1
def eql?(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
def ==(x)
((@r==x.r)&&(@g==x.g)&&(@b==x.b))
end
end

class RGB3 < RGB2
def hash
@r<<16|@g<<8|@b
end
end

def test_equivalence( class1, class2 )
puts "#{class1}.new(255,0,0).eql?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).eql?(class2.new(255,0,0))}"
puts "#{class1}.new(255,0,0) == #{class2}.new(255,0,0):
#{class1.new(255,0,0) == class2.new(255,0,0)}"
puts "#{class1}.new(255,0,0).equal?(#{class2}.new(255,0,0)):
#{class1.new(255,0,0).equal?(class2.new(255,0,0))}"
puts "-----------------------------------------------------------"
end

def test_hash( class1 )
puts "Using #{class1}:"
h = {}
h[class1.new(255,0,0)] = 10
h[class1.new(255,0,0)] = 20
h.each_pair {|k,v| puts "h[#{k}]: #{v}"}
puts "-----------------------------------------------------------"
end

test_equivalence( RGB1, RGB2 )
test_equivalence( RGB1, RGB1 )
test_equivalence( RGB2, RGB2 )
test_equivalence( RGB3, RGB3 )

test_hash(RGB2)
test_hash(RGB3)

++++++++++++++++ END OF CODE ++++++++++++++++

The output from the above would be:

RGB1.new(255,0,0).eql?(RGB2.new(255,0,0)): false
RGB1.new(255,0,0) == RGB2.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB2.new(255,0,0)): false
-----------------------------------------------------------
RGB1.new(255,0,0).eql?(RGB1.new(255,0,0)): false
RGB1.new(255,0,0) == RGB1.new(255,0,0): false
RGB1.new(255,0,0).equal?(RGB1.new(255,0,0)): false
-----------------------------------------------------------
RGB2.new(255,0,0).eql?(RGB2.new(255,0,0)): true
RGB2.new(255,0,0) == RGB2.new(255,0,0): true
RGB2.new(255,0,0).equal?(RGB2.new(255,0,0)): false
-----------------------------------------------------------
RGB3.new(255,0,0).eql?(RGB3.new(255,0,0)): true
RGB3.new(255,0,0) == RGB3.new(255,0,0): true
RGB3.new(255,0,0).equal?(RGB3.new(255,0,0)): false
-----------------------------------------------------------
Using RGB2:
h[#<RGB2:0x27db1e8>]: 10
h[#<RGB2:0x27d8d58>]: 20
-----------------------------------------------------------
Using RGB3:
h[#<RGB3:0x27d5248>]: 20
-----------------------------------------------------------

Robert Klemme

4/29/2006 11:55:00 AM

Dale Martenson <dale.martenson@gmail.com> wrote:
> You don't have to define #hash. It depends on the behavior you wish to
> have. I like to think of it as:
>
> #eql? and #== are used to check equivalence.
> #equal? is used to check sameness as it relates to object space.
> #hash is used to check sameness as it relates to Hash only.
>
> That means that if you wish to treat equivalent objects as the same
> key values for a Hash you must define #hash. Otherwise, they will be
> seen as unique key values for the Hash.
>
> Please correct me if I am wrong, but this is my understanding.

That's formally correct but I recommend to implement #hash, #eql? and #== at
the same time to make it consistent - even if the hash implementation is not
optimal or remains unused. Later you'll stuff objects into a Hash and
wonder why it won't work out the way you expected.

> I think generating a 24-bit number is a fine idea. I even use that
> method in my example below.

Yep, that's what I'd do in this case as well.

A much simpler approach is to just do

Color = Struct.new :r, :g, :b

and get #hash, #== and #eql? for free:

>> Color = Struct.new :r, :g, :b
=> Color
>> c1 = Color.new 1,2,3
=> #<struct Color r=1, g=2, b=3>
>> c2 = Color.new 1,2,3
=> #<struct Color r=1, g=2, b=3>
>> c1 == c2
=> true
>> c1.eql? c2
=> true
>> c1.hash == c2.hash
=> true

And you get even more:

>> c1.to_a
=> [1, 2, 3]
>> c1.members
=> ["r", "g", "b"]

Kind regards

robert

Jeremy Henty

4/30/2006 11:48:00 AM

On 2006-04-28, coachhilton@gmail.com <coachhilton@gmail.com> wrote:

> ... I recall reading a paper in the Communications of the ACM about
> the creation of an optimal hash relative to a given input space. I
> can't recall the issue but it was in the last 5 or so years. The
> paper was very interesting and accessible so if you can find it
> perhaps it can guide you (although you may have to join the ACM to
> ge a copy ;) Otherwise, I'd google the topic.

Yes, Google turns up lots of references to this sort of thing, but so
far I've found nothing I can download without joining various
organisations. I was really hoping to find a simple tip for writing
good hash functions for the hash package that Ruby uses internally.

Regards,

Jeremy Henty

Jeremy Henty

4/30/2006 12:29:00 PM

On 2006-04-29, Robert Klemme <bob.news@gmx.net> wrote:

> ... I recommend to implement #hash, #eql? and #== at the same time
> to make it consistent - even if the hash implementation is not
> optimal or remains unused. Later you'll stuff objects into a Hash
> and wonder why it won't work out the way you expected.

I agree, particularly as this code is going into Ruby-FLTK . I don't
want to inflict unexpected behaviour on users.

>> I think generating a 24-bit number is a fine idea. I even use that
>> method in my example below.
>
> Yep, that's what I'd do in this case as well.

I went for XOR-ing them together, so that each field affects the hash
in the "same way". Of course that means you have only 256 possible
hash values. Maybe this would affect really large hashes badly? I'm
guessing here, I've never dealt with these issues before.

What would be *really* nice is a way to instrument Ruby's hashes, then
throw real-life data at them and just see how it distributes itself
between the hash buckets.

> A much simpler approach is to just do
>
> Color = Struct.new :r, :g, :b
>
> and get #hash, #== and #eql? for free:

That's what I'd do in pure Ruby but the FLTK C API uses uchars. The
code is cleaner if the Ruby/FLTK color objects wrap a C struct { uchar
r, g, b; } . I wouldn't be doing this in C if I didn't have to!

> And you get even more:
>
>>> c1.to_a
>=> [1, 2, 3]
>>> c1.members
>=> ["r", "g", "b"]

Good point! But it's easy enough to add these methods by loading a
bit of Ruby after dl-ing the extension. (/me adds that to the TO DO
list.)

Thanks for all the suggestions,

Jeremy Henty

Florian Gross

5/1/2006 6:47:00 PM

Jeremy Henty wrote:

> I have an RGB colour class which wraps a C struct { uchar r,g,b; } .
> I have defined #eql? to be true if two objects have the same r, g and
> b values. This means I have to define #hash as well, right? So
> what's a good algorithm? XOR the r, g and b values? Or concatenate
> them into a 24-bit number? Or something else?

I'd just do [r, g, b].hash. :)

Jeremy Henty

5/1/2006 8:26:00 PM

On 2006-05-01, Florian Groß <flgr@ccan.de> wrote:
> Jeremy Henty wrote:
>
>> This means I have to define #hash as well, right? So what's a good
>> algorithm? XOR the r, g and b values? Or concatenate them into a
>> 24-bit number? Or something else?
>
> I'd just do [r, g, b].hash. :)

D'oh! <slap> Luckily the Release Notes state "the implementation may
change". :-)

Regards,

Jeremy Henty

comp.lang.ruby

How to write a good hash method for my struct-like class?

Jeremy Henty

coachhilton

Dale Martenson

Robert Klemme

Jeremy Henty

Jeremy Henty

Florian Gross

Jeremy Henty

x Login to ForumsZone