[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Sane #hash implementation?

Shot (Piotr Szotkowski)

1/29/2007 10:38:00 PM

I have a Set subclass, Block. I need two blocks to be considered the
same by Set iff they have the same elements. From what I gathered, Set
treats two objects as equal when they are eql? and have the same hash.

Implementing Block#eql? seems to be trivial, as Set#== is there to help:

class Block < Set
def eql? other
self == other
end
end

I’m stuck when it comes to Block#hash, though; I need these to be true:
Block.new.hash == Block.new.hash
Block.new([1,2]).hash == Block.new([1,2]).hash

What’s the proper way to go at it? Adding the elements’ hashes together
is obviously wrong, as four different elements might produce the same
sum when paired; also, the docs say, ‘any hash value that exceeds the
capacity of a Fixnum will be truncated before being used,’ so hash can’t
exceed the (word - 1 bit) size.

-- Shot
--
Hell hath no fury like a bureaucrat scorned. -- Milton Friedman
17 Answers

Ken Bloom

1/30/2007 1:33:00 AM

0

On Tue, 30 Jan 2007 07:38:20 +0900, Shot (Piotr Szotkowski) wrote:

> I have a Set subclass, Block. I need two blocks to be considered the
> same by Set iff they have the same elements. From what I gathered, Set
> treats two objects as equal when they are eql? and have the same hash.
>
> Implementing Block#eql? seems to be trivial, as Set#== is there to help:
>
> class Block < Set
> def eql? other
> self == other
> end
> end
>
> Iâ??m stuck when it comes to Block#hash, though; I need these to be true:
> Block.new.hash == Block.new.hash
> Block.new([1,2]).hash == Block.new([1,2]).hash
>
> Whatâ??s the proper way to go at it? Adding the elementsâ?? hashes together
> is obviously wrong, as four different elements might produce the same
> sum when paired; also, the docs say, â??any hash value that exceeds the
> capacity of a Fixnum will be truncated before being used,â?? so hash canâ??t
> exceed the (word - 1 bit) size.
>
> -- Shot

There's nothing about hash codes that requires that
a.hash != b.hash if a != b
You want to avoid collisions as much as possible, but
that's an issue of performance, not correctness, so you'd be perfectly
correct in adding up the hash codes of the members to create the has code.

Nor is there anything that requires you to keep your hash code within the
capacity of a Fixnum when you generate it. That's why it's automatically
truncated for you. The only reason you need to know is so that your hash
function doesn't hash things in such a way that the truncated number has
lots of collisions. (Once again a performance issue).

You may want to look at http://www.partow.net/programming/hash...
which discusses some relatively important hash functions. You may have to
experiment to see what works best. Wikipedia may also have good
information.

--Ken

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...

Eric Hodel

1/30/2007 4:49:00 AM

0

On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:
> I have a Set subclass, Block. I need two blocks to be considered the
> same by Set iff they have the same elements. From what I gathered, Set
> treats two objects as equal when they are eql? and have the same hash.
>
> I’m stuck when it comes to Block#hash, though; I need these to be
> true:
> Block.new.hash == Block.new.hash
> Block.new([1,2]).hash == Block.new([1,2]).hash

Try:

class Block
def hash
to_a.hash
end
end

Shot (Piotr Szotkowski)

1/30/2007 7:23:00 AM

0

Ken Bloom:

> There's nothing about hash codes that requires that
> a.hash != b.hash if a != b
> You want to avoid collisions as much as possible, but that's an issue
> of performance, not correctness, so you'd be perfectly correct in
> adding up the hash codes of the members to create the has code.

True, but I do want to be able to key Hashes with my Blocks, so I do
want to avoid collisions as much as possible. I should’ve tested the
add-up-the-members’-hashes approach before/instead of crossing it out,
though.

> You may want to look at
> http://www.partow.net/programming/hash...
> which discusses some relatively important hash functions.
> You may have to experiment to see what works best. Wikipedia
> may also have good information.

Thanks for the pointers! I knew about hash functions before, I just
wasn’t sure whether there’s an idiomatic/popular/proper Ruby approach
to writing #hash.

-- Shot
--
I am a computer. I am dumber than any human and smarter than any administrator.

Shot (Piotr Szotkowski)

1/30/2007 7:26:00 AM

0

Eric Hodel:

> On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:

>> I’m stuck when it comes to Block#hash, though; I need these to be true:
>> Block.new.hash == Block.new.hash
>> Block.new([1,2]).hash == Block.new([1,2]).hash

> Try:

> class Block
> def hash
> to_a.hash
> end
> end

Thanks a lot, Eric! This is the ‘d’oh!’ solution I was looking for. :)

-- Shot
--
Of course I can see iso-8859-1 characters, I'm French. -- Christian Marillat

Eric Hodel

1/30/2007 8:11:00 AM

0

On Jan 29, 2007, at 23:26, Shot (Piotr Szotkowski) wrote:
> Eric Hodel:
>> On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:
>>> I’m stuck when it comes to Block#hash, though; I need these to be
>>> true:
>>> Block.new.hash == Block.new.hash
>>> Block.new([1,2]).hash == Block.new([1,2]).hash
>
>> Try:
>
>> class Block
>> def hash
>> to_a.hash
>> end
>> end
>
> Thanks a lot, Eric! This is the ‘d’oh!’ solution I was looking for. :)

You may want to define #eql? in terms of #to_a as well...

Mauricio Fernández

1/30/2007 8:20:00 AM

0

On Tue, Jan 30, 2007 at 05:10:51PM +0900, Eric Hodel wrote:
> On Jan 29, 2007, at 23:26, Shot (Piotr Szotkowski) wrote:
> >Eric Hodel:
> >>On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:
> >>>I’m stuck when it comes to Block#hash, though; I need these to be
> >>>true:
> >>>Block.new.hash == Block.new.hash
> >>>Block.new([1,2]).hash == Block.new([1,2]).hash
> >
> >>Try:
> >
> >>class Block
> >> def hash
> >> to_a.hash
> >> end
> >>end
> >
> >Thanks a lot, Eric! This is the ‘d’oh!’ solution I was looking for.
> >:)
>
> You may want to define #eql? in terms of #to_a as well...

That would fail for the same reason as your Block#hash.

--
Mauricio Fernandez - http://eige... - singular Ruby

Pit Capitain

1/30/2007 8:21:00 AM

0

Shot (Piotr Szotkowski) schrieb:
> Eric Hodel:
>
>> On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:
>
>>> I?m stuck when it comes to Block#hash, though; I need these to be true:
>>> Block.new.hash == Block.new.hash
>>> Block.new([1,2]).hash == Block.new([1,2]).hash
>
>> Try:
>
>> class Block
>> def hash
>> to_a.hash
>> end
>> end
>
> Thanks a lot, Eric! This is the ?d?oh!? solution I was looking for. :)

Shot, you should be aware that this works for your example given above,
but not for the following:

Block.new([1,12]).hash # => 57
Block.new([12,1]).hash # => 23

If this is a problem, you have to change the implementation to

class Block
def hash
sort.hash
end
end

Regards,
Pit

Mauricio Fernández

1/30/2007 11:49:00 AM

0

[I'm resending this since the ML server is dropping some of my messages to
this thread for some reason.]

On Tue, Jan 30, 2007 at 04:26:18PM +0900, Shot (Piotr Szotkowski) wrote:
> Eric Hodel:
>
> > On Jan 29, 2007, at 14:38, Shot (Piotr Szotkowski) wrote:
>
> >> I'm stuck when it comes to Block#hash, though; I need these to be true:
> >> Block.new.hash == Block.new.hash
> >> Block.new([1,2]).hash == Block.new([1,2]).hash
>
> > Try:
>
> > class Block
> > def hash
> > to_a.hash
> > end
> > end
>
> Thanks a lot, Eric! This is the d'oh! solution I was looking for. :)

It's not a correct one though.

require 'set'
a = Set.new(0..1000)
b = Set.new((0..1000).sort_by{rand})
a == b # => true
a.to_a.hash # => 31141761
b.to_a.hash # => 374826672

(The order of the elements in Hash#to_a can change.)


Try this, it's O(n ln n) but actually works

require 'set'
class Block < Set
def hash
map{|x| x.hash}.sort.hash
end
end
a = Block.new(0..1000)
b = Block.new((0..1000).sort_by{rand})
a == b # => true
a.hash # => 944434529
b.hash # => 944434529


--
Mauricio Fernandez - http://eige... - singular Ruby

Shot (Piotr Szotkowski)

1/31/2007 9:37:00 AM

0

Eric Hodel:

> You may want to define #eql? in terms of #to_a as well...

Thanks, but (considering Mauricio Fernandez’s fix) what would be the
benefit of defining #eql? in terms of Set#sort instead of Set#==?

Te docs say on Set#==, ‘Returns true if two sets are equal. The equality
of each couple of elements is defined according to Object#eql?.’

-- Shot
--
One of the advantages of being disorderly is that one is
constantly making exciting discoveries. -- A. A. Milne

Shot (Piotr Szotkowski)

1/31/2007 10:17:00 AM

0

Mauricio Fernandez:

> (The order of the elements in Hash#to_a can change.)

Thanks for pointing this out, I fixed my specs and my code.

> Try this, it's O(n ln n) but actually works

Ok, I guess I have a question on what does exactly Benchmark results
mean. Doesn’t the below mean hash_map_sort is actually a bit slower than
hash_sort?

Also, to ask two basic, follow-up questions: which Benchmark time is
the most significant one and is it more-or-less independent from other
system load? (I guess the two runs wouldn’t differ as much any of the
times was other-load-independent…)



$ cat block_hash.rb
require 'benchmark'
require 'set'

class Block < Set
def hash_sort
sort.hash
end
def hash_map_sort
map {|a| a.hash}.sort.hash
end
end

srand 1979
array = (0..1_000_000).sort_by {rand}

Benchmark.bmbm do |b|
b.report 'hash_sort' do
Block.new(array).hash_sort
end
b.report 'hash_map_sort' do
Block.new(array).hash_map_sort
end
end



$ ruby block_hash.rb
Rehearsal -------------------------------------------------
hash_sort 8.000000 1.366667 9.366667 ( 6.208513)
hash_map_sort 9.750000 1.500000 11.250000 ( 7.179749)
--------------------------------------- total: 20.616667sec

user system total real
hash_sort 9.666667 1.100000 10.766667 ( 6.677047)
hash_map_sort 9.566667 1.550000 11.116667 ( 6.809388)



$ ruby block_hash.rb
Rehearsal -------------------------------------------------
hash_sort 8.133333 1.316667 9.450000 ( 5.746571)
hash_map_sort 9.650000 1.766667 11.416667 ( 7.065570)
--------------------------------------- total: 20.866667sec

user system total real
hash_sort 9.316667 1.100000 10.416667 ( 6.684088)
hash_map_sort 9.450000 1.566667 11.016667 ( 6.796299)



-- Shot
--
It is a good morning exercise for a research scientist
to discard a pet hypothesis every day before breakfast.
It keeps him young. -- Konrad Lorenz