Phrogz
10/17/2007 3:50:00 PM
On Oct 17, 6:12 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:
> On 10/17/07, Paul Butcher <p...@texperts.com> wrote:
> > Just about every class in the standard library implements == and eql? as
> > I describe in the article, i.e. eql? tests for equal values and == tests
> > for "natural" equality (which normally means equal values).
>
> > Hash, however, is an exception. Hash#== tests for equal values.
> > Hash.eql?, however, tests for object identity:
>
> > Why is hash the odd one out? I'm sure that there must be a good reason
> > (Matz?) but I can't at the moment work out what it might be.
>
> I think the reason is twofold:
>
> 1) Using hashes as keys in another hash is not a common use case. I'm a
> little hard-pressed to think of why I'd want to, although I'm famous
> for lack of imagination.
I've wanted it on three occasions (that I can remember) now. Here's a
contrived example derived from a real-world use case whose details I can
no longer remember:
You have a file like this...
alpha,beta,15
gamma,delta,3
beta,alpha,4
alpha,alpha,3
delta,alpha,5
gamma,delta,7
...and you want to sum up the numbers for each unique pair of Greek
letters. Naively, I'd do (and initially tried) something like:
  sums = Hash.new{ 0 }
  DATA.each{ |line|
    _, g1, g2, num = /(\w+),(\w+),(\d+)/.match( line ).to_a
    sums[ { g1=>true, g2=>true } ] += num.to_i
  }
I believe I instead resorted to sorting the keys and using a nested
hash to drill down to the value. It was annoying.
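For comparison, here is a minimal sketch of one workaround. It is not the nested-hash approach described above; it swaps in a sorted two-element Array as the key, relying on the fact that Array#hash and Array#eql? compare contents. The method name sum_pairs and the inline sample lines are made up for illustration.

```ruby
# Sketch only: sum_pairs is a hypothetical helper, not the original code.
# A sorted Array is used as the key because Arrays with equal contents
# are eql? and share a hash value, so [a, b] and [b, a] land in one slot.
def sum_pairs(lines)
  sums = Hash.new(0)
  lines.each do |line|
    m = /(\w+),(\w+),(\d+)/.match(line)
    next unless m                 # skip lines that do not match
    key = [m[1], m[2]].sort       # order-independent pair
    sums[key] += m[3].to_i
  end
  sums
end

# Sample data copied from the file shown earlier in this post.
lines = %w[
  alpha,beta,15
  gamma,delta,3
  beta,alpha,4
  alpha,alpha,3
  delta,alpha,5
  gamma,delta,7
]

sums = sum_pairs(lines)
sums.each { |pair, total| puts "#{pair.join(' + ')}: #{total}" }
```

Unlike the { g1=>true, g2=>true } key, this keeps a repeated letter as a two-element pair (["alpha", "alpha"]), which may or may not be what you want.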
> 2) Because of the requirement that obj1.eql? obj2 => obj1.hash ==
> obj2.hash, implementing Hash#hash requires iterating over the keys and
> values and would be fairly expensive and make accessing a hash with
> hash keys by key impractical.
That logic seems slightly mothering, though. "Ruby prevents you from
doing A because if you did A it might be slow." Ruby doesn't prevent
me from writing:
  my_huge_array.delete_if{ |v1|
    my_huge_array.find{ |v2| (v1 - v2).abs < mu }
  }
I suppose the distinction is that the above is a foolish pairing of
individually reasonable parts, while Hash#hash is an atomic method
written to optimize speed for one (reasonably useless) use case at the
expense of allowing another use case.
As a related aside:
Having never written a hashing function, I'm uncertain how I'd write
Hash#hash in a way that reasonably prevented two hashes with different
keys and/or values from ending up with the same value. (Multiply
the .hash values of all keys and values in the Hash and then take the
product modulo a big prime number?) Has anyone taken a stab at implementing this?
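To make the question concrete, here is one rough sketch, offered as a starting point and not as a vetted hash function. It XORs the hash of each [key, value] pair instead of multiplying, for two reasons: XOR is commutative, so the result does not depend on iteration order, and a single .hash of zero would wipe out a product. Hashing the pair together also keeps { 1=>2 } distinct from { 2=>1 } in the common case. The name content_hash is hypothetical and is kept as a standalone method so that nothing in Hash itself is overridden.

```ruby
# Sketch only: content_hash is a made-up helper, not part of Ruby's API.
# Each [key, value] pair is hashed as a unit via Array#hash, and the
# per-pair values are combined with XOR so that order does not matter.
def content_hash(h)
  h.inject(0) { |acc, (k, v)| acc ^ [k, v].hash }
end

a = { "alpha" => true, "beta" => true }
b = { "beta" => true, "alpha" => true }

puts content_hash(a) == content_hash(b)
```

Collisions are still possible, as with any hash function; the only guarantee needed is that hashes which are equal by content produce the same number, with eql? settling the rest.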