Asp Forum - OpenStruct problem

minkoo.seo@gmail.com

2/11/2006 9:37:00 AM

OpenStruct class seems to be misbehave when it comes to hashing.

For example,
irb(main):008:0> o1 = OpenStruct.new
=> <OpenStruct>
irb(main):009:0> o1.val = "a"
=> "a"
irb(main):010:0> o1 = OpenStruct.new
=> <OpenStruct>
irb(main):011:0> o1.foo = "a"
=> "a"
irb(main):012:0> o1.bar = "b"
=> "b"
irb(main):013:0> o2 = OpenStruct.new
=> <OpenStruct>
irb(main):014:0> o2.foo = "a"
=> "a"
irb(main):015:0> o2.bar = "b"
=> "b"
irb(main):016:0> h = { o1 => 1 }
=> {<OpenStruct bar="b" foo="a">=>1}
irb(main):017:0> h[o2]
=> nil
irb(main):018:0>

If it's not possible to retrieve the same hash entry using
OpenStruct, what should I have to use instead?

Don't you think OpenStruct have to support eql? method
automatically?

Best,
Minkoo Seo

10 Answers

Timothy Goddard

2/11/2006 10:07:00 AM

Hashes don't behave well as keys either so it's no surprise openstructs
(which are based on them) don't. Nested hashes are usually sufficient,
but I agree the hashing of hashes and openstructs is pretty strange
strange. If you really need to use openstructs as keys you could use:

class OpenStruct
def hash
@table.to_a.hash
end
end

This is, however, very inefficient and depends on all variables stored
in the openstruct also being good keys. As a general rule, a key should
be a simple value. Try to avoid using complex structures as keys.

Gavin Kistner

2/11/2006 10:18:00 AM

> As a general rule, a key should be a simple value.
> Try to avoid using complex structures as keys.

Or rather, if you are going to use objects as keys, think of them as
each *instance* being a unique key, not "this object happens to have
the same contents as that other one, so I think that they should act
like the same key".

As Timothy notes: you *can* do what you're asking for, but it's
reasonably inefficient (from a CPU standpoint, if not a
programmer-enabling one).

minkoo.seo@gmail.com

2/11/2006 11:14:00 AM

What I want to do with OpenStruct is to store bigram in hash table.
(bigram can be thought as two consecutive strings)

So, I have two choices. First one is to define new class. The other
option is to use OpenStruct. However, OpenStruct can't be used
as key of hash, so I made my own class.

At first, I thought that OpenStruct might be implemented by
adding methods and instance variables on the fly. But, it's
hashtable. Also, it doesn't have automatic hash, eql? method
generation.

I understand that computing good hash value is beyond
the scope of langauge design, but I'm talking about library.
OpenStruct might use one of the field for computing hash
key, stating in the docuement that "default implementation
is not efficient enough. If you want, override hash."

If such a implementaion is available, people like me, who is
developing a prototype sysmte, will gladly use the default.

That being said, what's the benefit of OpenStruct? If OpenStruct
is nothing but a hashtable with syntactic sugar, I see reason
of using it. Am I missing something?

Best,
Minkoo Seo

Erik Veenstra

2/11/2006 11:22:00 AM

That's because the hash (I mean the result of #hash) of both
OpenStruct objects differ.

$ ri Object#hash

.... gives you:

"Generates a +Fixnum+ hash value for this object. This function
must have the property that +a.eql?(b)+ implies +a.hash ==
b.hash+. The hash value is used by class +Hash+. Any hash value
that exceeds the capacity of a +Fixnum+ will be truncated
before being used."

gegroet,
Erik V. - http://www.erikve...

----------------------------------------------------------------

require "ostruct"

o1 = OpenStruct.new
o1.foo = "a"
o1.bar = "b"

o2 = OpenStruct.new
o2.foo = "a"
o2.bar = "b"

p o1.hash
p o2.hash

----------------------------------------------------------------

George Ogata

2/11/2006 12:06:00 PM

"Timothy Goddard" <interfecus@gmail.com> writes:

> Hashes don't behave well as keys either so it's no surprise openstructs
> (which are based on them) don't. Nested hashes are usually sufficient,
> but I agree the hashing of hashes and openstructs is pretty strange
> strange. If you really need to use openstructs as keys you could use:
>
> class OpenStruct
> def hash
> @table.to_a.hash
> end
> end

Careful! @table.to_a is ordered arbitrarily, so that hash function
won't quite work. Better sort that array, or use another hashing
strategy.

@table.to_a.sort_by{|k,v| k.to_i}.hash

Robert Klemme

2/11/2006 8:06:00 PM

2006/2/11, Minkoo Seo <minkoo.seo@gmail.com>:
> What I want to do with OpenStruct is to store bigram in hash table.
> (bigram can be thought as two consecutive strings)
>
> So, I have two choices. First one is to define new class. The other
> option is to use OpenStruct. However, OpenStruct can't be used
> as key of hash, so I made my own class.
>
> At first, I thought that OpenStruct might be implemented by
> adding methods and instance variables on the fly. But, it's
> hashtable. Also, it doesn't have automatic hash, eql? method
> generation.
>
> I understand that computing good hash value is beyond
> the scope of langauge design, but I'm talking about library.
> OpenStruct might use one of the field for computing hash
> key, stating in the docuement that "default implementation
> is not efficient enough. If you want, override hash."
>
> If such a implementaion is available, people like me, who is
> developing a prototype sysmte, will gladly use the default.
>
> That being said, what's the benefit of OpenStruct? If OpenStruct
> is nothing but a hashtable with syntactic sugar, I see reason
> of using it. Am I missing something?

Probably not. That being said, an OpenStruct does not fit well as a
hash key because a hash key should be immutable (containers like Array
and Hash typically calculate their hash based on elements and so the
hash code will change once the container changes).

In Your case the best solution is to use a Struct.

Bigram = Struct.new :s1, :s2

You get #hash and #eql? for free and it's a one liner. This seems to
be much more appropriate than using an OpenStruct as the number of
fields is fixed, isn't it? An OpenStruct really only makes sense when
the number of fields changes frequently or if you do not not all the
fields beforehand and find Hash a bit too clumnsy. It's often used for
configuration data (like in the sample code of OptionParser).

Kind regards

robert

--
Have a look: http://www.flickr.com/photos/fu...

Timothy Goddard

2/11/2006 9:11:00 PM

You're right. The order depends on the order keys were added (although
not directly). This makes it even more inefficient and completely
impractical for normal use. It would be really useful to have a more
meaningful hash function for hash objects, but this is probably
difficult in practice as it uses a 3rd party hash table library.

Hal E. Fulton

2/11/2006 11:05:00 PM

Timothy Goddard wrote:
> You're right. The order depends on the order keys were added (although
> not directly). This makes it even more inefficient and completely
> impractical for normal use. It would be really useful to have a more
> meaningful hash function for hash objects, but this is probably
> difficult in practice as it uses a 3rd party hash table library.
>

Your comments about order lost me, can you clarify?

Thanks,
Hal

Timothy Goddard

2/12/2006 10:12:00 AM

I meant to say "The .hash (of the hash) depends on the order keys were
added...", but on second thoughts I think I was wrong. Two identical
hashes seem to have different .hash values even when the keys are added
in the same order. This obviously makes them completely unsuitable as
hash keys.

George Ogata

2/12/2006 12:03:00 PM

"Timothy Goddard" <interfecus@gmail.com> writes:

> I meant to say "The .hash (of the hash) depends on the order keys were
> added...", but on second thoughts I think I was wrong. Two identical
> hashes seem to have different .hash values even when the keys are added
> in the same order. This obviously makes them completely unsuitable as
> hash keys.

Hash does not provide a #hash method at all; it falls back to
Object#hash, which is essentially #object_id.

But you'd be correct to infer that inserting keys in a hash, `h', in a
different order could yield a different value for `h.to_a.hash'.
You'll probably see the difference with:

h1 = {}
h2 = {}
(1..1000).each{|i| h1[i] = true}
(1..1000).sort_by{rand}.each{|i| h2[i] = true}

p h1.to_a.hash == h2.to_a.hash
p h1.to_a.sort.hash == h2.to_a.sort.hash

The reason is that if two keys hit the same bucket, then which ever
one was inserted last will appear at the head of the chain (ruby
hashes use chaining). #each (and #to_a) simply scan each chain in
turn and yields each element as it finds it. Thus insertion order can
easily have an effect on the order of the array, and hence the array's
#hash.

As for usefulness, that will always depend on the application.
Claiming it as "completely impractical for normal use" sounds akin to
premature optimization to me, unless you're referring to a specific
use case which you know well. You could also use a faster hash
function which doesn't depend on the order, such as XORing values
computed from each pair.

I agree though, that a standard Hash#hash could be useful. It's come
up before; I think the issue was how to handle the #default and
#default_proc, especially the latter. I would probably favor using
default_proc.object_id in the computed hash, even if it means two
default_procs that always yield the same value end up giving different
hashes.

comp.lang.ruby

OpenStruct problem

minkoo.seo@gmail.com

Timothy Goddard

Gavin Kistner

minkoo.seo@gmail.com

Erik Veenstra

George Ogata

Robert Klemme

Timothy Goddard

Hal E. Fulton

Timothy Goddard

George Ogata

x Login to ForumsZone