[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Newbie question > Hash workings

AD60

2/28/2009 9:16:00 PM

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

h = Hash.new([])
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ????????


Grtz. André

10 Answers

Stefan Lang

2/28/2009 10:29:00 PM

0

2009/2/28 AD60 <andredeen@gmail.com>:
> Hi,
>
> I am puzzled by the workings a the following program. I placed the
> output after the corresponding lines. To me it lookes like a sort of
> bug in the workings of de p method. Can anybody set me straight?

There's no bug involved.

> h = Hash.new([])

Here you create Hash with an empty array as
default value. For every lookup of a key that is
not in the hash, this single array instance is
returned, but not automatically stored in the hash.
Again: It's a single object, not a new one on every
lookup.

> h[0] = [1,2,3]

Here you associate 0 with an array. The hash default
value is not involved.

> p h #> {0=>[1, 2, 3]}
> h[0] += [4]
> p h #> {0=>[1, 2, 3, 4]}

This is a syntactic shortcut for "h[0] = h[0] + [4]".
It looks up the value of 0 in the hash, which is
an array. array + array creates a new array,
which overwrites the previous value of h[0].

> h[1] += [10,11]
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}

Again, this is a shortcut for "h[1] = h[1] + [10, 11]".
The hash doesn't have a value for 1 yet, so
the default value ([]) is returned. [] + [10, 11] is
the new array [10, 11] which is then assigned
to h[1].

> h[1] << 12
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}

Look up the existing array stored in h[1] and append
12. No new array created.

> h[2] << 20
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12] ?

The hash doesn't have a value for 2, so the default
value, [], is returned. The default value is destructively
modified by appending 20. _No new hash entry is
created (there is no assignment!)._

Future lookups of non-existent keys will return
the modified default value!

> p h[2] #> [20] ???

The hash still doesn't have an entry for 2, so
the default value is returned, which by now is
our modified array.

> h[2] << 21
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?

Still no entry for 2, default value is returned and modified
by appending 21.

> h[2] += [22]
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]} ????????

Again, this is equivalent to "h[2] = h[2] + [22]".
Still no entry for h[2], so the default value is returned,
which by now is [20, 21]. Then the two arrays
[20, 21] and [22] are concatenated, returning the
new array [20, 21, 22] which is then finally associated
with 2 in the hash (see: here is an assignment!).

What you probably want to achieve can be
done using Hash's block constructor:

h = Hash.new { |hash, key| hash[key] = [] }

This automatically stores a new empty array
on every lookup of a non-existent key.

HTH,
Stefan

Robert Klemme

2/28/2009 10:33:00 PM

0

On 28.02.2009 22:15, AD60 wrote:
> I am puzzled by the workings a the following program. I placed the
> output after the corresponding lines. To me it lookes like a sort of
> bug in the workings of de p method. Can anybody set me straight?

p works correctly but...

> h = Hash.new([])
> h[0] = [1,2,3]
> p h #> {0=>[1, 2, 3]}
> h[0] += [4]
> p h #> {0=>[1, 2, 3, 4]}
> h[1] += [10,11]
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
> h[1] << 12
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
> h[2] << 20
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
> 12] ?
> p h[2] #>
> [20] ???
> h[2] << 21
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
> 12]} ?
> h[2] += [22]
> p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
> 21, 22]} ????????

.... you are modifying the default value of the Hash in those cases where
you place the question marks. The default value is the one returned in
case a key does not exist. But there is no automatic insertion, that's
why you do not see key 2 after doing h[2] << 20.

You probably rather want this famous idiom:

h = Hash.new {|ha,ke| ha[ke] = []}

Now run your tests again. :-)

Kind regards

robert

AD60

3/1/2009 12:38:00 AM

0

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
[]}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "[]" so no problems there, but still... I am now thinking
that stating "h = Hash.new([])" is problematic idiom under all
circumstances. Am I correct?

Kind regards,
André

Robert Klemme

3/1/2009 10:46:00 AM

0

2009/3/1 AD60 <andredeen@gmail.com>:
> Thank you both for seeing the light at the end of the tunnel. I fully
> understand what I have to do to get it working but just to clarify;

Good!

> By doing "h[2] << 20" I modified the original empty array I set as a
> default value and inserted no hash key because it is not an assignment
> to h?

Yes. Although "assignment to h" is a bit ambiguous. I guess you
meant the correct thing but keep in mind that normally "assignment to
h" means "h = something" and not "h[key] = something". The latter is
actually a method defined in class Hash.

> Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
> []}" and just doing "p h[2]" before any assigment to h[2] has the
> surprising/logical side effect of creating a new key "2", thank god
> with value "[]" so no problems there, but still... I am now thinking
> that stating "h = Hash.new([])" is problematic idiom under all
> circumstances. Am I correct?

No, because it works well in other cases - the most popular probably being

counters = Hash.new 0
...
counters[foo] += 1

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end

AD60

3/1/2009 11:35:00 AM

0


I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Just to detail the difference:


hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Kind regards,
André

Robert Klemme

3/1/2009 3:54:00 PM

0

On 01.03.2009 12:35, AD60 wrote:
> I see the use with an integer default, I specifically meant the
> Hash.new([]) syntax as problematic, but it probably all depends on
> knowing what your doing.

Ah. Since you talked about "idiom" I thought you were referring to the
general pattern.

> Just to detail the difference:
>
>
> hav = Hash.new { |h,k| h[k] = []}
> hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.

> p hav # {0=>[1]}
> p hav[1] # []
> p hav # {0=>[1], 1=>[]}
>
> hiv = Hash.new 0
> hiv[0] += 1
> p hiv # {0=>1}
> p hiv[1] # 0
> p hiv # {0=>1}
>
> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
> had me wondering. Anyway thanks again, I understand how to use it and
> that's more than enough for now.

Good.

Kind regards

robert

Adam Gardner

3/2/2009 6:18:00 PM

0

Robert Klemme wrote:
> On 01.03.2009 12:35, AD60 wrote:
>> I see the use with an integer default, I specifically meant the
>> Hash.new([]) syntax as problematic, but it probably all depends on
>> knowing what your doing.
>
> Ah. Since you talked about "idiom" I thought you were referring to the
> general pattern.
>
>> Just to detail the difference:
>>
>>
>> hav = Hash.new { |h,k| h[k] = []}
>> hav[0] += [1]
>
> Actually this is very inefficient. With this idiom you should rather do
>
> hav[0] << [1]
>
> because otherwise you'll end up creating and deleting Array instances
> all the time without any benefit.
>
>> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
>> had me wondering. Anyway thanks again, I understand how to use it and
>> that's more than enough for now.
>
> Good.
>
> Kind regards
>
> robert

The method I would use is this:

hav = Hash.new {[]}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don't get this

p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

Of course, it means you can't use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.

--
Posted via http://www.ruby-....

Robert Klemme

3/2/2009 8:50:00 PM

0

On 02.03.2009 19:18, Adam Gardner wrote:
> Robert Klemme wrote:
>> On 01.03.2009 12:35, AD60 wrote:
>>> I see the use with an integer default, I specifically meant the
>>> Hash.new([]) syntax as problematic, but it probably all depends on
>>> knowing what your doing.
>> Ah. Since you talked about "idiom" I thought you were referring to the
>> general pattern.
>>
>>> Just to detail the difference:
>>>
>>> hav = Hash.new { |h,k| h[k] = []}
>>> hav[0] += [1]
>> Actually this is very inefficient. With this idiom you should rather do
>>
>> hav[0] << [1]
>>
>> because otherwise you'll end up creating and deleting Array instances
>> all the time without any benefit.
>>
>>> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
>>> had me wondering. Anyway thanks again, I understand how to use it and
>>> that's more than enough for now.
>> Good.

> The method I would use is this:
>
> hav = Hash.new {[]}
>
> This gives you the benefit of creating a new array when necessary
> (rather than a single default array which can be destructively modified
> in troubling ways), but means that you only get permanent effects if you
> explicitly use assignment, so you don't get this
>
> p hav # {0=>[1]}
> p hav[1] # []
> p hav # {0=>[1], 1=>[]}
>
> Of course, it means you can't use the << operator to stick values into
> non-present keys, but I rarely use the << operator anyway.

You probably should get rid of that habit. And here's why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage - or you have to explicitly test for
presence of the key, which is less efficient than letting Hash's C
implementation do it. It is much more elegant to just always do
hash[key] << value.

If you just want to make sure that you always get an Array back, you
could as well use Hash.new([].freeze) to avoid accidentally appending to
the single default all the time.

But it's a free country so if you like to stick with your habits...

Cheers

robert

Simon Krahnke

3/5/2009 1:47:00 PM

0

* Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:

> You probably should get rid of that habit. And here's why: your
> approach has no means to efficiently work with Arrays as Hash values.
> Either, you have to use += all the time which is inefficient because it
> will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.

mfg, simon .... l

Robert Klemme

3/5/2009 4:58:00 PM

0

On 05.03.2009 14:47, Simon Krahnke wrote:
> * Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:
>
>> You probably should get rid of that habit. And here's why: your
>> approach has no means to efficiently work with Arrays as Hash values.
>> Either, you have to use += all the time which is inefficient because it
>> will create a lot of garbage
>
> The problem is creating a lot of useless objects and copying over data,
> not leaving garbage behind.

Well, both (creation of objects as well as garbage collection left
overs) contribute to wasted CPU and thus slower execution.

robert