Robert Klemme
1/2/2006 1:14:00 PM
kryglik <kryglik@iol.cz> wrote:
> Robert Klemme wrote:
>> kryglik <kryglik@iol.cz> wrote:
>>> Hello ruby people,
>>> I've tried to make some speed test in ruby to find out how fast is
>>> searching in hash (using has_key?)
>>>
>>> The results were OK but I discovered curious thing -- when i
>>> generate big hash the speed is different when i unserialize
>>> (marshal.load) the same hash.
>>>
>>> Unserialized Hash is 1.5x faster in has_key? method. Tried it with
>>> GC.disable as well, results were same.
>>>
>>> What am I missing?
>>>
>>> Thanx for any ideas
>>
>> The first reason that comes to mind is different insertion order.
>> But I hardly believe it could make such a big difference. Care to
>> post your testing code?
>>
>> Kind regards
>>
>> robert
>>
>
> Well, insertion order will be same, won't be?
Not necessarily. If Marshal.dump uses the current order in the hash for
writing then the insertion order during loading is likely to be different
from the original insertion order.
> h = {}
> h_slovo = []
>
> if not true #HERE YOU SWITCH IF GENERATE/LOAD
> for x in 1..40_000
> if x % 2000 == 0
> puts x
> end
>
> slovo = ""
>
> for z in 1..32
> slovo += (rand(25)+65).chr
> end
>
> if rand(10) == 3
> h_slovo << slovo
> end
>
> h.update slovo => rand(1_000_000)
> slovo = nil
> end
>
> f = File.new("hash.marshal","w+")
> f.puts(Marshal.dump(h))
> f.close
>
> f = File.new("hash1.marshal","w+")
> f.puts(Marshal.dump(h_slovo))
> f.close
> else
>
> puts "loading hash"
> h = Marshal.load(File.open("hash.marshal","r"))
> puts "loading hash1"
> h_slovo = Marshal.load(File.open("hash1.marshal","r"))
> puts "done"
> end
>
> t1 = Time.now
>
> for slovo1 in h_slovo
> h.has_key? h_slovo
> end
>
> t2 = Time.now
>
> puts "hash: " + h.size.to_s
> puts "hled: " + h_slovo.size.to_s
> puts t2-t1
Your time measurement is far too unprecise on a modern system. You'll have
to repeat lookups to get more accurate results. I get a difference between 5
and 10 percent with the attached code - regardless of GC enabled or
disabled. Strange though that the copy is always slower. This is an
interesting issue.
Note, if you freeze keys lookups in the original hash are much faster
because then hash keys are not copied on insertion (an optimisation of
unfrozen Strings as hash keys), i.e., during lookup if the key is the same
it's also the same object which will be the first test - and of course this
test is much faster than sequential comparison of characters.
Another note, a more realistic scenario would be if not 10% of all keys were
used for lookup but if there also were keys that are not present in the
hash.
Kind regards
robert