Asp Forum - Performance of ruby hashes

unni.tallman

10/18/2006 12:47:00 PM

Anyway to enhance the performance of ruby hashes? My program uses
hashes heavily. So if i could enhance hash performance, or find a
fasster alternative to hashes, i could improve the overall performance
of my program. Gimme some suggestions.

12 Answers

Robert Klemme

10/18/2006 1:50:00 PM

On 18.10.2006 14:47, unni.tallman wrote:
> Anyway to enhance the performance of ruby hashes? My program uses
> hashes heavily. So if i could enhance hash performance, or find a
> fasster alternative to hashes, i could improve the overall performance
> of my program. Gimme some suggestions.

Hashes *are* pretty fast. Please state what actual performance problem
you have. Otherwise nobody will really be able to help you.

One reason for a slow hash can be a slow hash function or a hash
function that creates badly distributed values. That cannot be
attributed to the Hash implementation but to the implementation of your
hash keys.

robert

Tomasz Wegrzanowski

10/18/2006 2:46:00 PM

On 10/18/06, unni.tallman <unni.tallman@gmail.com> wrote:
> Anyway to enhance the performance of ruby hashes? My program uses
> hashes heavily. So if i could enhance hash performance, or find a
> fasster alternative to hashes, i could improve the overall performance
> of my program. Gimme some suggestions.

Indexing them by symbols instead of strings helps a bit.
But hashes are so fast that it's hard to even make a meaningful
benchmark. For example in the following meaningless benchmark
symbol-hashes are faster than string-hashes:

$ time ./meaningless_hash_benchmark.rb --symbols
real 0m16.117s
user 0m14.009s
sys 0m0.485s
$ time ./meaningless_hash_benchmark.rb
real 0m18.525s
user 0m16.306s
sys 0m0.475s
$ cat ./meaningless_hash_benchmark.rb
#!/usr/bin/ruby

N = 2000000

# Lame
if ARGV.empty?
keys = File.readlines("H").map{|x| x.chomp}
else
keys = File.readlines("H").map{|x| x.chomp.to_sym}
end

h = {}
keys.each{|k| h[k] = rand }

k0 = keys[0]
k1 = keys[1]
k2 = keys[2]

# A meaningless loop
N.times{|i|
h[k0] += h[k1]
h[k1] += h[k2]
h[k2] += h[k0]
}
$

15% is pretty impressive for a benchmark dominated by method calls.

That's related to how hashes work. To get a_hash[a_key]
* a_key is converted to a number (hashed)
* the number is looked up in a_hash's internal array
** we can get 0 answer - then a_key is definitely not there
** we can get 1 answer - then verify it by a_real_key == a_key
** we can get 2+ answers - then verify them all until match is found
(this case is rare, and high numbers here are even more rare)
* So if we found a_real_key, such that a_real_key == a_key, then
return a_value_at_a_real_key
* == can actually cost a lot. If you use Strings, their contents need
to be compared. If you use Symbols, you only check whether they're the
same object. Each Symbol object is different, so this is enough.

--
Tomasz Wegrzanowski [ http://t-a-w.blo... ]

Ara.T.Howard

10/18/2006 3:09:00 PM

J2M

10/18/2006 3:42:00 PM

> bad idea for big hashes - symbols are never gc'd and swapping is never fast!

gc'd ?

James

Tomasz Wegrzanowski

10/18/2006 8:44:00 PM

On 10/18/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
> On Wed, 18 Oct 2006, Tomasz Wegrzanowski wrote:
> > On 10/18/06, unni.tallman <unni.tallman@gmail.com> wrote:
> > Indexing them by symbols instead of strings helps a bit.
> > But hashes are so fast that it's hard to even make a meaningful
> > benchmark. For example in the following meaningless benchmark
> > symbol-hashes are faster than string-hashes:
>
> bad idea for big hashes - symbols are never gc'd and swapping is never fast!

It would have to be totally extreme for that to become a problem,
like with millions different keys in your program. Then you're
absolutely correct, it can seriously hurt.

But in most normal cases, using symbols is a win.

And GC'ing them would require major changes to internal representation,
as they are direct values now, and as such not tracked at all.
Maybe in Ruby 3. :-)

--
Tomasz Wegrzanowski [ http://t-a-w.blo... ]

James Gray

10/18/2006 8:50:00 PM

On Oct 18, 2006, at 3:44 PM, Tomasz Wegrzanowski wrote:

> Maybe in Ruby 3. :-)

You're behind on the news. ;)

Symbols just became a String subclass in Ruby 1.9.

James Edward Gray II

Ara.T.Howard

10/18/2006 10:02:00 PM

Tomasz Wegrzanowski

10/18/2006 10:47:00 PM

On 10/18/06, James Edward Gray II <james@grayproductions.net> wrote:
> On Oct 18, 2006, at 3:44 PM, Tomasz Wegrzanowski wrote:
>
> > Maybe in Ruby 3. :-)
>
> You're behind on the news. ;)
>
> Symbols just became a String subclass in Ruby 1.9.

Darn, and I've been using such an ancient snapshot of 1.9. :-)

Still, garbage collecting Symbols probably won't be doable before Ruby 3.
All over the source IDs (integers basically) are used as symbols, not
Symbol objects.

You can see it even within Ruby:

foo = :some_weird_symbol.to_i #=> 10897
# :some_weird_symbol can be garbage collected now, right ?
# Or not quite ...
foo.to_sym #=> :some_weird_symbol

These numbers are used as method names and for all other internal
purposes. Symbols only pretend to be nice instances of Symbol class,
when nobody's looking, they're just numbers :-)

--
Tomasz Wegrzanowski [ http://t-a-w.blo... ]

Jeremy Henty

10/18/2006 10:48:00 PM

On 2006-10-18, James Edward Gray II <james@grayproductions.net> wrote:

> You're behind on the news. ;)
>
> Symbols just became a String subclass in Ruby 1.9.

That doesn't preclude them being immediate values; an object's class
is independent of its internal TYPE. Check out "eigenclass - Changes
in Ruby 1.9" at
<URL:http://eigenclass.org/hiki.rb?Changes+in+Ru... and see the
discussion under the heading "Hash#_compare_by_identity and
Hash#compare_by_identity?". This states that :c and :c are still
always the same object, whereas "c" and "c" are still in general
different. I think this shows that Symbols remain immediate. Unless
I've misunderstood something, of course.

Regards,

Jeremy Henty

Paul Lutus

10/19/2006 7:19:00 AM

unni.tallman wrote:

> Anyway to enhance the performance of ruby hashes? My program uses
> hashes heavily.

Many comments have been offered, but since no one knows what your code looks
like, I want to offer the standard advice that assumes you are actually
having a hash speed problem. The advice is:

Make your hashes smaller by creating two or more tiers of hashes. One way to
do this with strings as keys is to create one hash using the first (one,
two, whatever) characters as the key for level one, and the remaining
characters as the key into the second level. You should be able to figure
out how to create a general method that implements this advice.

The basic idea is that hash speed degrades as the size of each single-level
hash increases, and after a certain size has been reached, breaking the
hash into groups (layers if you prefer) improves performance.

> So if i could enhance hash performance, or find a
> fasster alternative to hashes, i could improve the overall performance
> of my program. Gimme some suggestions.

Give us the details of the problem -- the nature of the keys, the size of
the data set, and so forth.

--
Paul Lutus
http://www.ara...

comp.lang.ruby

Performance of ruby hashes

unni.tallman

Robert Klemme

Tomasz Wegrzanowski

Ara.T.Howard

J2M

Tomasz Wegrzanowski

James Gray

Ara.T.Howard

Tomasz Wegrzanowski

Jeremy Henty

Paul Lutus

x Login to ForumsZone