Jesús Gabriel y Galán
2/2/2009 11:22:00 AM
On Mon, Feb 2, 2009 at 12:00 PM, Patrick Put <patrick.put@gmail.com> wrote:
> I've been searching for this information already, but cannot really find
> the answer I'm looking for.
>
> I have to find values in a list of some 40,000 strings. Putting all
> these in an array is not likely to be the quickest way.... and it isn't.
> A set seems to be faster already. Maybe using a map where I only use the
> keys is also faster. But I'm wondering: is there no such thing as a tree
> to store and retrieve items in a fast way?
>
> If I remember well, for n elements in a list it should take log(n) steps
> then, where it takes n/2 just going through an array if I only want to
> pick up the element. In my situation it is much closer to n because most
> of the times the value is NOT in the list.
>
> Any ideas?
If you are only concerned with lookup time, a hash has amortized O(1)
lookup time:
require 'benchmark'
words = ("aaaa".."zzzz").to_a # => 456,976 strings
hash = {}
words.each {|w| hash[w] = true}
TIMES = 10
Benchmark.bmbm do |x|
x.report("Array#find existing") do
TIMES.times do
words.find {|w| w == "mmmm"}
end
end
x.report("Array#find non-existing") do
TIMES.times do
words.find {|w| w == "mmmmm"}
end
end
x.report("Hash#[] existing") do
TIMES.times do
hash["mmmm"]
end
end
x.report("Hash#[] non-existing") do
TIMES.times do
hash["mmmmm"]
end
end
end
jesus@jesus-laptop:~/personal/programacion/ruby/Benchmarks/lib$ ruby
hash_array2.rb
Rehearsal -----------------------------------------------------------
Array#find existing 3.160000 0.700000 3.860000 ( 3.870061)
Array#find non-existing 6.280000 1.630000 7.910000 ( 7.935304)
Hash#[] existing 0.000000 0.000000 0.000000 ( 0.000019)
Hash#[] non-existing 0.000000 0.000000 0.000000 ( 0.000018)
------------------------------------------------- total: 11.770000sec
user system total real
Array#find existing 3.130000 0.710000 3.840000 ( 3.866618)
Array#find non-existing 5.820000 1.580000 7.400000 ( 9.195417)
Hash#[] existing 0.000000 0.000000 0.000000 ( 0.000030)
Hash#[] non-existing 0.000000 0.000000 0.000000 ( 0.000032)
Jesus.