Robert Klemme
10/31/2008 5:19:00 PM
On 31.10.2008 11:21, Sarcar, Shourya C (GE Healthcare) wrote:
>> -----Original Message-----
>> From: Robert Klemme [mailto:shortcutter@googlemail.com]
>> Sent: Friday, October 31, 2008 12:04 AM
>> To: ruby-talk ML
>> Subject: Re: array comparison
>>
>> The comparison is a bit unfair since you include the build
>> time for the structure in case of Hash but not for the Array.
>> You should at least also compare just the intersection time.
>> And while you're at it you can also add Set to the mix. :-)
>
>
> Interesting discussion.
>
> I wrote a small program to compare Array intersection to Set
> intersection.
> I noticed that for smaller sets (~10 ** 3), Array is much faster that
> Set, sometimes upto 5x.
> As the set size grows, they seem to converge (~10 ** 7, 100 million)
>
> # Program to compare timings for intersections
> require 'set'
>
> n = 10000
> while (n <= 10 ** 7) do
> c = Array.new(n) {rand n}.uniq
> d = Array.new(n) {rand n}.uniq
>
> # Native array intersection
> t0 = Time.now
> cd = c & d
> t1 = Time.now
> printf("Array | n = %d : %d intersects found in %f
> seconds\n",n,cd.size,t1-t0)
>
> # Convert array to Set objects and perform intersection
> c = c.to_set
> d = d.to_set
> t2 = Time.now
> cd = c & d
> t3 = Time.now
> printf("Set | n = %d : %d intersects found in %f
> seconds\n",n,cd.size,t3-t2)
> printf("Ratio of Set/Array intersect time =
> %.2f\n",(t3-t2)/(t1-t0))
>
> n = 2 * n
> end
Some things are noteworthy about your program:
- It has a large intersection, i.e. roughly 50% on average. This may
be different for other scenarios.
- You use Fixnums which compare pretty fast, you can expect to reach
break even earlier when using other types.
I have taken the liberty to extend your test program a bit changing
those two parameters mentioned above and you can see significant
differences. :-)
Kind regards
robert
require 'set'
constr = [
lambda do |n|
[
Array.new(n) {rand n}.uniq,
Array.new(n) {rand(n) + (n*0.9).to_i}.uniq
]
end,
lambda do |n|
[
Array.new(n) {"#{rand n}" << ("*" * 100_000)}.uniq,
Array.new(n) {"#{rand(n) + (n*0.9).to_i}" << ("*" * 100_000)}.uniq
]
end
]
2.times do |run|
puts "=== run #{run + 1} ========================================================"
constr.each_with_index do |creat, idx|
puts "--- construct #{idx} --------------------------------------------------------"
n = 1
loop do
c, d = creat[n]
# Native array intersection
t0 = Time.now
cd = c & d
t1 = Time.now
printf("Array | n = %d : %d intersects found in %f seconds\n",n,cd.size,t1-t0)
# Convert array to Set objects and perform intersection
c = c.to_set
d = d.to_set
t2 = Time.now
cd = c & d
t3 = Time.now
printf("Set | n = %d : %d intersects found in %f seconds\n",n,cd.size,t3-t2)
ratio = (t3-t2)/(t1-t0)
printf("Ratio of Set/Array intersect time = %.2f\n",ratio)
n *= 2
break if n > 100 && ratio > 0.1 && ratio < 0.9
end
end
end
=== run 1 ========================================================
--- construct 0 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.000000 seconds
Set | n = 1 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 2 : 0 intersects found in 0.000000 seconds
Set | n = 2 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 4 : 1 intersects found in 0.000000 seconds
Set | n = 4 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 8 : 0 intersects found in 0.000000 seconds
Set | n = 8 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 16 : 0 intersects found in 0.000000 seconds
Set | n = 16 : 0 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 32 : 1 intersects found in 0.000000 seconds
Set | n = 32 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 64 : 2 intersects found in 0.000000 seconds
Set | n = 64 : 2 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 128 : 3 intersects found in 0.000000 seconds
Set | n = 128 : 3 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 256 : 13 intersects found in 0.000000 seconds
Set | n = 256 : 13 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 512 : 19 intersects found in 0.000000 seconds
Set | n = 512 : 19 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 1024 : 38 intersects found in 0.001000 seconds
Set | n = 1024 : 38 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 1.00
Array | n = 2048 : 87 intersects found in 0.001000 seconds
Set | n = 2048 : 87 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = 3.00
Array | n = 4096 : 147 intersects found in 0.002000 seconds
Set | n = 4096 : 147 intersects found in 0.005000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 8192 : 344 intersects found in 0.004000 seconds
Set | n = 8192 : 344 intersects found in 0.010000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 16384 : 674 intersects found in 0.008000 seconds
Set | n = 16384 : 674 intersects found in 0.021000 seconds
Ratio of Set/Array intersect time = 2.62
Array | n = 32768 : 1353 intersects found in 0.015000 seconds
Set | n = 32768 : 1353 intersects found in 0.046000 seconds
Ratio of Set/Array intersect time = 3.07
Array | n = 65536 : 2605 intersects found in 0.040000 seconds
Set | n = 65536 : 2605 intersects found in 0.096000 seconds
Ratio of Set/Array intersect time = 2.40
Array | n = 131072 : 5278 intersects found in 0.125000 seconds
Set | n = 131072 : 5278 intersects found in 0.203000 seconds
Ratio of Set/Array intersect time = 1.62
Array | n = 262144 : 10610 intersects found in 0.327000 seconds
Set | n = 262144 : 10610 intersects found in 0.416000 seconds
Ratio of Set/Array intersect time = 1.27
Array | n = 524288 : 21003 intersects found in 0.869000 seconds
Set | n = 524288 : 21003 intersects found in 0.827000 seconds
Ratio of Set/Array intersect time = 0.95
Array | n = 1048576 : 41912 intersects found in 2.059000 seconds
Set | n = 1048576 : 41912 intersects found in 1.674000 seconds
Ratio of Set/Array intersect time = 0.81
--- construct 1 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.002000 seconds
Set | n = 1 : 1 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 0.50
Array | n = 2 : 1 intersects found in 0.003000 seconds
Set | n = 2 : 1 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = 1.00
Array | n = 4 : 0 intersects found in 0.003000 seconds
Set | n = 4 : 0 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = 0.33
Array | n = 8 : 0 intersects found in 0.011000 seconds
Set | n = 8 : 0 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.18
Array | n = 16 : 2 intersects found in 0.013000 seconds
Set | n = 16 : 2 intersects found in 0.008000 seconds
Ratio of Set/Array intersect time = 0.62
Array | n = 32 : 1 intersects found in 0.026000 seconds
Set | n = 32 : 1 intersects found in 0.009000 seconds
Ratio of Set/Array intersect time = 0.35
Array | n = 64 : 3 intersects found in 0.054000 seconds
Set | n = 64 : 3 intersects found in 0.020000 seconds
Ratio of Set/Array intersect time = 0.37
=== run 2 ========================================================
--- construct 0 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.000000 seconds
Set | n = 1 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 2 : 1 intersects found in 0.000000 seconds
Set | n = 2 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 4 : 1 intersects found in 0.000000 seconds
Set | n = 4 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 8 : 1 intersects found in 0.000000 seconds
Set | n = 8 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 16 : 1 intersects found in 0.000000 seconds
Set | n = 16 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 32 : 1 intersects found in 0.000000 seconds
Set | n = 32 : 1 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 64 : 5 intersects found in 0.000000 seconds
Set | n = 64 : 5 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 128 : 7 intersects found in 0.000000 seconds
Set | n = 128 : 7 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 256 : 9 intersects found in 0.000000 seconds
Set | n = 256 : 9 intersects found in 0.000000 seconds
Ratio of Set/Array intersect time = nan
Array | n = 512 : 24 intersects found in 0.000000 seconds
Set | n = 512 : 24 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 1024 : 47 intersects found in 0.000000 seconds
Set | n = 1024 : 47 intersects found in 0.001000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 2048 : 79 intersects found in 0.000000 seconds
Set | n = 2048 : 79 intersects found in 0.003000 seconds
Ratio of Set/Array intersect time = inf
Array | n = 4096 : 178 intersects found in 0.002000 seconds
Set | n = 4096 : 178 intersects found in 0.004000 seconds
Ratio of Set/Array intersect time = 2.00
Array | n = 8192 : 322 intersects found in 0.004000 seconds
Set | n = 8192 : 322 intersects found in 0.010000 seconds
Ratio of Set/Array intersect time = 2.50
Array | n = 16384 : 650 intersects found in 0.008000 seconds
Set | n = 16384 : 650 intersects found in 0.022000 seconds
Ratio of Set/Array intersect time = 2.75
Array | n = 32768 : 1326 intersects found in 0.015000 seconds
Set | n = 32768 : 1326 intersects found in 0.046000 seconds
Ratio of Set/Array intersect time = 3.07
Array | n = 65536 : 2626 intersects found in 0.041000 seconds
Set | n = 65536 : 2626 intersects found in 0.095000 seconds
Ratio of Set/Array intersect time = 2.32
Array | n = 131072 : 5279 intersects found in 0.129000 seconds
Set | n = 131072 : 5279 intersects found in 0.205000 seconds
Ratio of Set/Array intersect time = 1.59
Array | n = 262144 : 10491 intersects found in 0.334000 seconds
Set | n = 262144 : 10491 intersects found in 0.403000 seconds
Ratio of Set/Array intersect time = 1.21
Array | n = 524288 : 21031 intersects found in 1.112000 seconds
Set | n = 524288 : 21031 intersects found in 0.814000 seconds
Ratio of Set/Array intersect time = 0.73
--- construct 1 --------------------------------------------------------
Array | n = 1 : 1 intersects found in 0.001000 seconds
Set | n = 1 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 2.00
Array | n = 2 : 1 intersects found in 0.003000 seconds
Set | n = 2 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.67
Array | n = 4 : 1 intersects found in 0.004000 seconds
Set | n = 4 : 1 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.50
Array | n = 8 : 0 intersects found in 0.007000 seconds
Set | n = 8 : 0 intersects found in 0.002000 seconds
Ratio of Set/Array intersect time = 0.29
Array | n = 16 : 0 intersects found in 0.013000 seconds
Set | n = 16 : 0 intersects found in 0.005000 seconds
Ratio of Set/Array intersect time = 0.38
Array | n = 32 : 1 intersects found in 0.024000 seconds
Set | n = 32 : 1 intersects found in 0.009000 seconds
Ratio of Set/Array intersect time = 0.37
Array | n = 64 : 3 intersects found in 0.055000 seconds
Set | n = 64 : 3 intersects found in 0.020000 seconds
Ratio of Set/Array intersect time = 0.36