
comp.lang.ruby

Array and hash iteration questions

Ben Giddings

9/30/2003 7:10:00 PM

I have a CSV file and I'm trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.

I'm iterating over the lines of the file, and splitting them into an array
with arr = line.split(/,/). That part works well, but there are a few
questions about how to do something efficiently.

In order to count the number of times something is seen, I took the approach:

cases = Hash.new(0)
...
cases[arr[324]] += 1
...

But now I want to save the number of cases where another value occurs with
the first one. (Essentially errors indexed by case)

The approach I have now is:

cases = Hash.new(0)
errors = Hash.new(0)
...
ca = arr[324]   # "case" is a reserved word in Ruby, so it cannot name a variable
cases[ca] += 1
if arr[532] =~ /Error/
  errors[ca] += 1
end
...

That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

Next, I want to print out the values. It is easy to do this with
cases.each, but I'd like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).

Any ideas would be appreciated.

Ben


5 Answers

Robert Klemme

10/1/2003 6:50:00 AM



"Ben Giddings" <bg-rubytalk@infofiend.com> wrote in message
news:3F79D516.9050509@infofiend.com...
> I have a CSV file and I'm trying to do a few things with it. Essentially
> what it boils down to is: count the number of times a certain value is
> seen, then count the number of times another value is seen in conjunction
> with the first one.
>
> I'm iterating over the lines of the file, and splitting them into an array
> with arr = line.split(/,/). That part works well, but there are a few
> questions about how to do something efficiently.
>
> In order to count the number of times something is seen, I took the approach:
>
> cases = Hash.new(0)
> ...
> cases[arr[324]] += 1
> ...
>
> But now I want to save the number of cases where another value occurs with
> the first one. (Essentially errors indexed by case)
>
> The approach I have now is:
>
> cases = Hash.new(0)
> errors = Hash.new(0)
> ...
> ca = arr[324]
> cases[ca] += 1
> if arr[532] =~ /Error/
>   errors[ca] += 1
> end
> ...
>
> That works, but it seems to me that I really should be doing this with one
> hash, not two. Any suggestions?

cases = Hash.new {|h,k| h[k] = [0, 0]}
...
ca = arr[324]
counter = cases[ca]
counter[0] += 1

counter[1] += 1 if /Error/ =~ arr[532]
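
A minimal, self-contained sketch of the single-hash approach above. The sample lines and the constants CASE_COL and STATUS_COL are made up for illustration; they stand in for the real file and for columns 324 and 532 mentioned in the thread.

```ruby
# Hypothetical sample data; the real CSV has many more columns.
LINES = [
  "alpha,ok",
  "beta,Error: timeout",
  "alpha,Error: disk",
  "alpha,ok",
]

CASE_COL   = 0  # stands in for arr[324]
STATUS_COL = 1  # stands in for arr[532]

# The block runs once per missing key and stores a fresh [total, errors] pair.
# Hash.new([0, 0]) would instead hand every key the same shared array.
cases = Hash.new { |h, k| h[k] = [0, 0] }

LINES.each do |line|
  arr = line.split(/,/)
  counter = cases[arr[CASE_COL]]  # one hash lookup per line
  counter[0] += 1
  counter[1] += 1 if /Error/ =~ arr[STATUS_COL]
end

p cases
```

The block form matters here because the default value is mutable: each key needs its own array, otherwise every increment would land in one shared pair.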

> Next, I want to print out the values. It is easy to do this with
> cases.each, but I'd like to print them out, sorted by case. The best
> solution I have so far uses cases.keys.sort.each, then inside the block
> uses cases[key] (and errors[key]).

cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end

Regards

robert

Ben Giddings

10/1/2003 5:15:00 PM


Robert Klemme wrote:
> cases = Hash.new {|h,k| h[k] = [0, 0]}

Ah. I couldn't remember how to use the block form properly. I'm actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later stuff more clear like

cases[ca]['Number'] += 1
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
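
As a rough sketch of how the nested-hash variant behaves: the outer block creates an inner Hash.new(0) per case, so named counters can be incremented without initialising them first. The rows array is hypothetical sample data standing in for the split CSV lines, and the variable is called kase because `case` is a reserved word.

```ruby
cases = Hash.new { |hash, key| hash[key] = Hash.new(0) }

# Hypothetical [case, status] pairs standing in for the relevant CSV columns.
rows = [["alpha", "ok"], ["beta", "Error"], ["alpha", "Error"]]

rows.each do |kase, status|
  cases[kase]['Number'] += 1
  cases[kase]['Errors'] += 1 if status =~ /Error/
end

cases.each { |kase, counts| puts "#{kase}: #{counts['Number']} seen, #{counts['Errors']} errors" }
```

The inner Hash.new(0) is safe with a plain default because integers are immutable; only the outer hash needs the block form.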

> cases.sort.each do |ca, counter|
> printf "%10s: %4d", ca, counter[0]
> printf " %4d", counter[1] if counter[1] > 0
> print "\n"
> end

Aha, I just assumed hash didn't have a sort method, because the concept of
a "sorted hash" seemed meaningless, but since it actually returns an array
containing [key, value] pairs, that's perfect!
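
A small sketch of what Hash#sort hands back, using made-up counts. The result is an Array of [key, value] pairs; because hash keys are unique, the ordering is decided by the keys alone.

```ruby
# Hypothetical counts in the [total, errors] layout from earlier in the thread.
cases = { "beta" => [1, 1], "alpha" => [3, 1] }

sorted = cases.sort  # an Array of [key, value] pairs, not a Hash

sorted.each do |ca, counter|
  line = format("%10s: %4d", ca, counter[0])
  line << format(" %4d", counter[1]) if counter[1] > 0
  puts line
end
```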

Thanks Robert

Ben


Robert Klemme

10/2/2003 8:23:00 AM



"Ben Giddings" <bg-rubytalk@infofiend.com> wrote in message
news:3F7B0BAC.7030305@infofiend.com...
> Robert Klemme wrote:
> > cases = Hash.new {|h,k| h[k] = [0, 0]}
>
> Ah. I couldn't remember how to use the block form properly. I'm actually
> going to use:
>
> cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
>
> Because it will make some of the later stuff more clear like
>
> cases[ca]['Number'] += 1
> cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

No need to use a Hash for this...

Number = 0
Errors = 1

cases[ca][Number] += 1
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array ref saves one hash lookup.
It *can* affect performance if you have a large number of cases... (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

counters = cases[ca]
counters[Number] += 1
counters[Errors] += 1 if arr[OFFSET] =~ /Error/

You could as well do

cases[ca].instance_eval do
  self[Number] += 1
  self[Errors] += 1 if arr[OFFSET] =~ /Error/
end

I'm getting carried away... :-)

> > cases.sort.each do |ca, counter|
> > printf "%10s: %4d", ca, counter[0]
> > printf " %4d", counter[1] if counter[1] > 0
> > print "\n"
> > end
>
> Aha, I just assumed hash didn't have a sort method, because the concept of
> a "sorted hash" seemed meaningless, but since it actually returns an array
> containing [key, value] pairs, that's perfect!

It is! Thanks to Matz's wisdom.

> Thanks Robert

You're welcome.

Kind regards

robert


10:17:02 [ruby]: ruby -rprofile lookups.rb
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 62.50    13.93     13.93         2  6962.50 11140.50  Integer#upto
 26.22    19.77      5.84    100001     0.06     0.06  Hash#[]
 11.28    22.28      2.51    100001     0.03     0.03  Array#[]
  0.07    22.30      0.01         1    15.00    15.00  Profiler__.start_profile
  0.00    22.30      0.00         2     0.00 11140.50  Object#test
  0.00    22.30      0.00         3     0.00     0.00  Module#method_added
  0.00    22.30      0.00         1     0.00 11171.00  Object#testArray
  0.00    22.30      0.00         1     0.00 22281.00  #toplevel
  0.00    22.30      0.00         1     0.00 11110.00  Object#testHash
10:17:25 [ruby]: cat lookups.rb


def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end

testHash
testArray

10:18:15 [ruby]:
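
The profiler adds overhead to every method call, which is one reason the iteration dominates the figures above. As an alternative sketch (not what was run in this thread), the same comparison can be timed with the standard Benchmark module. The constant N and the helper lookup_time are names chosen for this example only, and the numbers will differ by machine and Ruby version.

```ruby
require 'benchmark'

N = 100_000  # same order of magnitude as the loop in lookups.rb

# Time N reads of index/key 2 from the given collection, in seconds.
def lookup_time(coll)
  Benchmark.realtime { N.times { coll[2] } }
end

hash_time  = lookup_time({ 0 => 0, 1 => 1, 2 => 2 })
array_time = lookup_time([0, 1, 2])

printf "Hash#[]:  %.4f s\n", hash_time
printf "Array#[]: %.4f s\n", array_time
```

A single run like this is noisy; repeating it a few times gives a better picture before drawing conclusions.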

Robert Klemme

10/2/2003 9:43:00 AM



"Robert Klemme" <bob.news@gmx.net> wrote in message
news:blgp2a$bvnb8$1@ID-52924.news.uni-berlin.de...
>
> "Ben Giddings" <bg-rubytalk@infofiend.com> wrote in message
> news:3F7B0BAC.7030305@infofiend.com...
> > Robert Klemme wrote:
> > > cases = Hash.new {|h,k| h[k] = [0, 0]}
> >
> > Ah. I couldn't remember how to use the block form properly. I'm actually
> > going to use:
> >
> > cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
> >
> > Because it will make some of the later stuff more clear like
> >
> > cases[ca]['Number'] += 1
> > cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
>
> No need to use a Hash for this...
>
> Number = 0
> Errors = 1
>
> cases[ca][Number] += 1
> cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
>
> I might be a bit picky, but storing the array ref saves one hash lookup.

> It *can* affect performance if you have a large number of cases... (see
> below; although the timing is dominated by the iteration here, you can see
> that the array is faster)

This sentence should really have appeared several lines above: it's the
argument in favour of using arrays instead of hashes for the counters.

Regards

robert

aero6dof

10/2/2003 5:03:00 PM


"Robert Klemme" <bob.news@gmx.net> wrote in message news:<blgp2a$bvnb8$1@ID-52924.news.uni-berlin.de>...
> No need to use a Hash for this...
>
> Number = 0
> Errors = 1
>
> cases[ca][Number] += 1
> cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
>
> I might be a bit picky, but storing the array ref saves one hash lookup.
> It *can* affect performance if you have a large number of cases... (see
> below; although the timing is dominated by the iteration here, you can see
> that the array is faster)

I'm not sure if my testing method is quite consistent, but making a specific
record object looks like it could speed things up even more...

>ruby -rprofile lookups.rb
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 73.74    13.08     13.08         3  4359.00  5911.67  Integer#upto
 14.47    15.64      2.57    100001     0.03     0.03  Hash#[]
 11.79    17.73      2.09    100001     0.02     0.02  Array#[]
  0.08    17.75      0.01         1    15.00    15.00  Profiler__.start_profile
  0.00    17.75      0.00         1     0.00 17735.00  #toplevel
  0.00    17.75      0.00         1     0.00     0.00  Class#inherited
  0.00    17.75      0.00         1     0.00  1329.00  Object#testObj
  0.00    17.75      0.00         2     0.00  8203.00  Object#test
  0.00    17.75      0.00         1     0.00     0.00  TestObj#initialize
  0.00    17.75      0.00         1     0.00  8203.00  Object#testArray
  0.00    17.75      0.00         9     0.00     0.00  Module#method_added
  0.00    17.75      0.00         1     0.00  8203.00  Object#testHash
  0.00    17.75      0.00         1     0.00     0.00  Module#attr_accessor
  0.00    17.75      0.00         1     0.00     0.00  Class#new
>type lookups.rb
def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end


# a simple record class...
class TestObj
  attr_accessor :num, :err
  def initialize
    @num = 0
    @err = 0
  end
end

def testObj
  to = TestObj.new
  0.upto( 100000 ) do
    to.err
  end
end

testHash
testArray
testObj
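
To connect the record-object idea back to the counting problem, here is one possible sketch using Struct, which generates the accessors and constructor that TestObj writes by hand. The name Counter and the sample rows are assumptions for illustration; this is not code from the thread and makes no claim about relative speed.

```ruby
# Hypothetical record type with named fields instead of array indices.
Counter = Struct.new(:num, :err)

# Each new case gets its own Counter, created on first access.
cases = Hash.new { |h, k| h[k] = Counter.new(0, 0) }

# Hypothetical [case, status] pairs standing in for the relevant CSV columns.
rows = [["alpha", "ok"], ["beta", "Error"], ["alpha", "Error"]]

rows.each do |kase, status|
  rec = cases[kase]  # keep the reference, as suggested earlier in the thread
  rec.num += 1
  rec.err += 1 if status =~ /Error/
end

cases.sort.each { |kase, rec| printf "%10s: %4d %4d\n", kase, rec.num, rec.err }
```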

> 10:17:02 [ruby]: ruby -rprofile lookups.rb
>   %   cumulative   self               self     total
>  time   seconds   seconds     calls  ms/call  ms/call  name
>  62.50    13.93     13.93         2  6962.50 11140.50  Integer#upto
>  26.22    19.77      5.84    100001     0.06     0.06  Hash#[]
>  11.28    22.28      2.51    100001     0.03     0.03  Array#[]
>   0.07    22.30      0.01         1    15.00    15.00  Profiler__.start_profile
>   0.00    22.30      0.00         2     0.00 11140.50  Object#test
>   0.00    22.30      0.00         3     0.00     0.00  Module#method_added
>   0.00    22.30      0.00         1     0.00 11171.00  Object#testArray
>   0.00    22.30      0.00         1     0.00 22281.00  #toplevel
>   0.00    22.30      0.00         1     0.00 11110.00  Object#testHash