Robert Klemme
2/7/2007 9:18:00 PM
On 07.02.2007 21:53, Charles L. Snyder wrote:
> I have several text files that look like this:
>
> Brazil, 10
> Brazil, 13
> Brazil, 9
> Bulgaria, 1
> Canada, 48
> Canada, 52
> Canada, 38
> Canada, 55
> Canada, 59
> Chile, 1
> Chile, 1
> Chile, 2
> China, 7
> China, 18
> China, 19
> China, 22
> China, 25
>
> I need to iterate through the above file(s) and get the data
> summarized in the form:
>
> Canada, 252
> China, 91
> Chile, 4
> Brazil, 32
> Bulgaria, 1
I would do that in stream mode, i.e. summarize while reading instead of
reading everything first and summarizing afterwards (see attached). That
is more efficient, especially since these files look like they could be
large: memory use then grows with the number of countries, not with the
number of lines.
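
For real files rather than inline data, a minimal sketch of the same streaming idea could look like this. The method name summarize and the throwaway demo file are only assumptions for illustration; File.foreach hands over one line at a time, so no file is ever held in memory as a whole.

```ruby
require "tempfile"

# Sum the second column per country, reading one line at a time.
# `paths` is any list of file names (hypothetical helper, not from the post).
def summarize(paths)
  totals = Hash.new(0)
  paths.each do |path|
    File.foreach(path) do |line|
      country, val = line.chomp.split(/,\s*/)
      # Skip blank or malformed lines that lack either field.
      totals[country] += val.to_i if country && val
    end
  end
  totals
end

# Demo with a temporary file standing in for one of the real data files.
Tempfile.create("countries") do |f|
  f.puts "Brazil, 10", "Brazil, 13", "Bulgaria, 1"
  f.flush
  summarize([f.path]).each { |country, total| puts "#{country}, #{total}" }
end
```

Passing several paths accumulates into the same hash, so one call covers all of the files.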
> I know how to go from a single column list with multiple repeated
> values to a 'histogram' type list, ie:
>
> my_hash = countries.inject(Hash.new { 0 }) { |counts, key|
>   counts[key] += 1; counts }
I don't know why you do this. Do you also need the number of occurrences?
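
If the number of occurrences is wanted as well, both figures can be collected in the same pass. This is only a sketch under that assumption; sum_and_count is a made-up name and the sample lines are taken from the data in the question.

```ruby
# One pass, two numbers per country: [sum of values, number of rows].
# Hypothetical helper; `lines` is anything that yields strings.
def sum_and_count(lines)
  stats = Hash.new { |h, k| h[k] = [0, 0] }
  lines.each do |line|
    country, val = line.chomp.split(/,\s*/)
    next unless country && val
    stats[country][0] += val.to_i  # running total
    stats[country][1] += 1         # number of occurrences
  end
  stats
end

sample = ["Chile, 1", "Chile, 1", "Chile, 2", "Bulgaria, 1"]
sum_and_count(sample).each do |country, (sum, n)|
  puts "#{country}, #{sum} (#{n} rows)"
end
```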
> my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }
>
> but I'm unable to figure out how to get the 2-column csv values into a
> total by country as shown above.
> (I do have another file "countries.txt" which is a unique list of
> countries.)
You don't need the second file unless you want to report zero counts for
countries not present.
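
Should zero counts be wanted after all, one way is to seed the result with every name from the country list before adding the totals. A rough sketch, assuming one country name per line in that list; with_zero_counts is a made-up name, and "Denmark" is an invented entry that merely stands for a country missing from the data.

```ruby
# Seed every known country with 0 so absent ones still show up in the report.
# `known` is any enumerable of names, e.g. File.foreach("countries.txt").
def with_zero_counts(totals, known)
  report = Hash.new(0)
  known.each do |name|
    name = name.strip
    report[name] = 0 unless name.empty?
  end
  totals.each { |country, total| report[country] += total }
  report
end

totals = { "Brazil" => 32, "Chile" => 4 }
known  = ["Brazil\n", "Chile\n", "Denmark\n"]  # "Denmark" is illustrative only
with_zero_counts(totals, known).each { |country, total| puts "#{country}, #{total}" }
```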
Kind regards
robert
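# Running total per country; countries not seen yet start at 0.
counts = Hash.new(0)

# DATA is the text after __END__; use ARGF instead to read files
# named on the command line.
DATA.each do |line|
  line.chomp!
  country, val = line.split(/,\s*/)
  counts[country] += val.to_i if country && val
end

# Largest total first.
counts.sort_by { |_country, count| -count }.each do |country, count|
  print country, " ", count, "\n"
end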
__END__
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25