[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Premature Optimization

poopdeville

10/25/2006 8:39:00 PM

Hi everybody,

I'm writing a fairly open-ended question. I'm hoping for suggestions,
opinions, advice. Suppose I have n arrays, each of which has m
entries. m is a fairly large integer, on the order of 10,000. Each
entry is either 1 or 0.

The first task I need to accomplish is figuring out how many times a 1
occurs in the ith entry in an array. So for concreteness, if I had
arrays:

first = [1,0,0,0,0]
second = [1,1,0,0,0]
third = [0,0,0,1,0]

I would end up with
count = [2,1,0,1,0]

I'm just trying to give the general flavor of what I'm working on. I
know I can use some simple each_with_index loops to increment
count[index] (something along the lines of:)

count = Array.new(m,0)
[first, second, third].each do |array|
array.each_with_index do |item, index|
count[index] += item
end
end

There are going to be m * 3n * (two constant multipliers for the
looping) object allocations and method calls. m is fairly large, and I
have other, similar, tasks to accomplish with this data. The faster I
can process this data, the more data I can process in a given amount of
time, and the more accurate the analysis will be.

Is there a slick way to do this with unpacking and packing? Or some
other way to do this with strings? Any modules or libraries I should
look into? I'm fairly new to Ruby, though not to scientific
computation. I realize I won't be able to do this any faster than with
m * n * (a constant multiplier), but I need that constant multiplier to
be as small as possible. Any advice would be appreciated.

Thanks!
'cid 'ooh

7 Answers

Robert Klemme

10/25/2006 8:48:00 PM

0

poopdeville@gmail.com wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.

> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.

I would look out for an implementation of a BitSet as this data
structure seems crucial for your application. You could even create one
your own - it's not too hard.

Kind regards

robert

Ara.T.Howard

10/25/2006 9:32:00 PM

0

Wilson Bilkovich

10/25/2006 9:51:00 PM

0

On 10/25/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
> On Thu, 26 Oct 2006 poopdeville@gmail.com wrote:
>
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m entries. m
> > is a fairly large integer, on the order of 10,000. Each entry is either 1
> > or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]
>
> harp:~ > cat a.rb
> require 'narray'
>
> first = NArray.to_na [1,0,0,0,0]
> second = NArray.to_na [1,1,0,0,0]
> third = NArray.to_na [0,0,0,1,0]
>
> count = first.eq(1) + second.eq(1) + third.eq(1)
>
> p count
>
>
> harp:~ > ruby a.rb
> NArray.byte(5):
> [ 2, 1, 0, 1, 0 ]
>
>
> > I'm just trying to give the general flavor of what I'm working on. I know I
> > can use some simple each_with_index loops to increment count[index]
> > (something along the lines of:)
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.
>
> i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.
>
>

What's it going to take to get this into the stdlib? It is awesome.

Ara.T.Howard

10/26/2006 1:53:00 AM

0

dga

10/26/2006 6:35:00 AM

0

Someone already mentioned the NArray way; if you treat the entire
thing as a set of matrix operations, you can make it insanely fast in
either NArray or GSL. But don't treat it as "N" arrays with "M"
entries -- treat it as a matrix, and then all of your operations can be
done in the external (very fast) code:

m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])

You can then either cheat, and "flatten" the matrix into a vector and
find the sum:

puts m.to_v.sum

(which works since they're ones)

Or pretend you're in Matlab and do it as a set of matrix operations:

colones = GSL::Matrix.new(1, m.size[1])
colones[0].set_all(1)

rowones = GSL::Matrix.new(1, m.size[0])
rowones[0].set_all(1)

puts ((colones * m.transpose) * rowones.transpose)[0,0]

-Dave


poopdeville@gmail.com wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.
>
> The first task I need to accomplish is figuring out how many times a 1
> occurs in the ith entry in an array. So for concreteness, if I had
> arrays:
>
> first = [1,0,0,0,0]
> second = [1,1,0,0,0]
> third = [0,0,0,1,0]
>
> I would end up with
> count = [2,1,0,1,0]
>
> I'm just trying to give the general flavor of what I'm working on. I
> know I can use some simple each_with_index loops to increment
> count[index] (something along the lines of:)
>
> count = Array.new(m,0)
> [first, second, third].each do |array|
> array.each_with_index do |item, index|
> count[index] += item
> end
> end
>
> There are going to be m * 3n * (two constant multipliers for the
> looping) object allocations and method calls. m is fairly large, and I
> have other, similar, tasks to accomplish with this data. The faster I
> can process this data, the more data I can process in a given amount of
> time, and the more accurate the analysis will be.
>
> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.
>
> Thanks!
> 'cid 'ooh

dga

10/26/2006 6:58:00 AM

0

Tweaking a bit and benchmarking with 100 rows of 10,000 elements each

user system total real
Create Matrix 3.940000 0.090000 4.030000 ( 4.037665)
Add Up Matrix 0.710000 0.000000 0.710000 ( 0.711897)
Create GSL::Matrix 3.400000 0.050000 3.450000 ( 3.496827)
Add GSL::Matrix 0.080000 0.030000 0.110000 ( 0.111982)
Matrix Add GSL::Matrix 0.020000 0.000000 0.020000 ( 0.017661)

With the "Matrix Add" done as:

(@m * colones.transpose).to_v.sum

(to avoid @m.transpose, which creates a copy of all of @m).

Have fun. Note that there are faster ways to initialize a GSL array
from an external source, if you end up concerned about that source of
overhead. (Or an NArray).

-Dave

dga wrote:
> Someone already mentioned the NArray way; if you treat the entire
> thing as a set of matrix operations, you can make it insanely fast in
> either NArray or GSL. But don't treat it as "N" arrays with "M"
> entries -- treat it as a matrix, and then all of your operations can be
> done in the external (very fast) code:
>
> m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])
>
> You can then either cheat, and "flatten" the matrix into a vector and
> find the sum:
>
> puts m.to_v.sum
>
> (which works since they're ones)
>
> Or pretend you're in Matlab and do it as a set of matrix operations:
>
> colones = GSL::Matrix.new(1, m.size[1])
> colones[0].set_all(1)
>
> rowones = GSL::Matrix.new(1, m.size[0])
> rowones[0].set_all(1)
>
> puts ((colones * m.transpose) * rowones.transpose)[0,0]
>
> -Dave
>
>
> poopdeville@gmail.com wrote:
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m
> > entries. m is a fairly large integer, on the order of 10,000. Each
> > entry is either 1 or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had
> > arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]
> >
> > I'm just trying to give the general flavor of what I'm working on. I
> > know I can use some simple each_with_index loops to increment
> > count[index] (something along the lines of:)
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.
> >
> > Is there a slick way to do this with unpacking and packing? Or some
> > other way to do this with strings? Any modules or libraries I should
> > look into? I'm fairly new to Ruby, though not to scientific
> > computation. I realize I won't be able to do this any faster than with
> > m * n * (a constant multiplier), but I need that constant multiplier to
> > be as small as possible. Any advice would be appreciated.
> >
> > Thanks!
> > 'cid 'ooh

poopdeville

10/30/2006 7:53:00 PM

0


poopdeville@gmail.com wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice.
<snip>

Thanks for all your replies. I've worked with the Gnu Scientific
Library (and ATLAS, too) before, but I didn't realize there were Ruby
binding for it. I settled on a Ruby/GSL with a custom compiled ATLAS
backend (instead of just BLAS like GSL uses normally). It's screaming
fast.

'cid 'ooh