

comp.lang.ruby

BIG memory problem

tobyclemson@gmail.com

8/8/2008 11:40:00 AM

Hi all,

I'm having a really odd memory problem with a small Ruby program I've
written. It takes in lines from input files (which represent router
flows), deduplicates them (based on elements of the line) and outputs
the unique flows to file. The input file often contains over 300,000
lines, of which about 25-30% are duplicates. The trouble I'm having is
that the program (which is intended to be long-running) does not seem
to release any memory back to the system, and in fact its memory
footprint just grows from iteration to iteration. By my estimates it
should use about 150 MB, but it sails through this and yesterday
slowed to a halt at about 1.6 GB (due to the GC, at my guess). This
makes no sense, as at times I am deleting data structures that are
50 MB each, which should show some decrease in memory usage.
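One thing worth checking is whether live Ruby objects actually shrink after those deletes. MRI's GC reclaims objects for reuse but rarely returns heap pages to the OS, so process RSS can stay high even when Ruby-level objects are freed. A diagnostic sketch (not from the original code; assumes MRI 1.9+ for `ObjectSpace.count_objects`):

```ruby
# Diagnostic sketch (hypothetical; assumes MRI 1.9+): count live objects
# before and after an allocation burst. MRI's GC frees objects for reuse
# but rarely hands heap pages back to the OS, so RSS alone is misleading.
def live_objects
  GC.start
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

before = live_objects
lines  = Array.new(100_000) { |i| "flow line #{i}" }  # stand-in for a buffer
during = live_objects
lines  = nil                                          # "delete" the structure
after  = live_objects
puts "before=#{before} during=#{during} after=#{after}"
# If `after` keeps growing across iterations, something still references the
# old buffers (a Ruby-level leak); if `after` stays flat while RSS grows,
# the allocator/GC behaviour is the culprit instead.
```

Comparing these counts across iterations distinguishes a genuine reference leak from MRI simply holding on to already-freed heap pages.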

The codebase is slightly too big to pastie, but it is available
here: http://svn.tobyclemson.co.uk/public/trunk/flow_de... .
There are actually only 2 classes of importance and 1 script, but I
don't know if pastie can handle that.

Any help would be greatly appreciated, as the alternative (pressure
from above) is to rewrite in Python (which involves me learning
Python).

Thanks in advance,
Toby Clemson

3 Answers

M. Edward (Ed) Borasky

8/8/2008 12:49:00 PM


On Fri, 2008-08-08 at 20:40 +0900, tobyclemson@gmail.com wrote:
> [original post snipped]

Are you on a platform that has GNU "sort" available? GNU "sort" can do
the duplicate removal for you a *lot* more efficiently than a program in
*any* scripting language. Then you can use Ruby to do the "interesting"
part of the problem. :)
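For exact-duplicate lines, the whole deduplication step can indeed be pushed out to GNU sort, keeping only the post-processing in Ruby. A minimal sketch (file names hypothetical):

```ruby
# Sketch (hypothetical file names): delegate removal of exact-duplicate
# lines to GNU sort, then post-process the unique flows in Ruby.
File.write("flows.txt", "a,1\nb,2\na,1\n")        # toy input
system("sort", "-u", "flows.txt", "-o", "unique.txt") or raise "sort failed"
File.foreach("unique.txt") { |line| puts line }   # the "interesting" Ruby part
```

`sort -u` accepts multiple input files at once, and `-t`/`-k` can restrict the comparison to particular fields, which matters when only some columns of a line determine a duplicate.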
--
M. Edward (Ed) Borasky
ruby-perspectives.blogspot.com

"A mathematician is a machine for turning coffee into theorems." --
Alfréd Rényi via Paul Erdős


tobyclemson@gmail.com

8/8/2008 3:04:00 PM


Yes I am, but I don't think sort will perform the required task. The
lines aren't identical; they just have similar fields. The duplicates
can also span multiple files, because the exported flows are collected
every minute, so any duplicates occurring across minute blocks would
not be found. This is why I am using this buffer approach.
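The buffer approach described can be kept bounded by evicting keys once they fall outside the collection window, so memory use stops growing with the number of files processed. A minimal sketch (class name, key fields and window length are all hypothetical, not from the linked code):

```ruby
# Hypothetical sketch of a bounded dedup buffer: flows are keyed on a few
# fields, duplicates are rejected within a time window, and stale keys are
# evicted so the buffer's size (and memory use) stays bounded.
class FlowBuffer
  def initialize(window = 120)  # seconds; assumed, tune to the export interval
    @window = window
    @seen   = {}                # key => time first seen
  end

  # Returns true if this flow has not been seen within the window.
  def unique?(fields, now)
    key = fields.values_at(0, 1, 2).join("|")  # assumed key columns
    prune(now)
    return false if @seen.key?(key)
    @seen[key] = now
    true
  end

  private

  # Drop entries older than the window so the hash cannot grow unbounded.
  def prune(now)
    @seen.delete_if { |_, t| now - t > @window }
  end
end

buf = FlowBuffer.new(60)
t = Time.now
buf.unique?(%w[10.0.0.1 10.0.0.2 80], t)        # first sighting: unique
buf.unique?(%w[10.0.0.1 10.0.0.2 80], t + 30)   # within window: duplicate
buf.unique?(%w[10.0.0.1 10.0.0.2 80], t + 120)  # after eviction: unique again
```

Because eviction happens on every lookup, the buffer never holds more than one window's worth of keys, regardless of how many files have been streamed through it.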

Thanks for your help,
Toby

On Aug 8, 1:48 pm, "M. Edward (Ed) Borasky" <zn...@cesmail.net> wrote:
> [quoted message snipped]
