Gregory Brown
8/12/2007 2:58:00 PM
On 8/12/07, Lionel Bouton <lionel-subscription@bouton.name> wrote:
> John Joyce wrote the following on 12.08.2007 07:47 :
> > The OP DID mention [long running Rails processes] !
> > OP Quote:
> > "I immediately thought of Ruby because of my experience with long running
> > Rails processes. It might be the occasion to have my gut feelings
> > checked by people that really know the inner workings of Ruby 1.8 too."
> >
>
> To be more accurate on my experience: I have a Rails application with
> the usual CRUD behaviour which isn't especially memory intensive
> (processes happily sit around 30-40MB, which seems the minimum for a
> Rails application). But there are some actions that process incoming CSV
> files with a rather bad memory behaviour. The process size jumps as soon
> as these actions are called with a size roughly proportional to the CSV
> line count.
Are you loading the CSVs entirely into memory or processing them line
by line? With a large CSV, if you process it line by line (even if
you're going to ultimately store it), you're less likely to hit the
same kind of memory spike you'd get loading it entirely into memory.
This is a contrived example, but in Ruport, this code, which
ultimately breaks a table of records down into a simple array of
arrays, takes a lot of memory (110MB for 50,000 lines):
a = Table("hygjan2007.csv").map { |r| r.to_a }
Whereas this one, which does the conversion on the fly, takes a whole lot less (67MB):
>> a = []
=> []
>> Table("hygjan2007.csv", :records => true) do |t,r|
?> a << r.to_a
>> end
Of course, this isn't a practical Ruport example; I'm intentionally
ballooning the memory with an unnecessary record conversion just to
end up with primitive objects for the final storage (an array of
arrays).
Now, since the end results of those two code samples are the same, I'm
pretty sure that when it came time to free up that memory in a crunch,
they'd compress down to the same size. But if you don't want the spike
in the first place, introducing row processing for your CSVs instead
of slurping is a good thing anywhere you can manage it.
If you're already doing that, this advice isn't very helpful to you,
but may be to others.
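For the non-Ruport case, here's a rough sketch of the same
slurp-vs-stream distinction using Ruby's standard CSV library. (Note
I'm writing against the modern stdlib CSV API rather than 1.8's, and
the sample data is made up.)

```ruby
require 'csv'

# Tiny sample standing in for a large CSV file (hypothetical data).
data = "name,qty\napples,3\npears,2\napples,1\n"

# Slurping: CSV.parse builds every row in memory at once, so with a
# big file the peak footprint grows with the line count.
all_rows = CSV.parse(data, headers: true)

# Streaming: CSV.parse with a block (or CSV.foreach on a file) yields
# one row at a time, so you only hold on to what you choose to keep --
# here, a small hash of running totals instead of every row.
totals = Hash.new(0)
CSV.parse(data, headers: true) do |row|
  totals[row["name"]] += row["qty"].to_i
end
```

Either way you get the same answer; the difference is only in how much
sits in memory while you're getting it.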
-greg