Eric Mahurin
7/20/2007 2:49:00 AM
On 7/19/07, Ron M <rm_rails@cheapcomplexdevices.com> wrote:
> After I use "flatten" on a big array; the garbage collector
> seems to get pretty forgetful of what memory it's allowed
> to free up.
>
> In both of the following scripts, the first two lines
> create an array of many arrays; and then flatten them
> back into a flat array.
>
> In both cases, the memory after creating this array
> the scripts use about 20MB. After that, they
> repeatedly pack and unpack an array. The script that
> used "flatten" keeps growing quickly until it used
> about 1/4 GB of memory. The script that flattened
> the array by hand stays right about at 20MB.
>
> What could it be about flatten that makes the
> garbage collector fail to see that it could be
> re-using the memory of the pack/unpack loop?
>
>
> #########################################################################
> # This unexpectedly grows to 200+MB.
> #########################################################################
> a = (1..200000).map{|x| [x.to_s]};a.length
> b = a.flatten.map{|x| x.to_i} ; b.length
> puts `grep VmSize /proc/#{$$}/status` ######## under 20 MB
> (1..100).each{|x| b.pack("I*").unpack("I*");}
> puts `grep VmSize /proc/#{$$}/status` ######## WHOA GREW TO 1/4 GB.
>
>
> #########################################################################
> # This stays at 20MB
> #########################################################################
> a = (1..200000).map{|x| [x.to_s]};a.length
> b = a.map{|x| x[0].to_i} ; b.length
> puts `grep VmSize /proc/#{$$}/status` ######## under 20 MB
> (1..100).each{|x| b.pack("I*").unpack("I*");}
> puts `grep VmSize /proc/#{$$}/status` ######## still 20 MB
>
I narrowed this down a bit by looking at the C code. The culprit in
Array#flatten seems to be rb_ary_splice, which also gets called when
assigning to an array slice (one form of Array#[]=). The culprit in
Array#pack seems to be rb_str_buf_cat, which also gets called from
String#<< with a Fixnum argument. I also found that the problem still
persists even after you completely remove all references to the array
you flattened/sliced. A more fundamental test showing the problem would be this:

a = Array.new(200000) { |x| [x] }
a.size.times { |i| a[i,1] = a[i][0,1] } # calls rb_ary_splice, triggering the problem
#a.size.times { |i| a[i] = a[i][0] } # calls rb_ary_store which seems OK
puts `grep VmSize /proc/#{$$}/status`
a = nil
GC.start
(1..200).each{ s="";200000.times{ s<<(?A) }} # calls rb_str_buf_cat which "leaks"
#(1..200).each{ s="";200000.times{ s<<"A" }} # calls rb_str_append which seems OK
puts `grep VmSize /proc/#{$$}/status`
GC.start
puts `grep VmSize /proc/#{$$}/status`

The amount of "leakage" seems to be roughly linear in the number of
iterations of the last loop. Each string generated in the loop should
be about 200K, but none of them is used again, so the memory increase
due to the last loop should be O(1) (around 200K, independent of the
number of iterations). Also, the GC.start calls don't help.
If you use either of the commented-out alternatives (rb_ary_store in
place of rb_ary_splice, or rb_str_append in place of rb_str_buf_cat),
the problem seems to go away.
This sure looks like a bug to me.
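
To check the "roughly linear" claim rather than eyeball it, something like the sketch below could be used. The helper names vm_size_kb and growth_after are made up for this illustration and are not part of the test above. It assumes Linux (/proc) and a newer Ruby, where ?A is a one-character String rather than a Fixnum, so the character code 65 is appended directly to get the integer form of String#<<. Whether a given Ruby build shows the growth at all is exactly what it would measure, so no expected numbers are given.

```ruby
# Hypothetical helper (not from the original test): this process's virtual
# memory size in kB, read from /proc. Returns nil where /proc is unavailable.
def vm_size_kb
  status = File.read("/proc/#{Process.pid}/status")
  kb = status[/^VmSize:\s+(\d+)\s+kB/, 1]
  kb && kb.to_i
rescue SystemCallError
  nil
end

# Hypothetical helper: build and discard `iterations` strings of `length`
# bytes each, appending an integer character code every time, then report
# how much VmSize changed (in kB) across the whole run.
def growth_after(iterations, length = 200_000)
  GC.start
  before = vm_size_kb
  iterations.times do
    s = String.new
    length.times { s << 65 } # 65 is the value of ?A on Ruby 1.8
  end
  GC.start
  after = vm_size_kb
  before && after ? after - before : nil
end

# If the growth is linear in the iteration count, doubling the count
# should roughly double the reported number.
[5, 10, 20].each do |n|
  puts "#{n} iterations: #{growth_after(n).inspect} kB growth"
end
```

Swapping the 65 for "A" gives the rb_str_append variant for comparison.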
Eric