Ilmari Heikkinen
1/22/2006 2:13:00 PM
So there I was this morning, staring at an ObjectSpace counter telling
me that I'm allocating 1500 Arrays and 10000 Floats per frame. Which
pretty much ground my framerate into the ground by requiring a 0.2s GC
run every other frame. So I decided to get down and rid my code of as
many allocations as possible.

The first thing I discovered was a bit of code looking like this (not
to mention that it was actually getting called several times per
frame due to a bug):

  @mtime = ([@mtime] + ([@stroke||nil,@fill||nil].compact+children).map{|c| c.mtime}).max

Quite unreadable, and it was responsible for a large part of the Array
allocations too. A quick change whittled the Array allocation count for
that method to 0, at the price of making it less idiomatic:

  children.each{|c|
    cm = c.mtime
    @mtime = cm if cm > @mtime
  }
  if @stroke
    sm = @stroke.mtime
    @mtime = sm if sm > @mtime
  end
  if @fill
    fm = @fill.mtime
    @mtime = fm if fm > @mtime
  end

Now the Array allocations dropped down to the hundreds, a much more
reasonable number, but still way too many compared to what was
happening in the frame. The only thing that should've changed was one
number. So the extra 500 Arrays were a bit of a mystery.

Some investigation revealed places where I was using
Array#each_with_index. Very nice, very idiomatic, very allocating a
new Array on each iteration. So replace it with the following and watch
the alloc counts fall:

  i = 0
  arr.each{|e|
    do_stuff_with e
    i += 1
  }

By doing that in a couple of strategic places, plus some other
optimizations, the Array allocation count fell to 150. Of which 90
were allocated in the object Z-sorting method, which would require a C
implementation to get its allocation count to 0. The Array allocation
fight was heading towards diminishing returns, and my current scene
didn't need to use Z-sorting, so I turned my attention to the Floats.

By now, the Float count had also dropped a great deal, but it was
still a hefty 3000 Floats per frame. With each Float weighing 16
bytes, that was nearly 3MB per second when running at 60fps.
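
The per-frame numbers above came from an ObjectSpace counter. The post doesn't show that counter, so the following is only a minimal sketch of one way to get such numbers, not the code actually used. The `allocations_of` helper and the `frame` lambda are made-up names for illustration. It leans on ObjectSpace.each_object returning the number of objects it found, and turns the GC off while the block runs so nothing allocated inside it disappears before the second count:

```ruby
# Count how many objects of klass a block leaves behind.
# Hypothetical helper, not part of Ruby or of the original code.
def allocations_of(klass)
  GC.start      # start from a freshly collected heap
  GC.disable    # keep everything the block allocates alive until we count
  before = ObjectSpace.each_object(klass) {}  # returns the number of objects found
  yield
  after = ObjectSpace.each_object(klass) {}
  after - before
ensure
  GC.enable
end

# Hypothetical stand-in for one frame of work: builds 100 small Arrays.
kept = []
frame = lambda { 100.times { kept << [0, 0] } }

puts "Arrays allocated: #{allocations_of(Array, &frame)}"
```

The count can come out slightly above the real figure because the measurement itself may allocate an object or two. Also note that on current 64-bit Rubies most Floats are immediate values rather than heap objects, so counting Float this way will report far fewer of them than it would have on the Ruby this post was written against.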

Searching for the method that was allocating all those Floats, I ran
into something weird. #transform was allocating 6-32 Floats per call.
And it's one of the functions that get called for every scene object,
in every frame. Also, it's written in C.

That left me stymied. Surely there must be some mistake, I thought,
the C function didn't seem to be allocating _any_ Ruby objects. But
little did I know.

The C function called the NUM2DBL macro in several places to turn Ruby
numbers into doubles. Reading the source for NUM2DBL told me that it
calls the rb_num2dbl C function, which takes a Ruby number and returns
a C double. Reading the source of rb_num2dbl revealed this:

  double
  rb_num2dbl(val)
      VALUE val;
  {
      switch (TYPE(val)) {
        case T_FLOAT:
          return RFLOAT(val)->value;

        case T_STRING:
          rb_raise(rb_eTypeError, "no implicit conversion to float from string");
          break;

        case T_NIL:
          rb_raise(rb_eTypeError, "no implicit conversion to float from nil");
          break;

        default:
          break;
      }

      return RFLOAT(rb_Float(val))->value;
  }

rb_Float gets called on all Fixnums and Bignums, of which there
happened to be a good deal in my scene state arrays.
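
Since only T_FLOAT values return straight away in rb_num2dbl, it can be worth checking from the Ruby side how many non-Float numbers the data actually holds. A minimal sketch, assuming the scene state is a flat Array of numbers (the `state` variable and its contents are made up for illustration, not taken from the actual scene code):

```ruby
# Hypothetical scene state mixing Integers (small and big) with Floats.
state = [0, 1.5, 2, 3.25, 10**30]

# How many entries would miss the T_FLOAT case in rb_num2dbl?
non_floats = state.count { |n| !n.is_a?(Float) }
puts "non-Float entries: #{non_floats}"

# One possible workaround: convert once, up front, so that later
# reads from C only ever see Floats.
state.map! { |n| n.to_f }
```

Whether converting up front is practical depends on how often the state gets rewritten with Integer values. It is an alternative to touching the C side, not something the original code did.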

Checking out rb_Float gave the explanation for the Float allocations:

  switch (TYPE(val)) {
    case T_FIXNUM:
      return rb_float_new((double)FIX2LONG(val));

    ...

    case T_BIGNUM:
      return rb_float_new(rb_big2dbl(val));

In order to turn a Fixnum into a double, it's allocating a new Float!

With that figured out, I rewrote rb_num2dbl as rb_num_to_dbl, this
time handling Fixnums and Bignums as special cases as well:

  double rb_num_to_dbl( VALUE val )
  {
    switch (TYPE(val)) {
      case T_FLOAT:
        return RFLOAT(val)->value;
      case T_FIXNUM:
        return (double)FIX2LONG(val);
      case T_BIGNUM:
        return rb_big2dbl(val);
      case T_STRING:
        rb_raise(rb_eTypeError, "no implicit conversion to float from string");
        break;
      case T_NIL:
        rb_raise(rb_eTypeError, "no implicit conversion to float from nil");
        break;
      default:
        break;
    }
    return RFLOAT(rb_Float(val))->value;
  }

The result? Float allocations fell to 700 per frame from the previous
3000. And now I'm getting a GC run "only" every 36 frames. Not perfect
by any means, but a decent start.

Have stories of your own? Tips for memory management? Ways to track
allocations? Post them, please.

Cheers,
Ilmari