Asp Forum - JRuby disabling ObjectSpace: what implications?

Charles Oliver Nutter

10/28/2007 6:53:00 AM

As some of you may have heard, we're considering disabling
ObjectSpace.each_object by default in JRuby. Primarily, this is for
performance; to support each_object, we have to bend over backwards,
maintaining lists of weak references to all objects in the system and
periodically cleaning out those lists. Here's some example performance,
from a fractal benchmark in the JRuby source:

With ObjectSpace: Ruby Elapsed 45.967000
Without ObjectSpace: Ruby Elapsed 4.280000

What's most frustrating about this is that almost *no* libraries or apps
use each_object, and it's a terrible performance hit for us.

The one really visible use of each_object is in test/unit, where the
default console-based runner does each_object(Class) to find all
subclasses of TestCase. Because this is a heavily-used library (to say
the least), I've made modifications to JRuby to always support
each_object(Class) by maintaining a bidirectional graph of parent and
child classes. So that much wouldn't go away (but I'd prefer an
implementation that uses Class#inherited, since it would be cleaner,
faster, and deterministic).
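
As a minimal sketch of the Class#inherited approach mentioned above (not test/unit's actual code; `TrackedCase` and `descendants` are hypothetical names standing in for TestCase and its subclass list), a base class can record every subclass at definition time, with no heap walk:

```ruby
# Hypothetical base class standing in for Test::Unit::TestCase.
class TrackedCase
  @descendants = []

  class << self
    # Every class that inherits from TrackedCase, in definition order.
    def descendants
      TrackedCase.instance_variable_get(:@descendants)
    end

    # Ruby calls this hook each time a direct or indirect subclass is defined.
    def inherited(subclass)
      super
      TrackedCase.descendants << subclass
    end
  end
end

class FirstCase < TrackedCase; end
class SecondCase < FirstCase; end

p TrackedCase.descendants
```

Because the list is filled in as classes are defined, its order is fixed by load order rather than by whatever the garbage collector has or has not reclaimed.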

So...I'm writing this to see what the general Ruby world thinks of us
having ObjectSpace disabled by default, enableable via a command line
option (or perhaps through a library? -robjectspace?).

I think more and more of you may want to give JRuby another look over
the next few months, so I think we need to involve you in such decisions.

- Charlie

27 Answers

Bill Kelly

10/28/2007 6:59:00 AM

From: "Charles Oliver Nutter" <charles.nutter@sun.com>
>
> As some of you may have heard, we're considering disabling
> ObjectSpace.each_object by default in JRuby. Primarily, this is for
> performance; to support each_object, we have to bend over backwards,
> maintaining lists of weak references to all objects in the system and
> periodically cleaning out those lists.

Is this also true for ObjectSpace#_id2ref ?

Regards,

Bill

Charles Oliver Nutter

10/28/2007 7:07:00 AM

ara.t.howard wrote:
> hmmm. ok i'm brainstorming here which you can ignore if you like as i
> know less than nothing about jvms or implementing ruby but here goes:
> what if you could invert the problem? what if objects knew about the
> global ObjectSpaceThang and could be forced to register themselves on
> demand somehow? without a reference i've no idea how, just throwing
> that out there. or, another stupid idea, what if the objects themselves
> were the tree/graph of weak references parent -> children. crawling it
> would be, um, fun - but you could prune dead objects *only* when walking
> the graph. this should be possible in ruby since you always have the
> notion of a parent object - which is Object - so all objects should be
> either reachable or leaks. now back to drinking my regularly scheduled
> beer...

Continuing this discussion here...

Please, continue to brainstorm. I don't claim to have thought out every
aspect of this problem or every possible solution. I'd *love* to
discover I've missed an obvious fix.

Your idea has come up in the past, and it would probably eliminate the
cost of an ObjectSpace list. However that doesn't appear to be where we
pay the highest cost.

The two items that (we believe) cost the most for us on the JVM are:

- Constructing an extra object for every Ruby object...namely, the
WeakReference object to point to it. So we pay a
memory/allocation/initialization cost.
- WeakReference itself causes Java's GC to have to do additional checks,
so it can notify the WeakReference that the object it points at has gone
away. So that slows down the legendary HotSpot GC and we pay again.
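
To make the two costs concrete, here is a rough Ruby sketch of the general "list of weak references" design, assuming Ruby's standard `weakref` library as a stand-in for Java's WeakReference. `WeakRegistry` is a hypothetical name and this is not JRuby's implementation; it only shows that every tracked object costs one extra wrapper allocation and that stale entries must be swept out separately:

```ruby
require 'weakref'

# Hypothetical registry: one WeakRef wrapper is allocated per tracked object.
class WeakRegistry
  def initialize
    @refs = []
  end

  # Called once per new object; this is the extra allocation per instance.
  def register(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Yields each live tracked object of the given class and returns the count.
  def each_object(klass = Object)
    count = 0
    @refs.each do |ref|
      begin
        obj = ref.__getobj__
      rescue WeakRef::RefError
        next # referent was collected; skip the stale entry
      end
      next unless klass === obj
      yield obj
      count += 1
    end
    count
  end

  # Periodic cleanup: drop wrappers whose referent has been collected.
  def prune!
    @refs.select!(&:weakref_alive?)
    self
  end
end
```

A caller would invoke `register` from every constructor and `prune!` on some schedule, which is the bookkeeping described above.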

I believe the parent -> weakref -> children algorithm is used in some
implementations of ObjectSpace-like behavior, so it's perfectly valid.
But again, there's certain aspects of ObjectSpace that are just
problematic...

- threading or concurrency of any kind? No, you can't have
multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
potentially excludes other advanced GC designs too).
- determinism? Matz told me that "ObjectSpace doesn't have to be
deterministic"...but when it starts getting wired into libraries like
test/unit, it seems like people expect it to be. If we can say OS isn't
deterministic, then *nobody* should be relying on its contents for core
libraries, and we could reasonably claim that each_object will never
return *anything*.

- Charlie

Charles Oliver Nutter

10/28/2007 7:16:00 AM

Bill Kelly wrote:
>
> From: "Charles Oliver Nutter" <charles.nutter@sun.com>
>>
>> As some of you may have heard, we're considering disabling
>> ObjectSpace.each_object by default in JRuby. Primarily, this is for
>> performance; to support each_object, we have to bend over backwards,
>> maintaining lists of weak references to all objects in the system and
>> periodically cleaning out those lists.
>
> Is this also true for ObjectSpace#_id2ref ?

Not directly. _id2ref is handled in a similar way, but we have an event
we can trigger off to start tracking an object; namely, Object#id.

When you request an id, we start tracking that object for purposes of
_id2ref. Not until. So that would not be affected by disabling ObjectSpace.
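
A small Ruby sketch of that "track only once an id is requested" idea follows. It assumes the standard `weakref` library and uses `object_id` (the current name for Object#id); `LazyIdTracker`, `id_for`, and `id2ref` are hypothetical names, not JRuby internals:

```ruby
require 'weakref'

# Hypothetical tracker: nothing is recorded until an id is requested.
class LazyIdTracker
  def initialize
    @tracked = {} # id => WeakRef pointing at the object
  end

  # Requesting the id is the event that starts tracking the object.
  def id_for(obj)
    id = obj.object_id
    @tracked[id] = WeakRef.new(obj) unless @tracked.key?(id)
    id
  end

  # Counterpart of _id2ref, limited to objects whose id was requested.
  def id2ref(id)
    ref = @tracked.fetch(id) { raise RangeError, "#{id} is not a tracked id" }
    ref.__getobj__
  rescue WeakRef::RefError
    @tracked.delete(id)
    raise RangeError, "#{id} refers to a collected object"
  end

  def tracked_count
    @tracked.size
  end
end
```

Objects that never have their id requested add nothing to the table, which is why this path does not carry the per-object cost of each_object.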

In actuality, however, _id2ref is primarily used for things like weak
references, so you can hold a virtual reference to an object without
preventing it from being collected. We could provide an implementation
of Ruby's weak references using Java's weak references that would allow
us to escape _id2ref entirely for that use case.

Are there other places _id2ref is used?

- Charlie

Bill Kelly

10/28/2007 7:59:00 AM

From: "Charles Oliver Nutter" <charles.nutter@sun.com>
> Bill Kelly wrote:
>>
>> Is this also true for ObjectSpace#_id2ref ?
>
> Not directly. _id2ref is handled in a similar way, but we have an event
> we can trigger off to start tracking an object; namely, Object#id.
>
> When you request an id, we start tracking that object for purposes of
> _id2ref. Not until. So that would not be affected by disabling ObjectSpace.

I see, thanks. Nifty. :)

> In actuality, however, _id2ref is primarily used for things like weak
> references, so you can hold a virtual reference to an object without
> preventing it from being collected. We could provide an implementation
> of Ruby's weak references using Java's weak references that would allow
> us to escape _id2ref entirely for that use case.
>
> Are there other places _id2ref is used?

I think I've used _id2ref exactly twice. I can't recall the first
usage; I don't think it made it into production code. The most
recent use was to store some ruby object id's in a separate C++
process, which was able to fire an event back to ruby and provide
the object id for the object to receive the event.

(I suppose DRb might do something similar?)

Regards,

Bill

Nobuyoshi Nakada

10/28/2007 12:25:00 PM

Hi,

At Sun, 28 Oct 2007 16:16:25 +0900,
Charles Oliver Nutter wrote in [ruby-talk:276236]:
> Are there other places _id2ref is used?

drb.

--
Nobu Nakada

Robert Klemme

10/28/2007 1:07:00 PM

On 28.10.2007 08:06, Charles Oliver Nutter wrote:
> ara.t.howard wrote:
> > hmmm. ok i'm brainstorming here which you can ignore if you like as i
> > know less than nothing about jvms or implementing ruby but here goes:
> > what if you could invert the problem? what if objects knew about the
> > global ObjectSpaceThang and could be forced to register themselves on
> > demand somehow? without a reference i've no idea how, just throwing
> > that out there. or, another stupid idea, what if the objects themselves
> > were the tree/graph of weak references parent -> children. crawling it
> > would be, um, fun - but you could prune dead objects *only* when walking
> > the graph. this should be possible in ruby since you always have the
> > notion of a parent object - which is Object - so all objects should be
> > either reachable or leaks. now back to drinking my regularly scheduled
> > beer...
>
>
> Continuing this discussion here...
>
> Please, continue to brainstorm. I don't claim to have thought out every
> aspect of this problem or every possible solution. I'd *love* to
> discover I've missed an obvious fix.

IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
has to keep track of instances anyway and implementing this in Java via
WeakReferences seems to duplicate functionality that is already there.
Did you consider using "Java Virtual Machine Tools Interface"?

http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt....

You could either follow the same approach of the heapTracker presented
on that page and use a flag or require a lib that enables ObjectSpace
(because of the overhead of instrumentation).

Alternatively there may be another method that does not need
instrumentation and that can give you access to every (reachable) object
in the JVM.

> Your idea has come up in the past, and it would probably eliminate the
> cost of an ObjectSpace list. However that doesn't appear to be where we
> pay the highest cost.
>
> The two items that (we believe) cost the most for us on the JVM are:
>
> - Constructing an extra object for every Ruby object...namely, the
> WeakReference object to point to it. So we pay a
> memory/allocation/initialization cost.
> - WeakReference itself causes Java's GC to have to do additional checks,
> so it can notify the WeakReference that the object it points at has gone
> away. So that slows down the legendary HotSpot GC and we pay again.
>
> I believe the parent -> weakref -> children algorithm is used in some
> implementations of ObjectSpace-like behavior, so it's perfectly valid.
> But again, there's certain aspects of ObjectSpace that are just
> problematic...
>
> - threading or concurrency of any kind? No, you can't have
> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
> potentially excludes other advanced GC designs too).
> - determinism? Matz told me that "ObjectSpace doesn't have to be
> deterministic"...but when it starts getting wired into libraries like
> test/unit, it seems like people expect it to be. If we can say OS isn't
> deterministic, then *nobody* should be relying on its contents for core
> libraries, and we could reasonably claim that each_object will never
> return *anything*.

I'd reformulate the requirement here: ObjectSpace.each_object must yield
every object that was existent before the invocation and that is
strongly reachable. I believe for the typical use case (e.g. traversing
all class instances) this is enough while leaving enough flexibility for
the implementation (i.e. create a snapshot of some form, iterate through
some internal structure that may change due to new objects being created
during #each_object etc.).
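
One way to read the snapshot variant of this requirement, sketched in Ruby under the assumption that each_object is enabled on the runtime in use (`each_existing` is a hypothetical helper, not a proposed API):

```ruby
# Hypothetical helper: iterate over a snapshot taken before the block runs.
def each_existing(klass)
  # Materializing the enumerator holds strong references to every object
  # found, so none of them can be collected while the block is running.
  snapshot = ObjectSpace.each_object(klass).to_a
  snapshot.each { |obj| yield obj }
  snapshot.size
end
```

Objects created inside the block are not visited, since they did not exist when the snapshot was taken.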

Kind regards

robert

Daniel Berger

10/28/2007 1:14:00 PM

On Oct 28, 12:53 am, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:
<snip>

> So...I'm writing this to see what the general Ruby world thinks of us
> having ObjectSpace disabled by default, enableable via a command line
> option (or perhaps through a library? -robjectspace?).

ext\common\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
ext\dl\test\test.rb:187: ObjectSpace.define_finalizer(fp) {File.unlink("tmp.txt")}
ext\tk\lib\multi-tk.rb:493: ObjectSpace.each_object(TclTkIp){|obj|
ext\Win32API\lib\win32\registry.rb:569: ObjectSpace.define_finalizer self, @@final.call(@hkeyfinal)
lib\cgi\session.rb:299: ObjectSpace::define_finalizer(self, Session::callback(@dbprot))
lib\drb\drb.rb:337:# object's ObjectSpace id as its dRuby id. This means that the dRuby
lib\drb\drb.rb:361: # This, the default implementation, uses an object's local ObjectSpace
lib\drb\drb.rb:375: ObjectSpace._id2ref(ref)
lib\finalize.rb:59: ObjectSpace.call_finalizer(obj)
lib\finalize.rb:169: ObjectSpace.remove_finalizer(@proc)
lib\finalize.rb:173: ObjectSpace.add_finalizer(@proc)
lib\finalize.rb:180: # registering function to ObjectSpace#add_finalizer
lib\finalize.rb:192: ObjectSpace.add_finalizer(@proc)
lib\irb\completion.rb:152: ObjectSpace.each_object(Module){|m|
lib\irb\ext\save-history.rb:69: ObjectSpace.define_finalizer(obj, HistorySavingAbility.create_finalizer)
lib\shell\process-controller.rb:216: ObjectSpace.each_object(IO) do |io|
lib\singleton.rb:23:# ObjectSpace.each_object(OtherKlass){} # => 0.
lib\singleton.rb:190: "#{ObjectSpace.each_object(klass){}} #{klass} instance(s)"
lib\tempfile.rb:53: ObjectSpace.define_finalizer(self, @clean_proc)
lib\tempfile.rb:105: ObjectSpace.undefine_finalizer(self)
lib\tempfile.rb:118: ObjectSpace.undefine_finalizer(self)
lib\test\unit\autorunner.rb:17: ObjectSpace.each_object(Class) do |klass|
lib\test\unit\autorunner.rb:54: :objectspace => proc do |r|
lib\test\unit\autorunner.rb:55: require 'test/unit/collector/objectspace'
lib\test\unit\autorunner.rb:56: c = Collector::ObjectSpace.new
lib\test\unit\autorunner.rb:80: @collector = COLLECTORS[(standalone ? :dir : :objectspace)]
lib\test\unit\collector\dir.rb:13: def initialize(dir=::Dir, file=::File, object_space=::ObjectSpace, req=nil)
lib\test\unit\collector\objectspace.rb:10: class ObjectSpace
lib\test\unit\collector\objectspace.rb:13: NAME = 'collected from the ObjectSpace'
lib\test\unit\collector\objectspace.rb:15: def initialize(source=::ObjectSpace)
lib\test\unit.rb:252: # the ObjectSpace and wrap them up into a suite for you. It then runs
lib\weakref.rb:16:# ObjectSpace.garbage_collect
lib\weakref.rb:62: ObjectSpace._id2ref(@__id)
lib\weakref.rb:74: ObjectSpace.define_finalizer obj, @@final
lib\weakref.rb:75: ObjectSpace.define_finalizer self, @@final
lib\weakref.rb:98: ObjectSpace.garbage_collect
test\dbm\test_dbm.rb:45: ObjectSpace.each_object(DBM) do |obj|
test\gdbm\test_gdbm.rb:42: ObjectSpace.each_object(GDBM) do |obj|
test\ruby\test_objectspace.rb:3:class TestObjectSpace < Test::Unit::TestCase
test\ruby\test_objectspace.rb:10: o = ObjectSpace._id2ref(obj.object_id);
test\sdbm\test_sdbm.rb:15: ObjectSpace.each_object(SDBM) do |obj|
test\testunit\collector\test_dir.rb:62: class ObjectSpace
test\testunit\collector\test_dir.rb:81: @object_space = ObjectSpace.new
test\testunit\collector\test_objectspace.rb:6:require 'test/unit/collector/objectspace'
test\testunit\collector\test_objectspace.rb:11: class TC_ObjectSpace < TestCase
test\testunit\collector\test_objectspace.rb:41: @c = ObjectSpace.new(@object_space)
test\testunit\collector\test_objectspace.rb:44: def full_suite(name=ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:51: TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:83: expected = TestSuite.new(ObjectSpace::NAME)
test\testunit\collector\test_objectspace.rb:89: expected = TestSuite.new(ObjectSpace::NAME)
test\yaml\test_yaml.rb:1279: ObjectSpace.each_object(Class) do |klass|

So, in summary, if we exclude those libraries where only tests are
affected, this would affect:

win32-registry
tk
cgi
drb
finalize
irb
shell
singleton
tempfile
test-unit
weakref

Some comments on each of these as they relate to JRuby:

win32-registry: You have no hope of implementing this without JNA
anyway, unless there's some Java binding I don't know about. Besides,
I couldn't tell you why on Earth win32-registry would need a
finalizer.

tk: No one will care. They'll use SWT or Swing bindings. Besides, you
would need JNA.

cgi: This could be a problem. Then again, some people say this library
should be refactored or tossed.

drb: This could be a big deal.

finalize: Did anyone even know about this? Does anyone use it?

irb: You've got jirb.

shell: This could be a problem.

singleton: Ditto.

tempfile: Meh, I'm guessing Java has its own library for temp files. I
never liked the current implementation anyway (which is why I wrote
file-temp).

test-unit: Already mentioned.

weakref: You've stated that Java has its own implementation.

Regards,

Dan

ara.t.howard

10/28/2007 2:27:00 PM

On Oct 28, 2007, at 1:16 AM, Charles Oliver Nutter wrote:

>
> Are there other places _id2ref is used?

i use it quite often as a way to have meta-programming 'storage'
without polluting instances:

  # instance_method returns an UnboundMethod, which is what #bind requires
  foo = instance_method :foo

  module_eval <<-code
    def foo(*a, &b)
      ObjectSpace._id2ref(#{ foo.object_id }).bind(self).call(*a, &b)
    end
  code

which is fabricated - but you get the concept: string in eval maps to
live object at run time. when #define_method takes a block this
won't be used much i think though...
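
For comparison, a minimal sketch of the block-based alternative alluded to here, assuming a Ruby version where define_method blocks can take a block argument (1.9 and later). `Greeter` and `greet` are made-up names for illustration:

```ruby
class Greeter
  def greet(name)
    "hello, #{name}"
  end

  # Capture the original implementation in a local variable.
  original = instance_method(:greet)

  # The block closes over `original`, so no object id has to be
  # interpolated into a string and looked up again at call time.
  define_method(:greet) do |*args, &blk|
    original.bind(self).call(*args, &blk).upcase
  end
end

puts Greeter.new.greet("matz")
```

The closure keeps the captured method alive through an ordinary strong reference, so nothing depends on ObjectSpace.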

cheers.

a @ http://codeforp...
--
it is not enough to be compassionate. you must act.
h.h. the 14th dalai lama

Charles Oliver Nutter

10/28/2007 4:10:00 PM

Bill Kelly wrote:
> I think I've used _id2ref exactly twice. I can't recall the first
> usage; I don't think it made it into production code. The most
> recent use was to store some ruby object id's in a separate C++
> process, which was able to fire an event back to ruby and provide
> the object id for the object to receive the event.
>
> (I suppose DRb might do something similar?)

Yeah, sounds like that's mostly a "poor man's remote hash". I'd expect
that just creating a hash specifically for that purpose and passing a
key around would be a "better" way to do it.
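
A sketch of what such a purpose-built hash might look like (`HandleTable` and its methods are hypothetical names; the external process would only ever see the integer key):

```ruby
# Hypothetical handle table: maps opaque integer keys to live objects.
class HandleTable
  def initialize
    @objects = {}
    @next_key = 0
  end

  # Stores the object and returns the key to hand to the other process.
  def register(obj)
    @next_key += 1
    @objects[@next_key] = obj
    @next_key
  end

  # Looks up the object for a key; raises KeyError for unknown keys.
  def fetch(key)
    @objects.fetch(key)
  end

  # Drops the entry so the object can be garbage collected again.
  def release(key)
    @objects.delete(key)
  end
end
```

Unlike _id2ref, the table holds strong references, so entries stay valid until they are explicitly released.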

_id2ref is just another one of those features that gets rarely used, and
whose use cases can often be implemented in "better" ways.

- Charlie

Charles Oliver Nutter

10/28/2007 4:19:00 PM

Robert Klemme wrote:
> IMHO ObjectSpace should not be implemented in Java land. Why? The JVM
> has to keep track of instances anyway and implementing this in Java via
> WeakReferences seems to duplicate functionality that is already there.
> Did you consider using "Java Virtual Machine Tools Interface"?
>
> http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt....
>
> You could either follow the same approach of the heapTracker presented
> on that page and use a flag or require a lib that enables ObjectSpace
> (because of the overhead of instrumentation).

You just hit on exactly why we don't use JVMTI for ObjectSpace. It would
certainly work, but it would add a lot of overhead we'd never expect
people to accept in a real application. Plus, it would track far more
object instances than we actually want tracked. We'd love to include a
JVMTI-based ObjectSpace implementation, however...it just hasn't been a
high priority to implement since 99% of users never actually need
ObjectSpace.

> Alternatively there may be another method that does not need
> instrumentation and that can give you access to every (reachable) object
> in the JVM.

If there is...we haven't found it. The "linked weakref list" has been
the least overhead so far, and it's still a lot of overhead.

>> Your idea has come up in the past, and it would probably eliminate the
>> cost of an ObjectSpace list. However that doesn't appear to be where
>> we pay the highest cost.
>>
>> The two items that (we believe) cost the most for us on the JVM are:
>>
>> - Constructing an extra object for every Ruby object...namely, the
>> WeakReference object to point to it. So we pay a
>> memory/allocation/initialization cost.
>> - WeakReference itself causes Java's GC to have to do additional
>> checks, so it can notify the WeakReference that the object it points
>> at has gone away. So that slows down the legendary HotSpot GC and we
>> pay again.
>>
>> I believe the parent -> weakref -> children algorithm is used in some
>> implementations of ObjectSpace-like behavior, so it's perfectly valid.
>> But again, there's certain aspects of ObjectSpace that are just
>> problematic...
>>
>> - threading or concurrency of any kind? No, you can't have
>> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it
>> potentially excludes other advanced GC designs too).
>> - determinism? Matz told me that "ObjectSpace doesn't have to be
>> deterministic"...but when it starts getting wired into libraries like
>> test/unit, it seems like people expect it to be. If we can say OS
>> isn't deterministic, then *nobody* should be relying on its contents
>> for core libraries, and we could reasonably claim that each_object
>> will never return *anything*.
>
> I'd reformulate the requirement here: ObjectSpace.each_object must yield
> every object that was existent before the invocation and that is
> strongly reachable. I believe for the typical use case (e.g. traversing
> all class instances) this is enough while leaving enough flexibility for
> the implementation (i.e. create a snapshot of some form, iterate through
> some internal structure that may change due to new objects being created
> during #each_object etc.).

The problem here is "strongly reachable". During ObjectSpace processing,
the last strong reference to an object may go away and the garbage
collector may run. Should ObjectSpace prevent GC from running if it's
traversed and now references that object? If not, how should it be
handled if immediately before you return an object from each_object, it
gets garbage collected? There's no way to catch that, so each_object may
end up returning a reference to an object that's gone away, or
reconstituting an object whose finalization has already fired. Bad
things happen.

ObjectSpace is just not compatible with any GC that requires the ability
to move objects around in memory, run in parallel, and so on. It can
*never* be deterministic unless it can "stop the world", so it should
not be used for algorithms that require any level of determinism, such
as the test search in test/unit.

- Charlie

comp.lang.ruby
