[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.ruby

Re: TupleSpace performance (TupleBag really

Ryan Davis

2/9/2007 12:51:00 AM


On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:

> We've been struggling with this problem for months. We use
> TupleSpace to
> implement a distributed processing framework and periodically if
> the number
> of objects
> in the TupleBag gets large (not ridiculous but 50,000 or so) the
> TupleSpace
> begins to take 100pct cpu and is effectively neutered and must be
> restarted.
> We've been trying to come up
> with a better implementation of TupleBag but have not had much luck
> so far.
> Has anyone else done this?

What's happening is that you've populated a hash enough that you've
saturated the bins. You're now hitting a lot of hash collisions. You
can either improve the hash method on the tuples (possibly... really
depends on what you're storing), or switch the underlying data
structure (maybe a balanced tree will suit you better at large tuple
populations).

Or, if you're using patterns ala Gelernter, you might be able to pre-
partition your spaces into multiple bags based on activities or
specialists (or job or task type). Sounds like you've probably
already tried that tho.


3 Answers

Alex Young

2/9/2007 9:30:00 AM

0

Ryan Davis wrote:
>
> On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:
>
>> We've been struggling with this problem for months. We use TupleSpace to
>> implement a distributed processing framework and periodically if the
>> number
>> of objects
>> in the TupleBag gets large (not ridiculous but 50,000 or so) the
>> TupleSpace
>> begins to take 100pct cpu and is effectively neutered and must be
>> restarted.
>> We've been trying to come up
>> with a better implementation of TupleBag but have not had much luck so
>> far.
>> Has anyone else done this?
>
> What's happening is that you've populated a hash enough that you've
> saturated the bins. You're now hitting a lot of hash collisions.
Do Ruby's hashes not extend themselves once they hit a certain
saturation? Is there a hard limit at play here? Or is it simply a
matter of the hash function not distributing keys well?

--
Alex

Mauricio Fernández

2/9/2007 10:50:00 AM

0

On Fri, Feb 09, 2007 at 09:50:34AM +0900, Ryan Davis wrote:
> On Feb 7, 2007, at 9:21 AM, Mark Alexander Friedgan wrote:
>
> >We've been struggling with this problem for months. We use TupleSpace to
> >implement a distributed processing framework and periodically if the
> >number of objects in the TupleBag gets large (not ridiculous but 50,000 or
> >so) the TupleSpace begins to take 100pct cpu and is effectively neutered
> >and must be restarted. We've been trying to come up with a better
> >implementation of TupleBag but have not had much luck so far. Has anyone
> >else done this?
>
> What's happening is that you've populated a hash enough that you've
> saturated the bins. You're now hitting a lot of hash collisions. You
> can either improve the hash method on the tuples (possibly... really
> depends on what you're storing), or switch the underlying data
> structure (maybe a balanced tree will suit you better at large tuple
> populations).

Are you sure? In Ruby, the number of bins is increased (exponentially) when
there are more than 5 entries per bin...

Actually, in a TupleBag tuples are indexed by their size and stored in an
array. Finding a tuple matching a given template in that array is O(n)

##
# Finds a live tuple that matches +template+.

def find(template)
@hash.fetch(template.size, []).find do |tuple|
tuple.alive? && template.match(tuple)
end
end

If you want to make TupleSpace perform better, you'll have to make that
operation (finding a live tuple matching an arbitrary template) faster.

[Of course, the TupleSpace behaves much better when it gets the read request
_before_ the tuple, since it just has to scan the list of templates waiting
for a tuple.]

--
Mauricio Fernandez - http://eige... - singular Ruby
** Latest postings **
What's new in Ruby 1.9, Feb. 07 update
http://eige.../hiki.rb?Changes-in-Ruby-1.9-update-6
Firebrigade: automated, sandboxed testing of RubyGems packages by other devels
http://eige.../hiki.rb?firebrigade-launched

Mauricio Fernández

2/9/2007 10:58:00 AM

0

On Fri, Feb 09, 2007 at 06:29:48PM +0900, Alex Young wrote:
> Ryan Davis wrote:
> >What's happening is that you've populated a hash enough that you've
> >saturated the bins. You're now hitting a lot of hash collisions.
> Do Ruby's hashes not extend themselves once they hit a certain
> saturation? Is there a hard limit at play here?

The number of bins is doubled when there are more than 5 entries per bin
(avg.).

> Or is it simply a matter of the hash function not distributing keys well?

It's a matter of the tuples being indexed by their sizes, and performing O(n)
matching over all the tuples of a given size to find one matching a given
template (see my other message).

--
Mauricio Fernandez - http://eige... - singular Ruby
** Latest postings **
What's new in Ruby 1.9, Feb. 07 update
http://eige.../hiki.rb?Changes-in-Ruby-1.9-update-6
Firebrigade: automated, sandboxed testing of RubyGems packages by other devels
http://eige.../hiki.rb?firebrigade-launched