Asp Forum - "Crystallizing" Objects

enduro

5/17/2007 9:55:00 AM

Hello,

in this post, I'd like to make a proposal to contribute
to Ruby's fitness for the challenges of parallelization.

The problem with sharing objects between threads
is concurrent write access.
If the objects were frozen,
everything would be OK ... but also boring. :)

Now, I think it should be possible
to freeze the "already existing" data of an object
and at the same time leave it open for "further growth".
Especially, I have Hashes and Arrays in mind:
New key/value pairs could be added to Hashes,
objects could be appended to Arrays.

So, I think it would be beneficial to introduce a third state of objects
between frozen and normal, like "growing" or "crystallizing" or
"monotonous" or "incrementally_freezing" or "partially_frozen" ...

In this state, all methods that would
modify the already "crystallized" data of the object
should throw an exception.
E.g. for strings, only appending would be allowed.

Perhaps it would help implementing this idea,
if the << method would be made special
and be the only accepted modifying method
in the "growing" state.

Also, it would probably be good to have a means of determining
if a method call is safe to use, in the sense
that its result will always be the same object,
regardless of what other threads do to the receiver.
(Let's say, it is a "permanent" method call)

Roughly:
- when the receiving object is frozen, all method calls are "permanent"

- when the receiver is in the normal (unrestricted) state,
method calls are "permanent", iff the receiver is an immediate value.

- when the object is in the "growing" state, then it's a bit tricky:

Just having an array "permanent_methods" would not always work,
because there are methods that are both,
"permanent" or not, depending on the arguments.

For example, we would definitely want to use Array#[],
but this is only "permanent" for non-negative indices.

Perhaps, there could be some annotation to those methods,
(raising an exception if the call would not guarantee a permanent result),
and this would only be switched on if the object is in the
"crystallizing" state
(and the general check for "permanentness" were switched on, say
globally).

OK, let me stop here, for now.
Has a similar idea been discussed before?

Regards,
Sven

21 Answers

enduro

5/17/2007 12:02:00 PM

Sven Suska schrieb:

> to freeze the "already existing" data of an object
> and at the same time leave it open for "further growth".

> So, I think it would be beneficial to introduce a third state of objects
> between frozen and normal, like "growing" or "crystallizing" or
> "monotonous" or "incrementally_freezing" or "partially_frozen" ...

I'd better give a use case, to make clear what I mean:
Assume, we have a Hash called "users".

If it were just a normal hash, we would have to lock the object manually:

if users.has_key(new_nickname)
raise "Nickname already exists"
end
# hash could be modified here, if not locked!!!
# if another thread would have stored a user with the same nickname
# just now, then this data would be overwritten in the next statement.
users[new_nickname] = user_data

If "users" were in this "partially_frozen" state, then we could write:

begin
users[new_nickname] = user_data
rescue CantChangeExistingDataError
raise "Nickname already exists"
end

So, the prime use case is the assignment of hash keys.
Perhaps I was too quick generalizing this approach to all objects.

> Also, it would probably be good to have a means of determining
> if a method call is safe to use, in the sense
> that its result will always be the same object,
> regardless of what other threads do to the receiver.
> (Let's say, it is a "permanent" method call)

"reliable" would probably be a better name for that.

And I think that this kind of check is not nessesary for my proposal to
work.

Regards,
Sven

Robert Dober

5/17/2007 12:24:00 PM

On 5/17/07, Sven Suska <sven715rt@suska.org> wrote:
> Hello,
>
Interesting idea Sven, I lose you on some points though

> in this post, I'd like to make a proposal to contribute
> to Ruby's fitness for the challenges of parallelization.
>
> The problem with sharing objects between threads
> is concurrent write access.
> If the objects were frozen,
> everything would be OK ... but also boring. :)
>
> Now, I think it should be possible
> to freeze the "already existing" data of an object
> and at the same time leave it open for "further growth".
> Especially, I have Hashes and Arrays in mind:
> New key/value pairs could be added to Hashes,
> objects could be appended to Arrays.
>
> So, I think it would be beneficial to introduce a third state of objects
> between frozen and normal, like "growing" or "crystallizing" or
> "monotonous" or "incrementally_freezing" or "partially_frozen" ...
>
> In this state, all methods that would
> modify the already "crystallized" data of the object
> should throw an exception.
> E.g. for strings, only appending would be allowed.
>
> Perhaps it would help implementing this idea,
> if the << method would be made special
> and be the only accepted modifying method
> in the "growing" state.

Hmm maybe that has something to do with your "permanent" methods, of
which I fail to grasp the idea right now.
Assuming that << is the only writing method to a semi-frozen object we
would need to freeze the methods on the object, like pulling them into
the object and disallow method lookup. Maybe it is not the best idea
to control the state at that global level?
>
> Also, it would probably be good to have a means of determining
> if a method call is safe to use, in the sense
> that its result will always be the same object,
You mean that the receiver will not be modified, right?
> regardless of what other threads do to the receiver.
> (Let's say, it is a "permanent" method call)
Non modifiying?
>
> Roughly:
> - when the receiving object is frozen, all method calls are "permanent"
Yes I guess I see what you mean with "permanent"
>
> - when the receiver is in the normal (unrestricted) state,
> method calls are "permanent", iff the receiver is an immediate value.
Nope I am losing you, there are still non modifying methods as e.g.
to_s so I guess I am wrong in my understanding.
>
> - when the object is in the "growing" state, then it's a bit tricky:
>
> Just having an array "permanent_methods" would not always work,
> because there are methods that are both,
> "permanent" or not, depending on the arguments.
Would you mind to elaborate?
>
> For example, we would definitely want to use Array#[],
> but this is only "permanent" for non-negative indices.
You mean because -1 would be different because of growing might occur
in a different thread. In that case permanent would mean insensible to
growth?
>
> Perhaps, there could be some annotation to those methods,
> (raising an exception if the call would not guarantee a permanent result),
I see, makes sense.
> and this would only be switched on if the object is in the
> "crystallizing" state
> (and the general check for "permanentness" were switched on, say
> globally).
>
> OK, let me stop here, for now.
> Has a similar idea been discussed before?
Not to my memories, but I hope this will become a rich one:)
>
>
> Regards,
> Sven
>
Cheers
Robert
>

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

enduro

5/17/2007 1:38:00 PM

Hi Robert,

Robert Dober schrieb:

>> Perhaps it would help implementing this idea,
>> if the << method would be made special
>> and be the only accepted modifying method
>> in the "growing" state.
>
> Hmm maybe that has something to do with your "permanent" methods, of
> which I fail to grasp the idea right now.

No, nothing to do with that.
The idea was meant as a quick solution:
If an object is in "crystallizing" or "semi_frozen" state,
then react to every method call, as if it were frozen, except to
the << method.

I had meant this just as an easy default strategy for new classes,
I still would like to be able to use the normal methods for built-in
objects.
(In particular, I would want to write
hash[key] = value
instead of a newly introduced
hash << [key,value]
)

> Assuming that << is the only writing method to a semi-frozen object we
> would need to freeze the methods on the object, like pulling them into
> the object and disallow method lookup.

Yes, for example. I think, however, that it would be better to raise
a special exception if destructive methods are called,
but I don't have the picture clear enough to be able to tell.

> Maybe it is not the best idea
> to control the state at that global level?

Global level? What are you referring to?

>> Also, it would probably be good to have a means of determining
>> if a method call is safe to use, in the sense
>> that its result will always be the same object,
>
> You mean that the receiver will not be modified, right?

Well, not quite -- that's why I was struggeling for a new word for it.
The receiver can very well be modified, but only in a non-destructive way:
For example
hash[key]
will always return the same object, no matter how many other
key/value-pairs
will be added to hash.
So, I would say Hash#[] is always "permanent" (or "reliable" in my
second post).
On the other hand, Hash#sort and Hash#size are not "permanent" (or
"reliable").

>> - when the receiver is in the normal (unrestricted) state,
>> method calls are "permanent", iff the receiver is an immediate value.
>
> Nope I am losing you, there are still non modifying methods as e.g.
> to_s so I guess I am wrong in my understanding.

Well, unless it is an immediate value, there is Object#replace,
and thus the object can be changed completely be another thread.
And thus, to_s will most certainly not be "reliable" or "permanent".
But we _can_ rely on Symbol#to_s and Fixnum#to_s, because immediate values
cannot be changed. (It is as if they are always frozen.)

>> For example, we would definitely want to use Array#[],
>> but this is only "permanent" for non-negative indices.
>
> You mean because -1 would be different because of growing might occur
> in a different thread. In that case permanent would mean insensible to
> growth?

Yes, I think you were able to put it better than I.
"permanent" / "reliable" = "insensible to anything that may happen,
including growth"

And, so far we have not said anything about the resulting object itself.
One option would be to demand that the result must also (at least) be
"semi_frozen".
For example, that a "semi_frozen" hash must only contain frozen and
semi-frozen values.
But then, this restriction might be too strong, sometimes.
I think this is a minor issue at this point.

So long

Sven

Robert Dober

5/17/2007 5:47:00 PM

On 5/17/07, Sven Suska <sven715rt@suska.org> wrote:
> Hi Robert,
>
> Robert Dober schrieb:
>
> >> Perhaps it would help implementing this idea,
> >> if the << method would be made special
> >> and be the only accepted modifying method
> >> in the "growing" state.
> >
> > Hmm maybe that has something to do with your "permanent" methods, of
> > which I fail to grasp the idea right now.
>
> No, nothing to do with that.
> The idea was meant as a quick solution:
> If an object is in "crystallizing" or "semi_frozen" state,
> then react to every method call, as if it were frozen, except to
> the << method.
>
> I had meant this just as an easy default strategy for new classes,
> I still would like to be able to use the normal methods for built-in
> objects.
> (In particular, I would want to write
> hash[key] = value
> instead of a newly introduced
> hash << [key,value]
> )
>
>
> > Assuming that << is the only writing method to a semi-frozen object we
> > would need to freeze the methods on the object, like pulling them into
> > the object and disallow method lookup.
>
> Yes, for example. I think, however, that it would be better to raise
> a special exception if destructive methods are called,
> but I don't have the picture clear enough to be able to tell.
>
> > Maybe it is not the best idea
> > to control the state at that global level?
>
> Global level? What are you referring to?
Exactly what you described above, raise an Exception as soon as an
illegal operation on the object is performed; not depending on the
method.
Global was a bad name.
>
>
>
> >> Also, it would probably be good to have a means of determining
> >> if a method call is safe to use, in the sense
> >> that its result will always be the same object,
> >
> > You mean that the receiver will not be modified, right?
>
> Well, not quite -- that's why I was struggeling for a new word for it.
> The receiver can very well be modified, but only in a non-destructive way:
Thats what I should have said, the concept is new and I find it
difficult to find the correct wordings.
> For example
> hash[key]
> will always return the same object, no matter how many other
> key/value-pairs
> will be added to hash.
Agreed.
> So, I would say Hash#[] is always "permanent" (or "reliable" in my
> second post).
Ok, I see clearer now maybe constant might be a better word, but that
might be confusing for C++ folks.
> On the other hand, Hash#sort and Hash#size are not "permanent" (or
> "reliable").
Clear now.
>
>
> >> - when the receiver is in the normal (unrestricted) state,
> >> method calls are "permanent", iff the receiver is an immediate value.
> >
> > Nope I am losing you, there are still non modifying methods as e.g.
> > to_s so I guess I am wrong in my understanding.
>
> Well, unless it is an immediate value, there is Object#replace,
> and thus the object can be changed completely be another thread.
> And thus, to_s will most certainly not be "reliable" or "permanent".
> But we _can_ rely on Symbol#to_s and Fixnum#to_s, because immediate values
> cannot be changed. (It is as if they are always frozen.)
Aha sorry I think talking about the return value is not relevant or do
I still misunderstand?
>
>
> >> For example, we would definitely want to use Array#[],
> >> but this is only "permanent" for non-negative indices.
> >
> > You mean because -1 would be different because of growing might occur
> > in a different thread. In that case permanent would mean insensible to
> > growth?
>
> Yes, I think you were able to put it better than I.
> "permanent" / "reliable" = "insensible to anything that may happen,
> including growth"
Great, I really see what is permanent now.
>
> And, so far we have not said anything about the resulting object itself.
> One option would be to demand that the result must also (at least) be
> "semi_frozen".
Hmmm gotta think I see no reason so far for that!
> For example, that a "semi_frozen" hash must only contain frozen and
> semi-frozen values.
Sure but that is containment not return values?
> But then, this restriction might be too strong, sometimes.
> I think this is a minor issue at this point.
All this is open to investigation, you just introduced a concept...
>
>
> So long
>
> Sven
>
>
Cheers
Robert

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

Ryan Davis

5/17/2007 6:12:00 PM

On May 17, 2007, at 02:55 , Sven Suska wrote:

> Hello,
>
> in this post, I'd like to make a proposal to contribute
> to Ruby's fitness for the challenges of parallelization.
>
> The problem with sharing objects between threads
> is concurrent write access.
> If the objects were frozen,
> everything would be OK ... but also boring. :)
>
> Now, I think it should be possible
> to freeze the "already existing" data of an object
> and at the same time leave it open for "further growth".
> Especially, I have Hashes and Arrays in mind:
> New key/value pairs could be added to Hashes,
> objects could be appended to Arrays.

[...]

To get some clarification and help me understand better, how is this
any different from purely functional data structures?

http://en.wikipedia.org/wiki/Purely_functional_data...

Yohanes Santoso

5/17/2007 7:01:00 PM

Sven Suska <sven715rt@suska.org> writes:

> Sven Suska schrieb:
>
> If it were just a normal hash, we would have to lock the object manually:
>

Start locking from this point, before you do a check.

> if users.has_key(new_nickname)
> raise "Nickname already exists"
> end
> # hash could be modified here, if not locked!!!
> # if another thread would have stored a user with the same nickname
> # just now, then this data would be overwritten in the next
> statement.

If you start locking at this point, then in between the check and the
insertion another check and insertion could have occurred and the
following insertion effectively undoes that insertion.

> users[new_nickname] = user_data

> If "users" were in this "partially_frozen" state, then we could write:
>
> begin
> users[new_nickname] = user_data
> rescue CantChangeExistingDataError
> raise "Nickname already exists"
> end

For Hash in official ruby implementation, since #= is implemented in C
without any yielding, #= is effectively an atomic operation. But since
you want to throw an exception if there is already existing entry, you'd
still need a lock to do the check-and-insert atomically.

In another thread, Ryan Davis brought up purely functional data
structure. However, that won't lift the necessity of locking in your
example case because the users list would have to be a global, shared
resource and you are trying to do something akin to the P() operation
of sempahore. Unless Hash already have an atomic check-before-set
operation, you can't get around it the need of having a lock. [And no,
Hash has no such thing yet. Set#add? is similar to what you're looking
for but it's not atomic].

YS.

Robert Dober

5/17/2007 7:31:00 PM

On 5/17/07, Yohanes Santoso <ysantoso-rubytalk@dessyku.is-a-geek.org> wrote:
> Sven Suska <sven715rt@suska.org> writes:
>
> > Sven Suska schrieb:
> >
> > If it were just a normal hash, we would have to lock the object manually:
> >
>
> Start locking from this point, before you do a check.
>
> > if users.has_key(new_nickname)
> > raise "Nickname already exists"
> > end
> > # hash could be modified here, if not locked!!!
> > # if another thread would have stored a user with the same nickname
> > # just now, then this data would be overwritten in the next
> > statement.
>
> If you start locking at this point, then in between the check and the
> insertion another check and insertion could have occurred and the
> following insertion effectively undoes that insertion.
>
> > users[new_nickname] = user_data
>
>
> > If "users" were in this "partially_frozen" state, then we could write:
> >
> > begin
> > users[new_nickname] = user_data
> > rescue CantChangeExistingDataError
> > raise "Nickname already exists"
> > end
>
> For Hash in official ruby implementation, since #= is implemented in C
> without any yielding, #= is effectively an atomic operation. But since
> you want to throw an exception if there is already existing entry, you'd
> still need a lock to do the check-and-insert atomically.

I see this point now, I totally forgot what OP wanted this for, easy
synchronization.
I doubt that this is possible, R/W locks would do the same and of course <<
would need a Write lock even if it "only" creates a new key.

FWIAC I thought the concept of demi-frozen object state avoiding any
information loss was interesting on a theoretical base, not for
synchronization.
>
<snip>
>
> YS.
>
>
R.

--

Robert Klemme

5/18/2007 8:36:00 AM

On 17.05.2007 11:55, Sven Suska wrote:
> in this post, I'd like to make a proposal to contribute
> to Ruby's fitness for the challenges of parallelization.
>
> The problem with sharing objects between threads
> is concurrent write access.
> If the objects were frozen,
> everything would be OK ... but also boring. :)
>
> Now, I think it should be possible
> to freeze the "already existing" data of an object
> and at the same time leave it open for "further growth".
> Especially, I have Hashes and Arrays in mind:
> New key/value pairs could be added to Hashes,
> objects could be appended to Arrays.

Do you envision this added state to be shared by threads or thread local?

> So, I think it would be beneficial to introduce a third state of objects
> between frozen and normal, like "growing" or "crystallizing" or
> "monotonous" or "incrementally_freezing" or "partially_frozen" ...
>
> In this state, all methods that would
> modify the already "crystallized" data of the object
> should throw an exception.
> E.g. for strings, only appending would be allowed.

I do not believe that it would be beneficial and here's why: on an
abstract level the state of the object still changes, so you need proper
synchronization mechanisms in place. Now, if you can *identify* the
subset of the state that does not change, you can put it into another
instance (probably of another class). Now you have a clean separation
of constant state and mutable state - even better: you do not have *any*
locking issues at all if you use the mutable instance only in one thread
and share the constant only. Implementation complexity (of the
interpreter) and efficiency wise this is much better than having a
single instance with split state where read write synchronization has to
take place. You will have to define the constant and mutable part for
any class individually anyway so you can as well build concrete
collaborating classes in Ruby that will handle this.

Concrete: if you want Arrays with common immutable prefixes, just create
a class SharedImmutablePrefixArray (possibly using Delegator or such)
that refers a shared constant Array and a local mutable Array and
implements methods like #[], #[]=, #each, #size etc. accordingly. You
could even create this class in a way that the mutable state is stored
via Thread#[] (i.e. thread local storage) so a single instance can
actually be shared by multiple threads.

Other advantages of this approach: it is *explicit*, i.e. in code you
create an instance of SharedImmutablePrefixArray per thread and it
becomes apparent what's going on.

Also, with a single instance that would combine mutable and immutable
state for multiple threads internally garbage collection will be much
harder, since you will not have an object reference for the per thread
state. This again will make the implementation of GC much more complex
and likely less efficient.

> Has a similar idea been discussed before?

I am not aware of something like this. But it's good to discuss
suggestions like this from time to time. :-)

Kind regards

robert

Robert Dober

5/18/2007 8:56:00 AM

On 5/18/07, Robert Klemme <shortcutter@googlemail.com> wrote:
<snip>
> > Has a similar idea been discussed before?
>
> I am not aware of something like this. But it's good to discuss
> suggestions like this from time to time. :-)

I feel that the idea is worth discussing out of the synchronization
issue. I do not think it is fit for that purpose.
However the idea of an object state that allows to add information but
forbids the loss/overwriting of information is just great.

Maybe an example is better

class History < Array # for shortness of code, I'd probably use deleagtion
def initialize
no_loss # --> that is the semi frozen state
end
end
h= History.new
h << "Just lost my SVN Server"
h << "My dog eat my flash disk, the small one (dog not disk)"
puts h ==> "2007-05-16 14:13:12 Just lost my SVN Server\n...."
h.pop ==> TypeError: cannot remove/modify information in a no_loss object <....>

I just feel that would be nice to have.

Cheers
Robert

>
> Kind regards
>
> robert
>
>

--
You see things; and you say Why?
But I dream things that never were; and I say Why not?
-- George Bernard Shaw

Ryan Davis

5/18/2007 3:22:00 PM

On May 18, 2007, at 01:55 , Robert Dober wrote:

> However the idea of an object state that allows to add information but
> forbids the loss/overwriting of information is just great.

and not the least bit new:

http://www.amazon.com/Purely-Functional-Structures-Chris-O...
0521663504

comp.lang.ruby

"Crystallizing" Objects

enduro

enduro

Robert Dober

enduro

Robert Dober

Ryan Davis

Yohanes Santoso

Robert Dober

Robert Klemme

Robert Dober

Ryan Davis

x Login to ForumsZone