comp.lang.ruby

String doesn't auto dup on modification

Nit Khair

1/21/2009 7:16:00 PM

I'm writing my first largeish app. One issue that gets me frequently is
this:

I define a string in one class. Some other class references it and
modifies it. I (somehow) expected that when another referrer modifies
the reference, Ruby would automatically dup() the string.

Anyway, through trial and error, I started dup()'ing strings myself. I
am aware of freeze().

But I would like to know how others handle this generally in large apps.

- Do you keep freezing Strings you make in your classes to avoid
accidental change?

- Do you habitually dup() your strings?

Is there some clean way of handling this that I am missing?
--
Posted via http://www.ruby-....

34 Answers

Nit Khair

1/21/2009 7:19:00 PM

> - Do you habitually dup() your string ?
>
> Is there some clean way of handling this that I am missing.
To continue:

In some critical places in my app I had done:

def set_value(str)
  @buffer = str.dup
end

def get_value
  @buffer.dup
end

Is this the norm? Do you do this generally, to avoid accidental changes?

Stefan Lang

1/21/2009 9:10:00 PM

2009/1/21 RK Sentinel <sentinel.2001@gmx.com>:
> I'm writing my first largeish app. One issue that gets me frequently is
> this:
>
> I define a string in one class. Some other class references it, and
> modifies it. I (somehow) expected that when another referer modifies the
> reference, ruby would automatically dup() the string.
>
> Anyway, through trial and error, I start dup()'ing strings myself. I am
> aware of freeze().
>
> But would like to know how others handle this generally in large apps.
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?
>
> Is there some clean way of handling this that I am missing.

This is a well known "problem" with all languages that
have mutable strings. The solution is simple:

* Use destructive string methods only after profiling has shown
that string manipulation is the bottleneck.

* Don't mutate a string after passing it across encapsulation
boundaries.

Freezing certain strings can be beneficial in the same way
assertions are; habitually dup'ing strings is bad practice, IMO.
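
A minimal sketch of the difference between destructive and non-destructive string methods, using made-up values (the variable names are hypothetical, not from any poster's app):

```ruby
# Hypothetical values, for illustration only.
original = "  hello  ".dup          # .dup guarantees an unfrozen String

# Non-destructive: strip and upcase each return a new String.
cleaned = original.strip.upcase
untouched = (original == "  hello  ")

# Destructive: strip! changes the receiver in place, so every variable
# that refers to the same object observes the change.
shared = original
shared.strip!

puts cleaned      # HELLO
puts untouched    # true
puts original     # hello
```

Sticking to the non-destructive forms means a string handed across a class boundary cannot change underneath its other holders.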

Stefan

Brian Candler

1/21/2009 9:58:00 PM

RK Sentinel wrote:
> Anyway, through trial and error, I start dup()'ing strings myself. I am
> aware of freeze().
>
> But would like to know how others handle this generally in large apps.
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?

Generally, no.

Of course there is no contract to enforce this, but in many cases it
would be considered bad manners to modify an object which is passed in
as an argument.

If you only read the object, then it doesn't matter. If you need a
modified version, create a new object. Usually this doesn't require
'dup'.

def foo(a_string)
  a_string << "/foo"             # bad
  a_string = "#{a_string}/foo"   # good
  a_string = a_string + "/foo"   # good
end

DEFAULT_OPT = {:foo => "bar"}

def bar(opt = {})
  opt[:foo] ||= "bar"            # bad
  opt = DEFAULT_OPT.merge(opt)   # good
end

If you are paranoid, you can freeze DEFAULT_OPT and all its keys and
values.
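
For what it's worth, here is one way the "paranoid" variant might look. It is only a sketch: it freezes one level of keys and values, which is enough for a flat options hash but would not reach into nested structures.

```ruby
# Flat options hash; "bar".dup makes sure we start from an unfrozen String.
DEFAULT_OPT = { :foo => "bar".dup }

# Freeze each key and value, then the hash itself.
DEFAULT_OPT.each { |key, value| key.freeze; value.freeze }
DEFAULT_OPT.freeze

def bar(opt = {})
  DEFAULT_OPT.merge(opt)   # merge returns a new, unfrozen Hash
end

merged = bar(:baz => 1)

begin
  DEFAULT_OPT[:foo] << "!"   # attempt to mutate a frozen value
rescue => e                  # FrozenError on recent Rubies, RuntimeError on older ones
  puts "refused: #{e.class}"
end
```

Note that the merged hash still shares the frozen "bar" string with DEFAULT_OPT; merge only copies the hash, not its values.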

Sometimes you will see frozen strings as an optimisation to reduce the
amount of garbage objects created:

...
foo["bar"] # creates a new "bar" string every time round

BAR = "bar".freeze
...
foo[BAR] # always uses the same object

This probably won't make any noticeable difference except in the
innermost of loops.

Tom Cloyd

1/21/2009 10:09:00 PM

Stefan Lang wrote:
> 2009/1/21 RK Sentinel <sentinel.2001@gmx.com>:
>
>> I'm writing my first largeish app. One issue that gets me frequently is
>> this:
>>
>> I define a string in one class. Some other class references it, and
>> modifies it. I (somehow) expected that when another referer modifies the
>> reference, ruby would automatically dup() the string.
>>
>> Anyway, through trial and error, I start dup()'ing strings myself. I am
>> aware of freeze().
>>
>> But would like to know how others handle this generally in large apps.
>>
>> - Do you keep freezing Strings you make in your classes to avoid
>> accidental change
>>
>> - Do you habitually dup() your string ?
>>
>> Is there some clean way of handling this that I am missing.
>>
>
> This is a well known "problem" with all languages that
> have mutable strings. The solution is simple:
>
> * Use destructive string methods only after profiling has shown
> that string manipulation is the bottleneck.
>
> * Don't mutate a string after passing it across encapsulation
> boundaries.
>
> Freezing certain strings can be beneficial in the same way
> assertions are, habitually duping strings is a bad practice, IMO.
>
> Stefan
>
>
>
If this is an utterly dumb question, just ignore it. However, I AM
perplexed by this response. Here's why:

I thought it was OK for an object to receive input, and output a
modified version of same. If they don't get to do that, their use seems
rather limited. In my current app, I create a log object, and various
classes write to it. I don't create new objects every time I want to add
a log entry. Why would I do that? Makes no sense to me. I might want to
do exactly the same thing to a string. You seem to be saying this is bad
form. I can see that there are cases where you want the string NOT to be
modified, but you seem to be saying that to modify the original string at
all is bad.

It makes perfect sense to me to pass an object (string, in this case)
across an encapsulation boundary specifically to modify it.

What am I missing here?

Thanks, if you can help me out!

Tom


Nit Khair

1/22/2009 5:06:00 AM

Tom Cloyd wrote:
> Stefan Lang wrote:
>>> aware of freeze().
>>
>> assertions are, habitually duping strings is a bad practice, IMO.
>>
>> Stefan
>>
>>
>>
> If this is an utterly dumb question, just ignore it. However, I AM
> perplexed by this response. Here's why:
I agree with you.

I have objects that get a string, process/clean it for printing/display.
(That is the whole purpose of centralizing data and behaviour into
classes.)

Remembering that I must not modify it is a big mental overhead, and
results in strange bugs that I spend a lot of time tracking down, until
I find out: oh no, the string got mutated over there. Now I must start
dup()'ing it. Okay, but where exactly should I dup it?

To the previous poster: yes, one does not have to use dup. One can
create a new string by changing the method from, say, gsub! to just
gsub and taking the return value. I include such situations when I say
dup().

Robert Klemme

1/22/2009 7:29:00 AM

On 21.01.2009 22:57, Brian Candler wrote:
> RK Sentinel wrote:
>> Anyway, through trial and error, I start dup()'ing strings myself. I am
>> aware of freeze().
>>
>> But would like to know how others handle this generally in large apps.
>>
>> - Do you keep freezing Strings you make in your classes to avoid
>> accidental change
>>
>> - Do you habitually dup() your string ?
>
> Generally, no.

Same here.

> Of course there is no contract to enforce this, but in many cases it
> would be considered bad manners to modify an object which is passed in
> as an argument.

Depends: for example, if you have a method that is supposed to dump
something to a stream (IO and friends) which only uses << you can as
well use String there.

> If you only read the object, then it doesn't matter.

That may be true for methods, but if I need to store a String as an
instance variable then I tend to dup it if the application is larger.
You can even automate conditional dup'ing by doing something like this:

class Object
  def dupf
    frozen? ? self : dup
  end
end

and then

class MyClass
  def initialize(name)
    @name = name.dupf
  end
end
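
A short usage sketch of the dupf idea, repeated here so it runs on its own. The attr_reader and the sample names are additions for illustration only:

```ruby
class Object
  # Return self if already frozen (safe to share), otherwise a copy.
  def dupf
    frozen? ? self : dup
  end
end

class MyClass
  attr_reader :name   # added here only so the result can be inspected

  def initialize(name)
    @name = name.dupf
  end
end

mutable = "alice".dup
first = MyClass.new(mutable)
mutable << "!!!"                 # caller mutates its own string afterwards
puts first.name                  # alice

frozen = "bob".dup.freeze
second = MyClass.new(frozen)
puts second.name.equal?(frozen)  # true: no copy was needed
```

The copy is only made when the argument could still change, so frozen strings are shared for free.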

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end

Brian Candler

1/22/2009 8:16:00 AM

Tom Cloyd wrote:
> I thought it was OK for an object to receive input, and output a
> modified version of same.

Do you mean "return the same object reference, after the object has been
modified", or "return a new object, which is a modified copy of the
original"?

> If they don't get to do that, their use seems
> rather limited. In my current app, I create a log object, and various
> classes write to it. I don't create new objects every time I want to add
> a log entry. Why would I do that? Makes no sense to me.

I'd consider a logger object as a sort of stream. You're just telling
the logger to "do" something every time you send it a message; you're
not really telling it to change into a different sort of logger. (Of
course, if the logger is logging to an underlying string buffer, then
changing that buffer is a desired side effect of logging, but the logger
itself is still the same)

> I might want to
> do exactly the same thing to a string. You seem to be saying this is bad
> form. I can see that there are cases where you want the string NOT to be
> modified, but you seem to be saying that to modify the original string at
> all is bad.

No, I'm not saying this. Sometimes it's useful to modify the string
passed in:

def cleanup!(str)
  str.strip!
  str.replace("default") if str.empty?
end

However I'd say this is not the usual case. More likely I'd write

def cleanup(str)
  str = str.strip
  str.empty? ? "default" : str
end

> It makes perfect sense to me to pass an object (string, in this case)
> across an encapsulation boundary specifically to modify it.

Yes, in some cases it does, and it's up to you to agree the 'contract'
in your documentation that that's what you'll do. I'm not saying it's
forbidden.

But this seems to be contrary to your original question, where you were
saying you were defensively dup'ing strings, on both input and output,
to avoid cases where they get mutated later by the caller or some other
object invoked by the caller.

I'm saying to avoid this problem, the caller would not pass a string to
object X, and *then* mutate it (e.g. by passing it to another object Y
which mutates it). And in practice, I find this is not normally a
problem, because normally objects do not mutate strings which are passed
into them as arguments.

This is not a hard and fast rule. It's just how things work out for me
in practice. It depends on your coding style, and whether you're coding
for yourself or coding libraries to be used by other people too.

Regards,

Brian.

Nit Khair

1/22/2009 9:27:00 AM

> instance variable then I tend to dup it if the application is larger.
> You can even automate conditional dup'ing by doing something like this
>
> class Object
> def dupf
> frozen? ? self : dup
> end
> end
>
> and then
>
> class MyClass
> def initialize(name)
> @name = name.dupf
> end
> end
>
> Kind regards
>
> robert

thanks, this looks very helpful.

In response to the previous post by Brian: yes, it's a library for use
by others.

Stefan Rusterholz

1/22/2009 9:39:00 AM

Robert Klemme wrote:
> class MyClass
> def initialize(name)
> @name = name.dupf
> end
> end

I'd vote against this. It looks to me like a great way to confuse users
and complicate interfaces. I'd rather go towards transparency.
Generally I'd not mutate arguments, only the receiver. If there's a
valid case to mutate an argument, it should be documented and evident.
The user then has to provide a duplicate if he still needs the original.

Just my 0.02€

Regards
Stefan Rusterholz

Brian Candler

1/22/2009 9:50:00 AM

RK Sentinel wrote:
>> class Object
>> def dupf
>> frozen? ? self : dup
>> end
>> end
>>
>> and then
>>
>> class MyClass
>> def initialize(name)
>> @name = name.dupf
>> end
>> end
>>
>> Kind regards
>>
>> robert
>
> thanks, this looks very helpful.

Beware: this may not solve your problem. It will work if the passed-in
object is itself a String, but not if it's some other object which
contains Strings.

Try:

a = ["hello", "world"]
b = a.dupf
b[0] << "XXX"
p a

However, deep-copy is an even less frequently seen solution.
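
For completeness, a sketch of one common deep-copy idiom, a round trip through Marshal. This is an illustration rather than a recommendation: Marshal cannot dump objects such as Procs or IO handles, so it only suits plain data.

```ruby
a = ["hello".dup, "world".dup]

shallow = a.dup                        # new Array, same String objects
deep    = Marshal.load(Marshal.dump(a)) # new Array, new String objects

shallow[0] << "XXX"   # leaks into a, because the String is shared
deep[1]    << "YYY"   # does not affect a

p a      # ["helloXXX", "world"]
p deep   # ["hello", "worldYYY"]
```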

> In response to the previous post by Brian: yes, it's a library for use
> by others.

It's hard to provide concrete guidance without knowing what this library
does, but it sounds like it builds some data structure which includes
the string passed in.

I believe that a typical library is behaving correctly if it just stores
a reference to the string.

If a problem arises because the caller is later mutating that string
object, this could be considered to be a bug in the *caller*. The caller
can fix this problem by dup'ing the string at the point when they pass
it in, or dup'ing it later before changing it.

Again, this is not a hard-and-fast rule. Sometimes defensive dup'ing is
reasonable. For example, Ruby's Hash object has a special-case for
string keys: if you pass in an unfrozen string as a hash key, then the
string is dup'd and frozen before being used as the key.

This (a) may lead to surprising behaviour, and (b) doesn't solve the
general problem (*). However, strings are very commonly used as hash
keys, and they are usually short, so it seems a reasonable thing to do.
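
The string-key special case can be observed directly. A small sketch with made-up key names:

```ruby
key = "name".dup        # an unfrozen String
h = {}
h[key] = 1

stored = h.keys.first
puts stored.frozen?       # true:  the hash holds a frozen key
puts stored.equal?(key)   # false: it is a copy, not the caller's object

key << "_changed"         # mutating the caller's string afterwards...
puts h["name"]            # 1: ...does not disturb the lookup
puts h.key?("name_changed")   # false
```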

Regards,

Brian.

(*) In the case of a hash with string keys, if you mutated one of those
keys then it could end up on the wrong hash chain, and it would become
impossible to retrieve it from the hash using hash["key"].

So it's a question of which of two behaviours is least undesirable:
objects disappearing from the hash because you forgot to call #rehash,
or hashes working "right" with string keys but "wrong" with other keys.

irb(main):001:0> k = [1]
=> [1]
irb(main):002:0> h = {k => 1, [2] => 2}
=> {[1]=>1, [2]=>2}
irb(main):003:0> h[[1]]
=> 1
irb(main):004:0> k << 2
=> [1, 2]
irb(main):005:0> h
=> {[1, 2]=>1, [2]=>2}
irb(main):006:0> h[[1,2]]
=> nil <<< WHOOPS!
irb(main):007:0> h.rehash
=> {[1, 2]=>1, [2]=>2}
irb(main):008:0> h[[1,2]]
=> 1