comp.lang.ruby

Symbols and frozen strings

Brian Candler

9/6/2007 7:50:00 AM

I just had a thought.

One of the problems with using strings as hash keys is that every time you
refer to them, you create a throw-away garbage string:

params["id"]
^
+-- temporary string, needs to be garbage collected
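A minimal sketch of the allocation described above, assuming a plain Hash with string keys (the `params` contents and the `ID_KEY` constant are hypothetical). `String#dup` stands in for the throw-away key string, while a frozen constant is a single object reused for every lookup.

```ruby
# Hypothetical params hash with string keys, as if parsed from a request.
params = { "id" => "42", "name" => "widget" }

# One frozen key object, created once and reused for every lookup.
ID_KEY = "id".freeze

# Each "id".dup is a brand-new String, like the temporary key in params["id"].
fresh_keys  = Array.new(3) { "id".dup }
shared_keys = Array.new(3) { ID_KEY }

puts fresh_keys.map(&:object_id).uniq.size    # 3: three separate objects
puts shared_keys.map(&:object_id).uniq.size   # 1: the same object each time
puts params[ID_KEY]                           # 42
```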

In Rails you have HashWithIndifferentAccess, but this actually isn't any
better. Although you write params[:id], when executed the symbol is
converted to a string anyway.
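To make the point concrete, here is a deliberately tiny, hypothetical `IndifferentHash` (not Rails' actual `HashWithIndifferentAccess` code) built on the same idea: keys are stored as strings, and a Symbol argument is converted with `Symbol#to_s` on every access, which builds a new String each time.

```ruby
# Hypothetical stand-in for an indifferent-access hash: real keys are strings.
class IndifferentHash
  def initialize
    @store = {}
  end

  def []=(key, value)
    @store[convert_key(key)] = value
  end

  def [](key)
    @store[convert_key(key)]
  end

  private

  # A Symbol key is turned into a String on every call.
  def convert_key(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

params = IndifferentHash.new
params["id"] = "42"

puts params[:id]                    # 42, found via a freshly built "id"
puts :id.to_s.equal?(:id.to_s)      # false: each to_s is a separate String
```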

In a Rails-like scenario, using symbols as the real keys within the hash
doesn't work: the keys come from externally parsed data, which means (a)
they were strings originally, and (b) if you converted them to symbols you'd
risk a symbol exhaustion attack.
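Point (b) can be seen directly: every distinct string passed to `to_sym` adds an entry to the symbol table. In the Ruby 1.8 of this thread such symbols were never reclaimed; since Ruby 2.2, dynamically created symbols can be garbage-collected, but each one still occupies the table while it is referenced. A small sketch with hypothetical attacker-supplied key names:

```ruby
before = Symbol.all_symbols.size

# Hypothetical parameter names chosen by a remote client.
untrusted_keys = (1..1_000).map { |i| "attacker_key_#{i}" }

# Interning them creates one new symbol per distinct name; keeping the
# array alive stops them from being collected during the measurement.
symbols = untrusted_keys.map(&:to_sym)

after = Symbol.all_symbols.size
puts after - before   # grows by roughly one entry per distinct key
```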

So I thought, wouldn't it be nice to have a half-way house: being able to
convert a symbol to a string, in such a way that you always got the same
(frozen) string object?

This turned out to be extremely easy:

class Symbol
  def fring
    @fring ||= to_s.freeze
  end
end

irb(main):006:0> :foo.fring
=> "foo"
irb(main):007:0> :foo.fring.object_id
=> -605512686
irb(main):008:0> :foo.fring.object_id
=> -605512686
irb(main):009:0> :bar.fring
=> "bar"
irb(main):010:0> :bar.fring.object_id
=> -605543036
irb(main):011:0> :bar.fring.object_id
=> -605543036
irb(main):012:0> :bar.fring << "x"
TypeError: can't modify frozen string
from (irb):12:in `<<'
from (irb):12
from :0
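The `@fring` cache above relies on setting an instance variable on a Symbol, which worked in the Ruby 1.8 session shown. On recent Ruby versions symbols are frozen, so that assignment raises `FrozenError` instead. Ruby 3.0 added `Symbol#name`, which returns a frozen String for the symbol's name and, in current MRI, appears to hand back the same object on each call. A short check, assuming a Ruby 3.x interpreter:

```ruby
# Symbols are frozen on recent Rubies, so an instance-variable cache
# like @fring cannot be stored on them.
begin
  :foo.instance_variable_set(:@fring, "foo")
  ivar_ok = true
rescue FrozenError
  ivar_ok = false
end
puts ivar_ok                         # false on recent Rubies

# Symbol#name (Ruby 3.0+): a frozen String for the symbol's name.
n1 = :foo.name
n2 = :foo.name
puts n1                              # foo
puts n1.frozen?                      # true
puts n1.equal?(n2)                   # true: same String object both times

# Symbol#to_s still builds a separate String on each call.
puts :foo.to_s.equal?(:foo.to_s)     # false
```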

Is this a well-known approach, and/or does it exist in any extension
library?

I suppose that an instance variable lookup isn't necessarily faster than
always creating a temporary string with to_s and then garbage collecting it
at some point later in time, but it feels like it ought to be :-)
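Whether a cached lookup actually beats allocate-and-collect is easier to measure than to guess. A rough sketch using a monotonic clock, comparing `Symbol#to_s` (a new String per call) with a frozen String cached per symbol in a Hash, which stands in for the `@fring` idea. The `elapsed` helper, `N`, and `CACHE` are hypothetical names, and timings depend heavily on Ruby version and GC settings, so no expected numbers are given.

```ruby
# Hypothetical helper: elapsed seconds for a block, via a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

N = 200_000

# One frozen String per symbol, created on first use and then reused.
CACHE = Hash.new { |h, sym| h[sym] = sym.to_s.freeze }

to_s_time  = elapsed { N.times { :foo.to_s } }     # allocates N strings
cache_time = elapsed { N.times { CACHE[:foo] } }   # one Hash lookup each

printf("to_s:  %.4fs\n", to_s_time)
printf("cache: %.4fs\n", cache_time)
```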

However, since I've seen discussion about string modifiers like "..."u,
perhaps there's scope for adding in-language support, e.g.

"..."f - frozen string, same object ID each time it's executed

In that case, it might be more convenient the other way round:

"..." - frozen string literal, same object
"..."m - mutable (unfrozen) string literal, new objects
String.new("...") - another way of making a mutable string
"...".dup - and another

That would break a lot of existing code, but it could be pragma-enabled.
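Something close to this arrived in later Rubies: since Ruby 2.1 the expression `"literal".freeze` compiles to a single shared frozen String, and Ruby 2.3 added the `# frozen_string_literal: true` magic comment as a file-level, pragma-style switch. A quick check of the first, alongside the mutable forms listed above:

```ruby
# "literal".freeze yields one shared frozen String, much like the proposed "..."f.
keys = Array.new(3) { "id".freeze }
puts keys.all? { |k| k.equal?(keys.first) }   # true
puts keys.first.frozen?                       # true

# The mutable forms still give fresh, independent objects.
a = String.new("id")
b = "id".dup
a << "x"
puts a                # idx
puts b                # id
puts a.equal?(b)      # false
```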

Sorry if this ground has been covered before - it's hard to keep up with
ruby-talk :-)

Regards,

Brian.

12 Answers

Nobuyoshi Nakada

9/6/2007 8:06:00 AM


Hi,

At Thu, 6 Sep 2007 16:50:28 +0900,
Brian Candler wrote in [ruby-talk:267857]:
> So I thought, wouldn't it be nice to have a half-way house: being able to
> convert a symbol to a string, in such a way that you always got the same
> (frozen) string object?

Rather, Symbol#to_s should return frozen String?

> I suppose that an instance variable lookup isn't necessarily faster than
> always creating a temporary string with to_s and then garbage collecting it
> at some point later in time, but it feels like it ought to be :-)
>
> However, since I've seen discussion about string modifiers like "..."u,
> perhaps there's scope for adding in-language support, e.g.
>
> "..."f - frozen string, same object ID each time it's executed

What about "..."o like Regexp?

--
Nobu Nakada

Brian Candler

9/6/2007 9:11:00 AM


> Rather, Symbol#to_s should return frozen String?

Yes, as long as it returns the same frozen string each time.

Hmm, this sounds like a good solution - it's technically not
backwards-compatible, but I doubt that much code does a Symbol#to_s and
later mutates it.

> What about "..."o like Regexp?

Sure, I don't mind about the actual syntax.

Of course, you don't even need to add 'o' to a Regexp in the case where it
doesn't contain any #{...} interpolation:

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> 3.times { puts /foo/.object_id }
-605554606
-605554606
-605554606
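The same reuse can be checked on a current Ruby, and the `o` flag Nobu mentions covers the interpolated case: with `/.../o` the interpolation runs only on the first evaluation, and that Regexp is reused afterwards. A short sketch:

```ruby
# A Regexp literal without interpolation is built once and reused.
plain = Array.new(3) { /foo/ }
puts plain.all? { |r| r.equal?(plain.first) }   # true

# With /o, the interpolation happens only the first time through.
once = Array.new(3) { |i| /item#{i}/o }
puts once.map(&:source).uniq.inspect            # ["item0"]

# Without /o, each evaluation builds a new Regexp from the current value.
each_time = Array.new(3) { |i| /item#{i}/ }
puts each_time.map(&:source).inspect            # ["item0", "item1", "item2"]
```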

Regards,

Brian.

Trans

9/6/2007 12:27:00 PM




On Sep 6, 5:10 am, Brian Candler <B.Cand...@pobox.com> wrote:
> > Rather, Symbol#to_s should return frozen String?
>
> Yes, as long as it returns the same frozen string each time.
>
> Hmm, this sounds like a good solution - it's technically not
> backwards-compatible, but I doubt that much code does a Symbol#to_s and
> later mutates it.

I've tried that. There are some places where it blows up Ruby. So
those would have to be rooted out first.

T.


Robert Klemme

9/6/2007 12:51:00 PM


2007/9/6, Trans <transfire@gmail.com>:
>
>
> On Sep 6, 5:10 am, Brian Candler <B.Cand...@pobox.com> wrote:
> > > Rather, Symbol#to_s should return frozen String?
> >
> > Yes, as long as it returns the same frozen string each time.
> >
> > Hmm, this sounds like a good solution - it's technically not
> > backwards-compatible, but I doubt that much code does a Symbol#to_s and
> > later mutates it.
>
> I've tried that. There are some places where it blows up Ruby. So
> those would have to be rooted out first.

I always prefer less intrusive solutions. Why not do this:

SYMS = Hash.new {|h,sy| h[sy]=sy.to_s}

Then, wherever you need this, just do "SYMS[a_sym]" instead of
"a_sym.to_s". Added advantage: you can throw away or clear SYMS when
you know you do not need it any more, thus freeing up memory.
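One caveat with the cache as written: it hands out unfrozen strings, so a caller doing `SYMS[:id] << "x"` would change the string every other caller sees. A sketch of the same idea with the cached strings frozen, matching Brian's `fring`:

```ruby
# Same per-symbol cache, but each cached String is frozen before sharing.
SYMS = Hash.new { |h, sym| h[sym] = sym.to_s.freeze }

a = SYMS[:id]
b = SYMS[:id]
puts a.equal?(b)   # true: one String per symbol
puts a.frozen?     # true

SYMS.clear         # drop the cache once it is no longer needed
puts SYMS.empty?   # true
```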

Kind regards

robert

Joel VanderWerf

9/6/2007 3:05:00 PM


Brian Candler wrote:
> I just had a thought.
>
> One of the problems with using strings as hash keys is that every time you
> refer to them, you create a throw-away garbage string:
>
> params["id"]
> ^
> +-- temporary string, needs to be garbage collected

Setting aside the question of freezing, why can't ruby share string data
for all strings generated from the same symbol? And in that case you
could do the following to avoid garbage:

params[:id.to_s]

(Or ruby could even look up the literal "id" in the symbol table and do
this for you.)

This code shows some of the cases in which ruby does and does not share
string contents:


def show_vmsize
  GC.start
  puts `ps -o vsz #$$`[/\d+/]
end

s = "a"*1000
sym = s.to_sym

show_vmsize # 8712

# ruby apparently does not share storage for strings derived
# from the same symbol:

strs1 = (0..10_000).map do
  sym.to_s
end

show_vmsize # 18488

# ruby does share storage for string ops:

strs2 = (0..10_000).map do
  s[0..-1]
end

show_vmsize # 18616

strs3 = (0..10_000).map do
  s.dup
end

show_vmsize # 18616
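Process VSZ is a coarse, platform-specific measure (`ps -o vsz` assumes a Unix-like system). A sketch that instead asks for per-object sizes with `ObjectSpace.memsize_of` from the `objspace` extension; a string that shares another string's buffer is typically reported without that buffer. Sharing rules have changed across Ruby versions, so the 2007 figures above may not reproduce, and no expected numbers are given.

```ruby
require "objspace"

s   = "a" * 1000
sym = s.to_sym

# Keep each derived string in a local so it is measured while alive.
from_sym = sym.to_s
sliced   = s[0..-1]
duped    = s.dup

sizes = {
  "sym.to_s" => ObjectSpace.memsize_of(from_sym),
  "s[0..-1]" => ObjectSpace.memsize_of(sliced),
  "s.dup"    => ObjectSpace.memsize_of(duped),
}

sizes.each { |label, bytes| puts "#{label}: #{bytes} bytes" }
```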

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Joel VanderWerf

9/6/2007 3:19:00 PM


Joel VanderWerf wrote:
> Brian Candler wrote:
>> I just had a thought.
>>
>> One of the problems with using strings as hash keys is that every time
>> you
>> refer to them, you create a throw-away garbage string:
>>
>> params["id"]
>> ^
>> +-- temporary string, needs to be garbage collected
>
> Setting aside the question of freezing, why can't ruby share string data
> for all strings generated from the same symbol? And in that case you
> could do the following to avoid garbage:
>
> params[:id.to_s]

Sorry... _reduce_ garbage, not avoid it altogether, since there is still
the T_STRING, even though the data is reused. It would help more for
long strings than for short strings, because the T_DATA is smaller in
proportion.

The idea of a literal for a unique frozen string would reduce garbage
further, sharing the T_STRING as well as the data.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

khaines

9/6/2007 3:27:00 PM


Joel VanderWerf

9/6/2007 3:30:00 PM


Joel VanderWerf wrote:
> Sorry... _reduce_ garbage, not avoid it altogether, since there is still
> the T_STRING, even though the data is reused. It would help more for
> long strings than for short strings, because the T_DATA is smaller in
> proportion.

Sorry again... I don't know where T_DATA came from. Should be T_STRING,
the constant-size overhead for a string object. Will stop posting until
caffeine hits.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Brian Candler

9/6/2007 6:44:00 PM


> Setting aside the question of freezing, why can't ruby share string data
> for all strings generated from the same symbol?

Because it could generate unexpected aliasing. The normal, expected
behaviour is no aliasing:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = :foo.to_s
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

That's why the string has to be frozen.
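The aliasing risk applies to any scheme that hands the same String object to several callers, whether it is `fring` or a per-symbol cache. A sketch with two hypothetical caches, one unfrozen and one frozen: the unfrozen one silently propagates a mutation, while the frozen one turns it into a `FrozenError` (the exception class raised for this since Ruby 2.5).

```ruby
# Hypothetical cache handing out one *unfrozen* String per symbol.
UNFROZEN = Hash.new { |h, sym| h[sym] = String.new(sym.to_s) }

x = UNFROZEN[:foo]
y = UNFROZEN[:foo]
y << "bar"
puts x              # foobar: x and y are the same object, so x changed too

# Freezing the cached String turns that silent aliasing into an error.
FROZEN = Hash.new { |h, sym| h[sym] = sym.to_s.freeze }
raised = false
begin
  FROZEN[:foo] << "bar"
rescue FrozenError
  raised = true
end
puts raised         # true
puts FROZEN[:foo]   # foo
```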

Regards,

Brian.

Joel VanderWerf

9/6/2007 7:34:00 PM


Brian Candler wrote:
>> Setting aside the question of freezing, why can't ruby share string data
>> for all strings generated from the same symbol?
>
> Because it could generate unexpected aliasing. The normal, expected
> behaviour is no aliasing:
>
> irb(main):001:0> a = :foo.to_s
> => "foo"
> irb(main):002:0> b = :foo.to_s
> => "foo"
> irb(main):003:0> b << "bar"
> => "foobar"
> irb(main):004:0> a
> => "foo"

This was what I was thinking of:

irb(main):001:0> a = :foo.to_s
=> "foo"
irb(main):002:0> b = a.dup
=> "foo"
irb(main):003:0> b << "bar"
=> "foobar"
irb(main):004:0> a
=> "foo"

Internally, a and b use the same storage, but copy-on-write prevents
aliasing.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407