Asp Forum - Symbols garbage collector in Ruby1.9, fixed?

Iñaki Baz Castillo

3/30/2009 8:09:00 AM

Hi, in Ruby 1.8 there is an issue when adding more and more Symbols
since they remain in memory and are never removed.

I'm doing a server in Ruby that receives messages with headers (From,
To, Subject, X-Custom-Header-1...) and after parsing I store the
headers in a hash using symbols as keys:

headers = {
  :from => "alice@aaa.com",
  :to => "bob@bbb.com",
  :"x-custom-header-1" => "Hi there"
}

I could use strings as keys instead of symbols, but I've checked that
getting a Hash entry is ~25% faster using Symbols.
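
For anyone who wants to reproduce that kind of comparison, here is a rough sketch. The helper name compare_key_lookup and the iteration count are made up for illustration, and the timings depend on the Ruby version and the machine, so no particular ratio should be read into it.

```ruby
require 'benchmark'

# Hypothetical helper: times the same number of lookups against a
# symbol-keyed hash and a string-keyed hash and returns both timings.
def compare_key_lookup(iterations = 200_000)
  sym_hash = { :from => "alice@aaa.com", :to => "bob@bbb.com" }
  str_hash = { "from" => "alice@aaa.com", "to" => "bob@bbb.com" }
  str_key = "from" # allocated once, so only the lookup itself is timed

  sym_time = Benchmark.realtime { iterations.times { sym_hash[:from] } }
  str_time = Benchmark.realtime { iterations.times { str_hash[str_key] } }

  { :symbol => sym_time, :string => str_time }
end

result = compare_key_lookup
puts "symbol keys: #{result[:symbol]}s, string keys: #{result[:string]}s"
```

Note that this only measures the lookup; it leaves out the cost of turning each incoming header name into a Symbol in the first place.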

The problem is that I could receive custom headers so for each one a
new Symbol would be created. An attacker could send lots of custom
headers to fill the server memory and cause a denial of service.

Perhaps this is solved in Ruby 1.9? any suggestion on it? Thanks a lot.

--
Iñaki Baz Castillo
<ibc@aliax.net>

23 Answers

Iñaki Baz Castillo

3/30/2009 8:17:00 AM

2009/3/30 Iñaki Baz Castillo <ibc@aliax.net>:
> Perhaps this is solved in Ruby 1.9? any suggestion on it? Thanks a lot.

Is there any way to check if a Symbol already exists before creating it?

--
Iñaki Baz Castillo
<ibc@aliax.net>

F. Senault

3/30/2009 9:02:00 AM

On 30 March 2009 at 10:09, Iñaki Baz Castillo wrote:

> The problem is that I could receive custom headers so for each one a
> new Symbol would be created. An attacker could send lots of custom
> headers to fill the server memory and cause a denial of service.
>
> Perhaps this is solved in Ruby 1.9? any suggestion on it? Thanks a lot.

It depends on what exactly you are trying to do with your hash. If you
need to access a few well-known headers in your code, use symbols for
those and add another pseudo-header for the rest of the info:

USEFUL_HEADERS = [ :from, :to, :"x-mailer" ]

headers = {
  :from => "alice@aaa.com",
  :to => "bob@bbb.com",
  :"x-mailer" => "Pegasus Mail for Windows (4.50 PB1)",
  :"_custom" => {
    "x-custom-header-1" => "Hi there",
    "x-spam-scanned" => "Of course"
  }
}

(Now, you'll lose time at the parse step. Again, depending on what
you're trying to do, it may be efficient if each mail is parsed one time
and, then, each header is accessed a lot of times.)
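
A minimal sketch of what such a parse step could look like, assuming the parser hands over [name, value] pairs. The names KNOWN_HEADER_KEYS and split_headers are hypothetical; the point is only that untrusted names are never passed to to_sym.

```ruby
# Header names the application is known to read, mapped to fixed symbols.
# (Hypothetical table; extend it with whatever your code actually uses.)
KNOWN_HEADER_KEYS = {
  "from" => :from,
  "to" => :to,
  "x-mailer" => :"x-mailer"
}

# Hypothetical helper: known headers get symbol keys, everything else is
# stored under :_custom with its (lowercased) String name as the key.
def split_headers(raw_pairs)
  headers = { :_custom => {} }
  raw_pairs.each do |name, value|
    key = name.downcase
    if KNOWN_HEADER_KEYS.key?(key)
      headers[KNOWN_HEADER_KEYS[key]] = value
    else
      headers[:_custom][key] = value
    end
  end
  headers
end

parsed = split_headers([
  ["From", "alice@aaa.com"],
  ["To", "bob@bbb.com"],
  ["X-Custom-Header-1", "Hi there"]
])
puts parsed[:from]
puts parsed[:_custom]["x-custom-header-1"]
```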

Fred
--
Well I'm the one without a soul I'm the one with this big fucking hole
No new tale to tell Twenty-six years on my way to hell Gotta listen to
your big time hard line bad luck fist fuck Don't think you're having
all the fun You know me I hate everyone (Nine Inch Nails, Wish)

Iñaki Baz Castillo

3/30/2009 9:18:00 AM

2009/3/30 F. Senault <fred@lacave.net>:
>
> It depends on what exactly you are trying to do with your hash. If you
> need to access a few well-known headers in your code, use symbols for
> those and add another pseudo-header for the rest of the info:
>
> USEFUL_HEADERS = [ :from, :to, :"x-mailer" ]
>
> headers = {
>   :from => "alice@aaa.com",
>   :to => "bob@bbb.com",
>   :"x-mailer" => "Pegasus Mail for Windows (4.50 PB1)",
>   :"_custom" => {
>     "x-custom-header-1" => "Hi there",
>     "x-spam-scanned" => "Of course"
>   }
> }
>
> (Now, you'll lose time at the parse step. Again, depending on what
> you're trying to do, it may be efficient if each mail is parsed one time
> and, then, each header is accessed a lot of times.)

Thanks, but I prefer to store all the headers in a transparent way, so
that accessing a core, well-known header is the same as accessing a
custom, never-seen header:
  headers[:from]
  headers[:"x-custom-headers"]

That is, in the transport/parsing layer I cannot know which headers
will be important in the "application" layer.

A way to check whether a Symbol already exists would be enough for me,
but it doesn't work: to list all the current Symbols I can inspect
Symbol.all_symbols, but if I want to check a Symbol:
  Symbol.all_symbols.include?(:new_symbol)
this will always return true, since :new_symbol is automatically added XDDD
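
A small sketch of the effect, under the assumption that a Symbol literal is interned as soon as the source containing it is parsed. The variable names are only for illustration; the run-time name is built from a timestamp and a random number so that it is very unlikely to exist beforehand.

```ruby
# A literal that appears nowhere else is still "known", because writing
# the literal is what creates the Symbol.
literal_known = Symbol.all_symbols.include?(:only_mentioned_in_this_check)

# Build a name at run time so that no literal for it exists in the source.
name = "header_#{Time.now.to_i}_#{rand(1_000_000_000)}"

# Comparing by String does not intern `name`.
known_before = Symbol.all_symbols.any? { |s| s.to_s == name }

sym = name.to_sym # the Symbol is created here and kept referenced
known_after = Symbol.all_symbols.any? { |s| s.to_s == name }

puts literal_known
puts known_before
puts known_after
```

The String comparison walks the whole symbol table on every call, so it shows the behaviour rather than offering something usable per header.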

Thanks.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Bill Kelly

3/30/2009 9:32:00 AM

From: "Iñaki Baz Castillo" <ibc@aliax.net>
>
> A way to check if a Symbol already exists would be enough for me, but
> it doesn't work:
> To know all the current Symbols I can inspect Symbol.all_symbols, but
> if I want to check a Symbol:
> Symbol.all_symbols.include?(:new_symbol)
> this will always return true since :new_symbol is automatically added XDDD

potential_new_symbol = "xyzzy"
Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol

?

Regards,

Bill

F. Senault

3/30/2009 9:35:00 AM

On 30 March 2009 at 11:17, Iñaki Baz Castillo wrote:

> A way to check if a Symbol already exists would be enough for me, but
> it doesn't work:
> To know all the current Symbols I can inspect Symbol.all_symbols, but
> if I want to check a Symbol:
> Symbol.all_symbols.include?(:new_symbol)

Symbol.all_symbols.find { |s| s.to_s == "string" }

But, now, you're creating strings instead... :)

Fred
--
Oh your velocity How can it really be Part of the symmetry If every
moment connects the next And every moment affects you Not what it's
meant to be Part of the scenery And all your satelights Are fragmented
I feel a little crushed And out of control (Collide, Crushed)

Iñaki Baz Castillo

3/30/2009 9:38:00 AM

2009/3/30 Bill Kelly <billk@cts.com>:
>
> From: "Iñaki Baz Castillo" <ibc@aliax.net>
>>
>> A way to check if a Symbol already exists would be enough for me, but
>> it doesn't work:
>> To know all the current Symbols I can inspect Symbol.all_symbols, but
>> if I want to check a Symbol:
>> Symbol.all_symbols.include?(:new_symbol)
>> this will always return true since :new_symbol is automatically added XDDD
>
> potential_new_symbol = "xyzzy"
> Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol

Thanks but it is too slow:

Benchmark.realtime{ Symbol.all_symbols.map {|s| s.to_s}.include? "qwe" }
=> 0.00371980667114258

I cannot do this test for each header in each received message.

Thanks.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Bill Kelly

3/30/2009 9:55:00 AM

From: "Iñaki Baz Castillo" <ibc@aliax.net>
> 2009/3/30 Bill Kelly <billk@cts.com>:
> >
> > From: "Iñaki Baz Castillo" <ibc@aliax.net>
> >>
> >> A way to check if a Symbol already exists would be enough for me, but
> >> it doesn't work:
> >> To know all the current Symbols I can inspect Symbol.all_symbols, but
> >> if I want to check a Symbol:
> >> Symbol.all_symbols.include?(:new_symbol)
> >> this will always return true since :new_symbol is automatically added
> >> XDDD
> >
> > potential_new_symbol = "xyzzy"
> > Symbol.all_symbols.map {|s| s.to_s}.include? potential_new_symbol
>
> Thanks but it is too slow:
>
> Benchmark.realtime{ Symbol.all_symbols.map {|s| s.to_s}.include? "qwe" }
> => 0.00371980667114258
>
> I cannot do this test for each header in each received message.

I assumed you had a plan for that. :)

We could cache them as a hash, for rapid lookup:

@known_symbols = Hash[ *Symbol.all_symbols.map {|s| [s.to_s,true]}.flatten ]

# Later....

@known_symbols.include? "xyzzy"
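
Building on that, a rough sketch of how the cache could be used so that unknown names are never interned. KNOWN_SYMBOL_NAMES and header_key are hypothetical names; the snapshot is taken once (say, at start-up) and will not notice Symbols created later.

```ruby
# One-time snapshot: symbol name (String) => the existing Symbol.
KNOWN_SYMBOL_NAMES = {}
Symbol.all_symbols.each { |s| KNOWN_SYMBOL_NAMES[s.to_s] = s }

# Hypothetical helper: return the already-existing Symbol for a known
# name; otherwise keep the untrusted name as a String, so nothing new
# is added to the symbol table.
def header_key(name)
  KNOWN_SYMBOL_NAMES[name] || name
end

headers = {}
headers[header_key("to_s")] = "known name, stored under a Symbol"
headers[header_key("x-evil-#{rand(1_000_000_000)}")] = "unknown, stored under a String"
puts headers.keys.map { |k| k.class }.inspect
```

The hash then holds a mix of Symbol and String keys, so reads would have to go through the same helper to find the right entry.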

Regards,

Bill

Iñaki Baz Castillo

3/30/2009 10:02:00 AM

2009/3/30 Bill Kelly <billk@cts.com>:
>> Thanks but it is too slow:
>>
>> Benchmark.realtime{ Symbol.all_symbols.map {|s| s.to_s}.include? "qwe" }
>> => 0.00371980667114258
>>
>> I cannot do this test for each header in each received message.
>
> I assumed you had a plan for that. :)
>
> We could cache them as a hash, for rapid lookup:
>
> @known_symbols = Hash[ *Symbol.all_symbols.map {|s| [s.to_s,true]}.flatten ]
>
> # Later....
>
> @known_symbols.include? "xyzzy"

That sounds interesting, I'll try it.

Thanks :)

--
Iñaki Baz Castillo
<ibc@aliax.net>

Rick DeNatale

3/30/2009 11:56:00 AM

On Mon, Mar 30, 2009 at 4:09 AM, Iñaki Baz Castillo <ibc@aliax.net> wrote:

> Hi, in Ruby 1.8 there is an issue when adding more and more Symbols
> since they remain in memory and are never removed.
>
> I'm doing a server in Ruby that receives messages with headers (From,
> To, Subject, X-Custom-Header-1...) and after parsing I store the
> headers in a hash using symbols as keys:
>
> headers = {
>   :from => "alice@aaa.com",
>   :to => "bob@bbb.com",
>   :"x-custom-header-1" => "Hi there"
> }
>
> I could use strings as keys instead of symbols, but I've checked that
> getting a Hash entry is ~25% faster using Symbols.
>
> The problem is that I could receive custom headers so for each one a
> new Symbol would be created. An attacker could send lots of custom
> headers to fill the server memory and cause a denial of service.
>

Which is why Rails (actually ActiveSupport), which implements
HashWithIndifferentAccess to allow strings and symbols to be used
equivalently for hash access, uses the string form in the actual hash,
forgoing the access performance in favor of safety.
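
To illustrate the idea only, here is a simplified stand-in. StringKeyedHeaders is a hypothetical class written for this thread, not ActiveSupport's code: every key is converted with to_s before it touches the underlying hash, so untrusted names are never interned.

```ruby
# Hypothetical wrapper: accepts Strings or Symbols as keys, but always
# stores and looks up the String form.
class StringKeyedHeaders
  def initialize
    @store = {}
  end

  def []=(key, value)
    @store[key.to_s] = value
  end

  def [](key)
    @store[key.to_s]
  end

  def keys
    @store.keys
  end
end

headers = StringKeyedHeaders.new
headers["x-custom-header-1"] = "Hi there" # name as received from the network
headers[:from] = "alice@aaa.com"          # fixed symbol written in the program

puts headers[:from]
puts headers["from"]
```

Symbols that appear as literals in application code are fixed in number, so reading with headers[:from] does not grow the symbol table with attacker-chosen names.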

--
Rick DeNatale

Blog: http://talklikeaduck.denh...
Twitter: http://twitter.com/Ri...
WWR: http://www.workingwithrails.com/person/9021-ric...
LinkedIn: http://www.linkedin.com/in/ri...

Brian Candler

3/30/2009 12:16:00 PM

Iñaki Baz Castillo wrote:
> I could use strings as keys instead of symbols, but I've checked that
> getting a Hash entry is ~25% faster using Symbols.
>
> The problem is that I could receive custom headers so for each one a
> new Symbol would be created. An attacker could send lots of custom
> headers to fill the server memory and cause a denial of service.
>
> Perhaps this is solved in Ruby 1.9? any suggestion on it? Thanks a lot.

It's not "solved" in 1.9, because this is intentional and necessary
behaviour.

The important property of a symbol is that it has the same id wherever
and whenever it is used in your program, and hence it can never be
garbage-collected. This is so that it can be used for looking up method
names - foo.bar is a shortcut for foo.send(:bar)
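
A short sketch of that property, with a throwaway class Foo defined only for the illustration:

```ruby
# A small class used only for this illustration.
class Foo
  def bar
    "called bar"
  end
end

literal = :bar
interned = "bar".to_sym # built from a String at run time
same_symbol = literal.equal?(interned) # identical object, not just equal

# Two equal Strings, by contrast, are separate objects here.
same_string = String.new("bar").equal?(String.new("bar"))

# Calling a method directly and dispatching via its Symbol give the same result.
foo = Foo.new
same_result = (foo.bar == foo.send(:bar))

puts same_symbol
puts same_string
puts same_result
```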

Using symbols for hash keys is a common idiom, but arguably is abuse of
the symbol table. It's fine as long as all the keys are fixed symbol
constants in your program, but as you've observed, it causes huge
problems if your symbols are generated dynamically in response to user
data (especially from untrusted or potentially malicious sources)

The solution: use strings as keys, and beware premature optimisation.
Whilst you may have measured that "getting a Hash entry is 25% faster
using Symbols", does this really make your whole application 25% faster?
I suspect not. Maybe it makes your whole application 0.25% faster. Maybe
it makes your application slower, as each incoming String has to be
converted into a Symbol.

In any case, although we all want things to go "as fast as possible",
few applications have specific acceptance criteria for CPU utilisation
or response time. If your application *does* have a specific performance
criterion that you must meet, then it might be better to consider a
different language, rather than misusing what Ruby offers. Or, once you
include things like development costs, it may be more cost-effective to
choose faster hardware to meet the performance goal.

Regards,

Brian.
--
Posted via http://www.ruby-....

comp.lang.ruby
