Asp Forum - Improving hexadecimal escaping performance

Iñaki Baz Castillo

2/23/2009 12:06:00 AM

Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

def self::hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The "hex_escape" method is copied from the CGI lib, and honestly I don't much
like its approach of using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.

--
Iñaki Baz Castillo

10 Answers

7stud --

2/23/2009 6:23:00 AM

Iñaki Baz Castillo wrote:
> I don't much like its approach of using "sprintf". Is there a more
> elegant way?
> (Performance is the most important requirement anyway.)
>

pickaxe2, p. 23:
------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------

>Is there a more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9
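
For a variant that avoids sprintf and the % operator altogether, here is a minimal sketch. It assumes Ruby 2.0 or later (for String#b) and a UTF-8 source file; the name hex_escape_unpack is made up for illustration and is not part of the CGI lib. Whether it is any faster would have to be measured.

```ruby
# Hypothetical helper name; assumes Ruby 2.0+ for String#b.
def hex_escape_unpack(str)
  # Work on a binary copy so that every match is exactly one byte.
  str.b.gsub(/[^a-zA-Z0-9_\-.]/n) do |byte|
    # unpack("H2") returns the byte as two lowercase hex digits.
    "%" + byte.unpack("H2").first.upcase
  end
end

puts hex_escape_unpack("?<>é")   # prints %3F%3C%3E%C3%A9
```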

--
Posted via http://www.ruby-....

Robert Klemme

2/23/2009 9:46:00 AM

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
> Hi, I have a module with two methods (thanks Jeff):
> - hex_unescape(string)
> - hex_escape(string)
> as follows:
>
> def self::hex_unescape(str)
>   str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
> end
>
> def self::hex_escape(str)
>   str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
> end
>
> The "hex_escape" method is copied from the CGI lib, and honestly I don't much
> like its approach of using "sprintf". Is there a more elegant way?
> (Performance is the most important requirement anyway.)

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, didn't you? What are your results, and what are your
performance requirements?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Iñaki Baz Castillo

2/23/2009 10:28:00 AM

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> 2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
>> Hi, I have a module with two methods (thanks Jeff):
>> - hex_unescape(string)
>> - hex_escape(string)
>> as follows:
>>
>> def self::hex_unescape(str)
>>   str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
>> end
>>
>> def self::hex_escape(str)
>>   str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>> end
>>
>> The "hex_escape" method is copied from the CGI lib, and honestly I don't much
>> like its approach of using "sprintf". Is there a more elegant way?
>> (Performance is the most important requirement anyway.)
>
> Then I am sure you _measured_ it and came to the conclusion that it is
> too slow, didn't you? What are your results, and what are your
> performance requirements?

I ran Benchmark.realtime on the hex_unescape and hex_escape methods.
hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly in C,
so it probably can't be made faster.
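
A single Benchmark.realtime call is noisy at this scale, so here is a rough sketch of how the two methods could be timed over many iterations. The sample string and repetition count are arbitrary, and String#b (Ruby 2.0+) is an assumption added so that each match is a single byte; this is not the exact code that produced the numbers above.

```ruby
require "benchmark"

# Same logic as the two module methods, written as plain methods.
def hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def hex_escape(str)
  # String#b is an assumption (Ruby 2.0+): match byte by byte.
  str.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

raw     = "user name?<>&=" * 10   # arbitrary sample input
escaped = hex_escape(raw)
n       = 10_000                  # arbitrary repetition count

# Time n calls of each method and report the average per call.
escape_time   = Benchmark.realtime { n.times { hex_escape(raw) } }
unescape_time = Benchmark.realtime { n.times { hex_unescape(escaped) } }

printf("hex_escape:   %.2e s per call\n", escape_time / n)
printf("hex_unescape: %.2e s per call\n", unescape_time / n)
```

Averaging over many calls on the same input makes the comparison between the two directions more meaningful than a one-off timing.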

Thanks.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Robert Klemme

2/23/2009 11:54:00 AM

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
> 2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
>> 2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
>>> Hi, I have a module with two methods (thanks Jeff):
>>> - hex_unescape(string)
>>> - hex_escape(string)
>>> as follows:
>>>
>>> def self::hex_unescape(str)
>>>   str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
>>> end
>>>
>>> def self::hex_escape(str)
>>>   str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>>> end
>>>
>>> The "hex_escape" method is copied from the CGI lib, and honestly I don't much
>>> like its approach of using "sprintf". Is there a more elegant way?
>>> (Performance is the most important requirement anyway.)
>>
>> Then I am sure you _measured_ it and came to the conclusion that it is
>> too slow, didn't you? What are your results, and what are your
>> performance requirements?
>
> I ran Benchmark.realtime on the hex_unescape and hex_escape methods.
> hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
> ~4*10^(-5) s.
>
> Anyway, I've just realized that "sprintf" is implemented directly in C,
> so it probably can't be made faster.

Well, you can at least do this in 1.8

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end
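
One caveat for 1.9 and later, as far as I can tell: when the input is a UTF-8 string, each match is a whole character, so getbyte(0) sees only the first byte of a multi-byte character. A minimal sketch that escapes every byte of each match instead (hex_escape_bytes is a made-up name, and a UTF-8 source file is assumed):

```ruby
# Hypothetical helper name; assumes Ruby 1.9+ and UTF-8 input.
def hex_escape_bytes(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/) do |m|
    # Escape each byte of the matched character, not just the first.
    m.each_byte.map { |b| sprintf("%%%02X", b) }.join
  end
end

puts hex_escape_bytes("año €")   # prints a%C3%B1o%20%E2%82%AC
```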

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end

Iñaki Baz Castillo

2/23/2009 1:55:00 PM

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> Well, you can at least do this in 1.8
>
> def self::hex_escape(str)
>   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
> end
>
> And this in 1.9
>
> def self::hex_escape(str)
>   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
> end

Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9 than in
1.8? Maybe in 1.9 "m[0]" returns the first character (even if it is a
multi-byte one such as "ñ" or "€"), while in 1.8 it returns just the
first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709), and its String has no
"getbyte()" method.

Thanks a lot.

--
Iñaki Baz Castillo
<ibc@aliax.net>

Robert Klemme

2/23/2009 2:16:00 PM

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>
>
> 2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> > Well, you can at least do this in 1.8
> >
> > def self::hex_escape(str)
> >   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
> > end
> >
> > And this in 1.9
> >
> > def self::hex_escape(str)
> >   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
> > end
>
>
> Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9 than in
> 1.8? Maybe in 1.9 "m[0]" returns the first character (even if it is a
> multi-byte one such as "ñ" or "€"), while in 1.8 it returns just the
> first two bytes?
>
> PS: I have Ruby 1.9 (2007-12-25 revision 14709), and its String has no
> "getbyte()" method.

15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$
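
The same difference shows up with the multi-byte characters from the question. A small sketch, assuming Ruby 1.9 or later and a UTF-8 source file:

```ruby
s = "ñ€"

p s[0]          # => "ñ"  (first character, not a number)
p s.getbyte(0)  # => 195  (first byte of "ñ", 0xC3)
p s.bytes.to_a  # => [195, 177, 226, 130, 172]
p s.length      # => 2    (characters)
p s.bytesize    # => 5    (bytes)
```

So in 1.9 the index counts characters, and getbyte is the way to reach an individual byte.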

robert

--
remember.guy do |as, often| as.you_can - without end

Iñaki Baz Castillo

2/23/2009 2:33:00 PM

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> 15:15:25 ~$ ruby -ve 'p "foo"[0]'
> ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
> 102
> 15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> "f"
> 15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> 102
> 15:15:57 ~$

Clear now, thanks :)
--
Iñaki Baz Castillo
<ibc@aliax.net>

Simon Krahnke

2/23/2009 3:20:00 PM

* Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:

> I did a Benchmark.realtime comparing hex_unescape and hex_escape
> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
> ~4*10^(-5).

For what exactly is 40 microseconds too slow?

mfg, simon .... l

Iñaki Baz Castillo

2/23/2009 4:14:00 PM

2009/2/23 Simon Krahnke <overlord@gmx.li>:
> * Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:
>
>> I did a Benchmark.realtime comparing hex_unescape and hex_escape
>> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
>> ~4*10^(-5).
>
> For what exactly is 40 microseconds too slow?

I don't mean that, but it's strange that the inverse method takes
almost twice as long, isn't it?

--
Iñaki Baz Castillo
<ibc@aliax.net>

Simon Krahnke

2/24/2009 2:30:00 AM

* Iñaki Baz Castillo <ibc@aliax.net> (17:14) wrote:

> 2009/2/23 Simon Krahnke <overlord@gmx.li>:
>> * Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:
>>
>>> I did a Benchmark.realtime comparing hex_unescape and hex_escape
>>> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
>>> ~4*10^(-5).
>>
>> For what exactly is 40 microseconds too slow?
>
> I don't mean that, but it's strange that the inverse method takes
> almost twice as long, isn't it?

How would you implement these at the core level?
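
One possible direction, sketched rather than measured: precompute the escape for every byte once, so that escaping becomes a table lookup per match instead of a sprintf call. The names ESCAPE_TABLE and hex_escape_table are made up for illustration; the sketch assumes Ruby 2.0 or later (String#b) and relies on gsub accepting a Hash as the replacement, which has been available since 1.9.

```ruby
# Hypothetical names; built once at load time.
# Keys are single binary bytes, values are their "%XX" escapes.
ESCAPE_TABLE = (0..255).each_with_object({}) do |byte, table|
  table[[byte].pack("C")] = sprintf("%%%02X", byte)
end.freeze

def hex_escape_table(str)
  # Binary copy: every match is one byte, looked up in the table.
  str.b.gsub(/[^a-zA-Z0-9_\-.]/n, ESCAPE_TABLE)
end

puts hex_escape_table("?<>é")   # prints %3F%3C%3E%C3%A9
```

Whether this actually beats the sprintf version would need a benchmark on the real inputs.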

mfg, simon .... l

comp.lang.ruby
