comp.lang.ruby

Re: Bug in CGI::unescapeHTML?

800603

7/6/2007 6:56:00 AM

On Jul 6, 12:42, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:
> Hi,
>
> In message "Re: Bug in CGI::unescapeHTML?"
> on Thu, 5 Jul 2007 13:00:02 +0900, Esad Hajdarevic <esad.ta...@esse.at> writes:
>
> |I think there's a bug in CGI::unescapeHTML. Or am I doing something wrong?
> |
> |$KCODE='u'
> |CGI::unescapeHTML("&#xE3;")
> |
> |will return "\343", which according to my screaming mysql utf-8 encoded
> |database is not a valid utf-8 sequence
>
> Not a bug, unfortunately. Since your client sent a binary sequence
> "\343" in URL encoding, unescapeHTML() decoded it back. Specifying
> $KCODE='u' does not affect encoding your clients send. You have to
> check (or convert) input from your clients explicitly, anyway.
>
> matz.
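
For anyone following along, here is a small sketch of why the database rejects the result. It assumes Ruby 1.9 or later (where strings carry an encoding), not the 1.8 interpreter discussed in this thread, and valid_utf8? and utf8_for are hypothetical helper names made up for illustration. The lone byte "\343" (0xE3) is not a well-formed UTF-8 sequence, while the code point U+00E3 encoded as UTF-8 takes two bytes.

```ruby
# Hypothetical helpers for illustration; assumes Ruby 1.9 or later.

# True if the given byte string is well-formed UTF-8.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Encode a Unicode code point as a UTF-8 string.
def utf8_for(codepoint)
  [codepoint].pack("U")
end

puts valid_utf8?("\xE3")             # false: the lone byte "\343"
puts utf8_for(0xE3).bytes.inspect    # [195, 163]: U+00E3 as two UTF-8 bytes
puts valid_utf8?(utf8_for(0xE3))     # true
```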


I ran into this problem too and had to modify CGI::unescapeHTML as shown
below. Could anyone kindly suggest a more appropriate solution?

require 'cgi'

# Ruby 1.8 patch: when $KCODE is UTF-8, numeric references from 128 up to
# 65535 are emitted as UTF-8 instead of a raw byte or the untouched entity.
def CGI::unescapeHTML(string)
  string.gsub(/&(amp|quot|gt|lt|\#[0-9]+|\#x[0-9A-Fa-f]+);/n) do
    match = $1.dup
    case match
    when 'amp'  then '&'
    when 'quot' then '"'
    when 'gt'   then '>'
    when 'lt'   then '<'
    when /\A#0*(\d+)\z/n then
      # Decimal reference, e.g. &#227;
      if Integer($1) < 256 and Integer($1) > 127 and ($KCODE[0] == ?u or $KCODE[0] == ?U)
        [Integer($1)].pack("U")
      elsif Integer($1) < 256
        Integer($1).chr
      else
        if Integer($1) < 65536 and ($KCODE[0] == ?u or $KCODE[0] == ?U)
          [Integer($1)].pack("U")
        else
          "&##{$1};"
        end
      end
    when /\A#x([0-9a-f]+)\z/ni then
      # Hexadecimal reference, e.g. &#xE3;
      if $1.hex < 256 and $1.hex > 127 and ($KCODE[0] == ?u or $KCODE[0] == ?U)
        [$1.hex].pack("U")
      elsif $1.hex < 256
        $1.hex.chr
      else
        if $1.hex < 65536 and ($KCODE[0] == ?u or $KCODE[0] == ?U)
          [$1.hex].pack("U")
        else
          "&#x#{$1};"
        end
      end
    else
      "&#{match};"
    end
  end
end
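
For comparison, the same idea can be sketched without $KCODE. This is only a rough sketch under a few assumptions: Ruby 1.9 or later, input that is UTF-8 (or plain ASCII), and UTF-8 output. The names unescape_html_utf8 and NAMED_ENTITIES are hypothetical and not part of the CGI library. Unlike the patch above, it always emits UTF-8 for numeric references and leaves surrogates and out-of-range code points untouched.

```ruby
# A minimal sketch, assuming Ruby 1.9+ and UTF-8 (or ASCII) input.
# unescape_html_utf8 and NAMED_ENTITIES are hypothetical names.
NAMED_ENTITIES = { "amp" => "&", "quot" => '"', "gt" => ">", "lt" => "<" }.freeze

def unescape_html_utf8(string)
  string.gsub(/&(amp|quot|gt|lt|#[0-9]+|#x[0-9A-Fa-f]+);/) do
    whole  = $&   # the full entity, e.g. "&#xE3;"
    entity = $1   # the part between "&" and ";"
    next NAMED_ENTITIES[entity] if NAMED_ENTITIES.key?(entity)

    # Numeric reference: "#x..." is hexadecimal, "#..." is decimal.
    codepoint = entity[1] == "x" ? entity[2..-1].hex : entity[1..-1].to_i

    # Only emit real Unicode scalar values; leave anything else as-is.
    if codepoint <= 0x10FFFF && !(0xD800..0xDFFF).cover?(codepoint)
      [codepoint].pack("U")
    else
      whole
    end
  end
end

puts unescape_html_utf8("&#xE3;").bytes.inspect   # [195, 163]
```

With this version, "&#xE3;" and "&#227;" both decode to the two-byte UTF-8 form of U+00E3, which a UTF-8 database column should accept.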