William James
12/14/2005 11:09:00 AM
ako... wrote:
> hello,
>
> i need to write a function that would parse a string literal in another
> language. a string literal in this language is:
>
> STRING = "CHAR*"
> CHAR = any character except for " and \
> | \"
> | \\
> | \/
> | \u four hexadecimal digits
>
> the \u sequence specifies a character in UTF-16 encoding.
>
> for example: "abc", "", "a\"bc", "a\\b", "a\u12bfc"
>
> below is the code that i wrote. is this Ruby enough? can someone
> suggest improvements? a better style?
>
> thanks
> konstantin
>
> def parselit(s)
> r = %r{\\"|\\/|\\\\|\\u[\da-f][\da-f][\da-f][\da-f]}i
> s =~ /^"((?:[^"\\]|#{r})*)"$/ && $1.gsub(r) { |x| x =~ /\\u(.*)/ ?
> [$1.hex].pack('U*') : x[1..-1] }
> end
>
> puts parselit('"\u004e\"a"')
def parselit(s)
  re = %r{
      \\"
    | \\/
    | \\\\
    | \\u [\da-f]{4}
  }xoi
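  return nil if s !~ /^".*"$/
  body = s[1..-2]
  out = ""
  matched = 0    # characters of body consumed so far
  body.scan( /\G (?: ( [^"\\]+ ) | ( #{re} ) )/x ){ |x|
    matched += (x.first || x.last).length
    out <<
      if !x.last
        x.first
      else
        if x.last[0,2] == '\u'
          [x.last[2..-1].hex].pack('U*')
        else
          x.last[1..-1]
        end
      end
  }
  # Fail if whole string didn't match. Counting the matched characters
  # avoids calling post_match on a nil $~ when nothing matched (e.g. "").
  matched == body.length ? out : nil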
end
puts parselit('"\u004e\"a"')
puts parselit('"\u004e\""a"')
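
The same grammar could also be walked with StringScanner from the standard library, which makes the "fail as soon as nothing matches" step explicit. This is only a sketch: the name parselit_scan is made up for illustration, and like the versions above it turns each \u escape into one code point without combining UTF-16 surrogate pairs.

```ruby
require 'strscan'

# Hypothetical alternative to parselit; returns the decoded string,
# or nil if s is not a well-formed literal.
def parselit_scan(s)
  # The literal must be wrapped in double quotes.
  return nil unless s.length >= 2 && s[0, 1] == '"' && s[-1, 1] == '"'
  ss = StringScanner.new(s[1..-2])
  out = +""
  until ss.eos?
    if ss.scan(/[^"\\]+/)               # run of ordinary characters
      out << ss.matched
    elsif ss.scan(/\\u([\da-f]{4})/i)   # \u followed by four hex digits
      out << [ss[1].hex].pack('U')
    elsif ss.scan(/\\(["\\\/])/)        # \" or \\ or \/
      out << ss[1]
    else
      return nil                        # bare quote or unknown escape
    end
  end
  out
end

puts parselit_scan('"\u004e\"a"')
p parselit_scan('"\u004e\""a"')
```

The first call should print N"a, and the second should show nil because of the bare quote in the middle.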