Robert Klemme
8/3/2007 12:47:00 PM
2007/8/3, Bob Hutchison <hutch@recursive.ca>:
> Hi,
>
> Does anyone know of a fast implementation of the XML escape method
> (the one that converts '"<>& to &quot; etc.)?
>
> I did some benchmarking on one of my applications and the
> implementation I have, which I thought was okay -- simple-minded for
> sure, but okay -- turns out to be a bottleneck in certain operations.
>
> I used ruby-prof with a simple test, running over a 400 character
> string 50,000 times or so. Running the profiler on version0 (below)
> took 1.39 seconds.
>
> def version0(input)
>   # all kinds of other processing of input simulated by the input.dup
>   result = input.dup
>
>   return result
> end
>
> The original simple-minded way ran in 8.74 seconds under ruby-prof:
>
> def version1(input)
>   # all kinds of other processing of input simulated by the input.dup
>   result = input.dup
>
>   result.gsub!("&", "&amp;")
>   result.gsub!("<", "&lt;")
>   result.gsub!(">", "&gt;")
>   result.gsub!("'", "&apos;")
>   result.gsub!("\"", "&quot;")
>
>   return result
> end
>
> The best I've come up with so far ran in 3.33 seconds under ruby-prof:
>
> def version2(input)
>   # all kinds of other processing of input simulated by the input.dup
>   result = input.dup
>
>   # the value of the case expression is the block's replacement string
>   result.gsub!(/[&<>'"]/) do |match|
>     case match
>     when '&' then '&amp;'
>     when '<' then '&lt;'
>     when '>' then '&gt;'
>     when "'" then '&apos;'
>     when '"' then '&quot;'
>     end
>   end
>
>   return result
> end
>
> After accounting for overhead, 3.8 times faster is good, but I'd like it
> faster still. BTW, gsub is only marginally slower than gsub!. I've
> tried using simple iteration, gsub with a hash to avoid the case, and
> variations; all were slower to a lot slower than version1, and nothing
> came near version2 (which really was the first variation I tried).
>
> Any ideas?
You are on the right track. There is just one thing to improve: get
rid of "case":
  class Converter
    MAP = {
      "&" => "&amp;",
      # ...
    }
    def self.convert(s)
      s.gsub(/[&<>'"]/) do |m|
        MAP[m] || "ERROR"
      end
    end
  end
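
For reference, here is a minimal sketch of the same idea with the map filled in. The module and method names (XmlEscape, escape, escape_with_hash) are hypothetical, and the table assumes the five predefined XML entities, including &apos; for the single quote; the elided entries in the map above may have differed. The second method relies on gsub accepting a hash as the replacement argument, which is available in Ruby 1.9 and later.

```ruby
# Sketch only: names and the full entity table are assumptions,
# not taken from the original post.
module XmlEscape
  # The five predefined XML entities (assumed to be the intended set).
  MAP = {
    "&" => "&amp;",
    "<" => "&lt;",
    ">" => "&gt;",
    "'" => "&apos;",
    '"' => "&quot;",
  }.freeze

  # Matches exactly the characters that have an entry in MAP.
  PATTERN = /[&<>'"]/

  # Block form: one pass over the string, one hash lookup per match.
  def self.escape(s)
    s.gsub(PATTERN) { |m| MAP[m] }
  end

  # Hash form (Ruby 1.9+): gsub looks each match up in the hash itself.
  def self.escape_with_hash(s)
    s.gsub(PATTERN, MAP)
  end
end

puts XmlEscape.escape(%q{<a href="x">Tom & Jerry's</a>})
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

Because the pattern matches only characters that are keys of the map, the lookup cannot miss, so the "ERROR" fallback is left out here.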
Also, I believe x.dup.gsub! is less efficient than doing just a single x.gsub.
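
One way to check that belief is to time both variants with the standard Benchmark library. The following is a rough sketch, not a measurement: the method names, the sample string (about 400 characters, as in the original test) and the iteration count are all assumptions, and results will vary with the Ruby version and the input.

```ruby
require "benchmark"

# Assumed entity table: the five predefined XML entities.
ENTITY_MAP = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  "'" => "&apos;",
  '"' => "&quot;",
}.freeze
ENTITY_PATTERN = /[&<>'"]/

# Variant 1: copy the string first, then substitute in place.
def escape_dup_gsub_bang(input)
  result = input.dup
  result.gsub!(ENTITY_PATTERN) { |m| ENTITY_MAP[m] }
  result # gsub! returns nil when nothing matched, so return the copy
end

# Variant 2: a single non-destructive gsub, which builds the new string itself.
def escape_gsub(input)
  input.gsub(ENTITY_PATTERN) { |m| ENTITY_MAP[m] }
end

# Hypothetical sample of roughly 400 characters.
sample = %q{<item name="a&b">it's</item> } * 14

# Fewer iterations than the 50,000 mentioned in the thread, to keep the run short.
iterations = 5_000

Benchmark.bm(12) do |x|
  x.report("dup + gsub!") { iterations.times { escape_dup_gsub_bang(sample) } }
  x.report("gsub")        { iterations.times { escape_gsub(sample) } }
end
```

Both variants return the same escaped string and leave the input untouched, so any difference in the timings comes from the extra copy and the in-place substitution.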
Kind regards
robert