Caleb Clausen
3/8/2006 8:45:00 PM
OK, so first off, your sample implementation seemed to have several
bugs in it. After fixing those, I thought you might be able to save
some time by glomming all the regexps together, obviating the need
for StringScanner altogether. However, that doesn't seem to have
actually made any difference... if anything it seems to have been a
little slower. I don't know why. And the great big long Regexp is
considerably harder to read.
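
A minimal sketch of what glomming the regexps together could look like. This is not the Regexp that was actually timed; it only strings the same sub-patterns together under the /x flag, and the names LINE and LOG_LINE are made up for the example.

```ruby
# The sample log line from the post, as a single line.
LINE = "1140908573.050732 rule 19/0(match): pass unkn(255) on sis1: " \
       "80.202.226.15.50000 > 192.168.0.6.52525: UDP, length 64"

# Hypothetical combined pattern, one capture group per field.
# With the x flag, whitespace in the pattern is ignored, so spaces are \s.
LOG_LINE = /
  \A(\d+\.\d+)\s     # timestamp
  rule\s(\d+)\S*\s   # rule number, then the rest of that token
  (\w+)\s            # action, e.g. pass
  \S+\son\s          # skipped token, then the word on
  (\w+:)\s           # interface name, colon included
  ((?:\d+\.){4})     # first address, trailing dot included
  (\d+)              # first port
  \s>\s
  ((?:\d+\.){4})     # second address, trailing dot included
  (\d+)              # second port
  :\s(\w+)           # protocol
/x

m = LOG_LINE.match(LINE)
time, rule_no, stat, interface,
  out_ip, out_port, in_ip, in_port, proto = m.captures
```

Even with comments, the pattern is harder to follow than a sequence of small scan and skip calls, which matches the readability complaint above.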
I tried to optimize some of your patterns to eliminate backtracking,
use noncapturing parens, etc. That also didn't seem to help much. So,
it looks like (sans bugs) your code is pretty much optimal. I'm
including my version below in case it might be useful anyway.
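
For anyone who wants to check the capturing versus noncapturing difference themselves, something like the following rough sketch should do. It uses the standard Benchmark library; the iteration count and the variable names are arbitrary choices for the example, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

addr = "80.202.226.15.50000"

capturing    = /(\d+\.){4}/    # saves the last repetition as group 1
noncapturing = /(?:\d+\.){4}/  # groups without saving anything

n = 100_000  # arbitrary iteration count
Benchmark.bm(14) do |x|
  x.report("capturing")    { n.times { addr =~ capturing } }
  x.report("noncapturing") { n.times { addr =~ noncapturing } }
end
```

Both patterns match the same text, so any difference in the report comes only from the bookkeeping for the capture group.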
Some notes:

I got rid of silly stuff like \D+? after the interface name, since it
doesn't seem necessary. (It's not needed for the one line of data you
provided, anyway.)

The IP addresses will now both end with ".", so chop it off if that's
a problem.

Looks like you inverted the source and destination address/port
fields? I didn't fix that...
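
On the trailing "." mentioned in the notes: a small sketch of two ways to drop it, using the value the address pattern returns for the sample line. The variable names here are only for illustration.

```ruby
out_ip = "80.202.226.15."        # what scan(/(?:\d+\.){4}/) returns for the sample

chopped = out_ip.chop            # removes the last character, whatever it is
chomped = out_ip.chomp(".")      # removes a trailing "." only if one is present
```

String#chomp with an argument is the safer of the two if the value might already have been trimmed.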
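require 'strscan'

# The one sample line of log data.
a = "1140908573.050732 rule 19/0(match): pass unkn(255) on sis1: 80.202.226.15.50000 > 192.168.0.6.52525: UDP, length 64"

10000.times do
  s = StringScanner.new(a)
  time = s.scan(/\d+\.\d+/)         # "1140908573.050732"
  s.pos = 23                        # fixed offset of the rule number
  rule_no = s.scan(/\d+/)           # "19"
  s.skip(/\S+\s/)                   # skip "/0(match): "
  stat = s.scan(/\w+/)              # "pass"
  s.skip(/\s\S+\son\s/)             # skip " unkn(255) on "
  interface = s.scan(/\w+\:/)       # "sis1:"
  s.skip(/\s/)
  out_ip = s.scan(/(?:\d+\.){4}/)   # "80.202.226.15." (trailing "." included)
  out_port = s.scan(/\d+/)          # "50000"
  s.skip(/ > /)
  in_ip = s.scan(/(?:\d+\.){4}/)    # "192.168.0.6." (trailing "." included)
  in_port = s.scan(/\d+/)           # "52525"
  s.pos += 2                        # skip ": "
  proto = s.scan(/\w+/)             # "UDP"
end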