Justin Collins
7/24/2006 7:24:00 PM
S3 wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I am trying to write a simple script to parse
> an Apache log file, but it is taking an extremely
> long time. I used profile and the problem appears
> to be with the regular expression matcher.
>
> I have made a simple script to run it with different
> lengths and it appears that the regular expression
> matcher is being very slow. See the attached script.
>
> Here are some timings:
> ~>time ./mklong.rb 1000
> real 0m11.930s
> ~>time ./mklong.rb 2000
> real 0m50.400s
> ~>time ./mklong.rb 3000
> real 1m55.693s
> ~>time ./mklong.rb 4000
> real 3m16.004s
>
> So, dividing it out,
> 1000/11 = 91
> 2000/50 = 40
> 3000/120 = 25
> 4000/200 = 20
>
>
> So, the matching appears to be much slower than O(n).
> Isn't the whole point of regular expressions to be
> fast and O(n)?
> Whenever my script encounters a long string,
> it grinds to a halt.
>
> Why is this?
> Did I make the regular expression correctly?
> Is there some way to optimize it?
> Is there a problem with the matcher?
>
> Any help would be appreciated.
> <snip code>
>
I'm not a regex expert, so I can't help you with optimizing yours, but
if your pattern causes the regex engine to backtrack repeatedly, then it
will not match in linear time. This usually becomes apparent on long
strings where there can be lots of backtracking.
Sorry I can't offer more help than that.
-Justin