Jano Svitok
9/4/2006 12:52:00 PM
On 9/4/06, singsang <tomsingsang@yahoo.com> wrote:
> Dear all,
>
> While writing some httpd logfile pre-processing (splitting it up, already
> gathering some basic numbers), I figured I should compile the Regexp
> for the logfile entry only once.
>
> So my guess is that I should perhaps have a class LogFormat that holds
> this as a class variable or a class constant. Below I use an untested
> regular expression that is not complete yet.
>
> So the idea is to have:
>
> class LogFormat
> @@RegEx = Regexp.new( '(\S+) (\S+) (\S+) \[(\d+)/(\w+)/(\d+) [+\-]\d+?\]' )
> def LogFormat.regex
> @@RegEx
> end
> end
>
> If now from a class LogLine (instantiated for each line in the logfile)
> I use something like
>
> class LogLine
> # ...
> ip, rfc931, user, day, month, year, offset =
> line.match(LogFormat.regex)
> # ...
> end
>
> My question: How often is the Regexp compiled? When?
> When the definition of LogFormat is first read?
>
> Btw: if anybody has a ready-to-use regex for the Common Log Format, that
> would be great, but I will get that done as far as I need by myself.
> ;-)
> Other question: Does anybody know of a "Webalizer" sort of tool written
> in Ruby?
Hi,
<non-authoritative answer follows ;-) >
- =~ seems to be faster than String#match (by roughly an order of magnitude)
- maybe you can turn @@RegEx into a constant REGEX, eliminating the
need for the LogFormat.regex accessor
- it seems that when the pattern is constant (i.e. no #{xxx}
interpolation), only one Regexp object is created.
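A minimal sketch of the points above, equally non-authoritative. The names LogFormat::REGEX, LogLine.parse and literal_regexp are my own, and the pattern is an assumed, simplified Common Log Format regex (the time part is matched but not captured), so check it against your real logs:

```ruby
# Sketch only: LogFormat::REGEX, LogLine.parse and literal_regexp are
# hypothetical names; the pattern is an assumed, simplified CLF regex.
class LogFormat
  # No #{...} interpolation: this Regexp is built once, when the class body
  # is evaluated, and every reference to the constant reuses that object.
  REGEX = %r{\A(\S+) (\S+) (\S+) \[(\d+)/(\w+)/(\d+):\d+:\d+:\d+ ([+\-]\d+)\]}
end

class LogLine
  FIELDS = [:ip, :rfc931, :user, :day, :month, :year, :offset]
  attr_reader(*FIELDS)

  # Returns a LogLine, or nil when the line does not match the pattern.
  def self.parse(line)
    # String#=~ returns the match position (or nil) and sets $~.
    return nil unless line =~ LogFormat::REGEX
    new($~.captures)
  end

  def initialize(captures)
    # MatchData#captures holds only the groups (not the whole match),
    # so it can be destructured straight into the seven fields.
    @ip, @rfc931, @user, @day, @month, @year, @offset = captures
  end
end

# A regexp literal without interpolation evaluates to the same object
# every time this method runs.
def literal_regexp
  /\d+/
end

if __FILE__ == $0
  sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
  entry = LogLine.parse(sample)
  p [entry.ip, entry.user, entry.month, entry.offset]
  # prints ["127.0.0.1", "frank", "Oct", "-0700"]
  p literal_regexp.equal?(literal_regexp)
  # prints true
end
```

With the constant there is nothing left for LogFormat.regex to do; LogLine just refers to LogFormat::REGEX. Your Regexp.new inside the class body should likewise run only once, when the class definition is executed.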