Randy Kramer
3/19/2005 2:02:00 PM
Thanks to all who replied so far. I also want to look into the StringScanner
approach (I'll reply separately with some questions about that), and I can't
believe I couldn't find the ways to delete the first character of a string.
Guess I am a newbie!
On Saturday 19 March 2005 08:04 am, Robert Klemme wrote:
> I'd bet that this approach is slower than a pure regexp based approach.
So far, you're very right--my approach took about 30 times as long as the pure
regexp approach, although my Ruby code might not be very efficient. (In case
nobody noticed, I'm very much a newbie to Ruby.)
> If
> you cannot stick all exact regexps into one (see below) then maybe some
> form of stripped regexps might help. For example:
This sounds like it's worth a try, but:
1) I haven't created all the necessary REs yet
2) Question below (for clarification)
> rx1 = /ab+/
> rx2 = /cd+/
>
> rx_all = /(ab+)|(cd+)/
>
> rx_stripped = /[ab](\w+)/
Question: IIUC, the [ab] above should be [ac]?
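To make the question concrete, here is a rough sketch of the two-stage idea as I read it, assuming [ac] is what was meant. Capturing the first character, the \A anchor, and the classify helper are my own additions for illustration, not part of the suggestion quoted above.

```ruby
# Assumption: the character class should be [ac], so that it covers the
# first character of both /ab+/ and /cd+/.
# The extra capture group around [ac] and the \A anchor are additions
# made here so the second stage knows which stripped regexp to try.
RX_STRIPPED   = /\A([ac])(\w+)/
RX_STRIPPED_1 = /^b+/   # remainder of /ab+/
RX_STRIPPED_2 = /^d+/   # remainder of /cd+/

# Hypothetical helper: returns :rx1, :rx2, or nil for a single token.
def classify(token)
  m = RX_STRIPPED.match(token)
  return nil unless m

  first = m[1]
  rest  = m[2]
  if first == "a" && rest =~ RX_STRIPPED_1
    :rx1
  elsif first == "c" && rest =~ RX_STRIPPED_2
    :rx2
  end
end
```

With the class written as [ab], a token such as "cdd" is never picked up by the first stage, which is what prompted the question.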
> # then, use these on the second part
> rx_stripped_1 = /^b+/
> rx_stripped_2 = /^d+/
>
> This is just a simple example for demonstration. For these simple regexps
> rx_all is the most efficient one I'm sure.
> What does "fairly large" mean? I would try to start with sticking *all*
> these regexps into one - if the rx engine does not choke on that regexp I'd
> assume that this is the most efficient way to do it, as then you have the
> best ratio of machine code to ruby interpretation. Maybe you just show us
> all these regexps so we can better understand the problem.
It's hard even to guess. I intended to combine several REs into one anyway
when they had a lot of commonality. For example, the TWiki markup for
headings (which I'm planning to use) is like this:
---* Level 1
---** Level 2
---*** Level 3
---**** Level 4
---***** Level 5
---****** Level 6
I've planned to use one RE for all the above, then determine the level from
the length of the match (like level = len - 3).
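A minimal sketch of that heading idea, assuming the markup is exactly as listed above (three dashes followed by one to six asterisks). The names HEADING_RE and heading_level are hypothetical.

```ruby
# One RE for all six heading levels: "---" followed by 1 to 6 asterisks
# at the start of a line.
HEADING_RE = /^---\*{1,6}/

# Hypothetical helper: returns the heading level (1..6), or nil if the
# line is not a heading.
def heading_level(line)
  m = HEADING_RE.match(line)
  return nil unless m

  # The matched text is "---" plus the asterisks, so level = len - 3.
  m[0].length - 3
end
```

For example, heading_level("---*** Level 3") gives 3, and a line without the marker gives nil.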
Likewise, "inline" markup is *for bold*, _for italic,_ __for bold italic__,
and so forth. I'd try to have one RE looking for words preceded by _, *, or
__, and another for words ending with the same. (I might also combine words
marked with % for %TWikiVariables%.)
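A rough sketch of what those two REs might look like. The constant names and the exact boundary rules (whitespace or start of string before an opening marker; whitespace, end of string, or punctuation after a closing one) are assumptions for illustration, and real TWiki rules may well differ.

```ruby
# Opening markers: __, _ or * at the start of the string or after
# whitespace, immediately followed by a non-space character.
# "__" is listed before "_" so the longer marker wins.
INLINE_OPEN_RE  = /(?:\A|\s)(__|_|\*)(?=\S)/

# Closing markers: the same markers right after a non-space character,
# followed by whitespace, end of string, or common punctuation.
INLINE_CLOSE_RE = /\S(__|_|\*)(?=\s|\z|[.,;:!?])/

sample = "*for bold* and _for italic,_ plus __for bold italic__"

opens  = sample.scan(INLINE_OPEN_RE).flatten
closes = sample.scan(INLINE_CLOSE_RE).flatten
```

Each scan returns the markers in the order they occur, so pairing the n-th opening marker with the n-th closing one is a possible next step. Nesting and unmatched markers are not handled here.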
With "optimizations" like this, I'd guess on the order of 15 or so regexps.
> Now I'm getting really curious. Care to post some more details?
I presume you mean on the 1 to 10% savings? I planned to do that; I'll try to
put something on WikiLearn this weekend and then post something here.
Randy Kramer