Robert Klemme
6/25/2006 5:28:00 PM
dkmd_nielsen@hotmail.com wrote:
> I'm attempting to parse parameter controls that are in the following
> format:
>
> keyword = char(lg) {| | block}
>
> where (lg) and {| | block} are optional parameters. Doesn't seem
> difficult. However, I'm having problems capturing the last block.
> Some setups catch the block; and some do not. The ones that do not are
> the ones that would indicate it is optional.
>
> The following are my test runs. The first three capture the block
> within ellipses. Note: I'm using a derivative of the show_regexp
> function provided in Dave Thomas' book. The 1/2/3/4 are $1, $2, $3,
> and $4. The original expression identified four blocks of code.
>
> =+= The following work fine =+=
> ======================= Must be there
> 'priority = h(5) {}', /(\{.*\}[\s,]*)/
> priority = h(5) <<{}>>
> 1:({}) 2:() 3:() 4:()
> =======================
> ======================= Must be at end of string
> 'priority = h(5) {}', /(\{.*\}[\s,]*)$/
> priority = h(5) <<{}>>
> 1:({}) 2:() 3:() 4:()
> =======================
> ======================= There must be one
> 'priority = h(5) {}', /(\{.*\}[\s,]*){1}/
> priority = h(5) <<{}>>
> 1:({}) 2:() 3:() 4:()
> =======================
>
>
> These last three fail. These are the ones that say (or I think they
> say) "the block may or may not be there." But they fail to identify
> the existing block.
>
>
> =+= The following fail. =+=
> ======================= Can be zero or more
> 'priority = h(5) {}', /(\{.*\}[\s,]*)*/
> <<>>priority = h(5) {}
> 1:() 2:() 3:() 4:()
> =======================
> ======================= At least zero, but not more than one
> 'priority = h(5) {}', /(\{.*\}[\s,]*){0,1}/
> <<>>priority = h(5) {}
> 1:() 2:() 3:() 4:()
> =======================
> ======================= Zero or one occurrence (one I would like to
> use)
> 'priority = h(5) {}', /(\{.*\}[\s,]*)?/
> <<>>priority = h(5) {}
> 1:() 2:() 3:() 4:()
> =======================
>
>
> The entire expression that I thought would work is this:
>
> Regexp.new('(\w+) *= *([A-Za-z]{1,2})(\(\d{1,2}\))*(\{.+\}[\s,]*)*,*')
First hint: don't use the string constructor in this case; it makes
your life harder than necessary. Use a regexp literal instead.
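A minimal sketch of what can go wrong with the string form (the variable names are just for illustration): in a double-quoted string Ruby processes the backslashes before Regexp.new ever sees them. Your single-quoted string happens to survive, but a literal avoids the question altogether.

```ruby
# Double quotes: "\w" is collapsed to a plain "w" before the regexp is built.
dq = Regexp.new("\w+")
# Single quotes: the backslash survives, so this works, but it is easy to get wrong.
sq = Regexp.new('\w+')
# Literal: no string escaping layer at all.
lit = /\w+/

p dq.source    # => "w+"
p sq == lit    # => true
p dq =~ "abc"  # => nil, because the pattern now looks for the letter w
```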
Here's what I'd do:
irb(main):001:0> s='priority = h(5) {}'
=> "priority = h(5) {}"
irb(main):008:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match(s).to_a
=> ["priority = h(5) {}", "priority", "h", "(5)", "{}"]
irb(main):009:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b").to_a
=> ["a=b", "a", "b", nil, nil]
irb(main):010:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b(c)").to_a
=> ["a=b(c)", "a", "b", "(c)", nil]
irb(main):011:0> %r{(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^\}]*\})?}.match("a=b {}").to_a
=> ["a=b {}", "a", "b", nil, "{}"]
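As for why your three "optional" test runs came back empty, here is a small sketch of what I believe is going on: a group followed by ?, * or {0,1} is allowed to match nothing, and the engine reports the leftmost position where the whole pattern succeeds. With nothing else in the pattern, that is index 0 with an empty match. Once something before or after the group pins the position down, the block is found.

```ruby
s = 'priority = h(5) {}'

# Optional group alone: succeeds immediately at index 0 by matching nothing.
m = /(\{.*\}[\s,]*)?/.match(s)
p m.begin(0)  # => 0
p m[0]        # => ""
p m[1]        # => nil

# Anchored at the end: an empty match at index 0 no longer satisfies $,
# so the engine keeps scanning until it reaches the block.
m = /(\{.*\}[\s,]*)?$/.match(s)
p m[1]        # => "{}"

# Preceded by the leading tokens: they must be consumed first, so the
# optional block is tried at the place where it actually sits.
m = /(\w+)\s*=\s*(\w+)\s*(?:(\([^)]+\))\s*)?(\{[^}]*\})?/.match(s)
p m[4]        # => "{}"
```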
You can also use the extended form (the x flag), which allows
whitespace and comments inside the pattern, to make it clearer:
%r<
(\w+) # first token
\s*
=
\s*
(\w+) # second token
\s*
(?:(\([^)]+\))\s*)? # optional parens with trailing WS
(\{[^}]*\})? # optional block
>x
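A possible way to put the extended form to use, as a sketch only: CONTROL and parse_control are names I made up for illustration, and the helper simply returns the four captures (or nil when the line does not match).

```ruby
# CONTROL and parse_control are hypothetical names, not part of any library.
CONTROL = %r<
  (\w+)                 # first token
  \s* = \s*
  (\w+)                 # second token
  \s*
  (?:(\([^)]+\))\s*)?   # optional parens with trailing whitespace
  (\{[^}]*\})?          # optional block
>x

# Returns [keyword, type, parens, block]; missing optional parts are nil.
def parse_control(line)
  m = CONTROL.match(line)
  m && m.captures
end

p parse_control('priority = h(5) {}')  # => ["priority", "h", "(5)", "{}"]
p parse_control('a=b')                 # => ["a", "b", nil, nil]
p parse_control('a=b {}')              # => ["a", "b", nil, "{}"]
```

Keeping the pattern in a constant means the commented version is written once and the call sites stay short.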
Kind regards
robert