Asp Forum - Regexp match question on interpolated strings...

Richard Kilmer

10/5/2004 3:27:00 AM

If I had the source for a string:

"name = #{person.first_name+" "+person.last_name} ... Ok?"

And assuming I could find the first and last double quotes, how would I
parse out the #{ ... } with a regular expression since anything can fall
between the #{ ... } braces in a string?

Thanks in advance.

-rich

12 Answers

Joe Cheng

10/5/2004 4:49:00 AM

Richard Kilmer wrote:
> If I had the source for a string:
>
> "name = #{person.first_name+" "+person.last_name} ... Ok?"
>
> And assuming I could find the first and last double quotes, how would I
> parse out the #{ ... } with a regular expression since anything can fall
> between the #{ ... } braces in a string?

Hmmm, if I understand your question, and if you really knew where the
first and last double quotes were, you could calculate the number of
chars between them, and do something like this:

/#\{.*?\".{<number_of_chars>}\".*?}/

But it seems like if you want to be able to get more dynamic/flexible
than that, you really want to parse the expression for real--which is
something I believe regexes aren't powerful enough for. You'd either
have to write a parser by hand, or use something like racc:

http://i.loveruby.net/en/prog...
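
A minimal sketch of the "write a parser by hand" route, for illustration only. The method name `interpolations` is hypothetical, and the scanner makes one loud simplification: it counts every brace it sees, so a brace inside a nested string literal or comment would throw it off.

```ruby
# Return the source text of each #{ ... } expression in +src+, tracking
# brace depth by hand instead of using a regular expression.
# Assumption: braces inside nested string literals or comments are
# counted like any other brace (a real Ruby parser would skip them).
def interpolations(src)
  found = []
  i = 0
  while (start = src.index('#{', i))
    depth = 1
    j = start + 2
    # Walk forward until the brace that opened the interpolation closes.
    while j < src.length && depth > 0
      case src[j]
      when '{' then depth += 1
      when '}' then depth -= 1
      end
      j += 1
    end
    break if depth > 0 # unterminated interpolation: stop scanning
    found << src[(start + 2)...(j - 1)]
    i = j
  end
  found
end

src = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
p interpolations(src)
p interpolations('a #{h.map { |k, v| "#{k}" }} b')
```

Because the depth counter is an ordinary integer, nesting depth is not limited by the pattern, which is the property a plain regular expression lacks.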

Brian Schröder

10/5/2004 9:09:00 AM

Richard Kilmer wrote:
> If I had the source for a string:
>
> "name = #{person.first_name+" "+person.last_name} ... Ok?"
>
> And assuming I could find the first and last double quotes, how would I
> parse out the #{ ... } with a regular expression since anything can fall
> between the #{ ... } braces in a string?
>
> Thanks in advance.
>
> -rich
>
>
>
Regular expressions are not able to "count" more than a finite number of
states, and the number of states is fixed at compile time. That is
because regular expressions map to finite automata. So it is impossible
to match opening and closing braces in an unknown expression. For this
to always work you need a model that can enter unboundedly many states.

But beware, your computer is also only a finite state machine with a lot
of states. The number of its states is bounded by the size of ram (and
harddisk).

If you are sure that there will be no closing braces inside of the
braces you could match
/\#\{(.*?)\}/ =~ string

or including at most one pair of inside braces

/\#\{([^\{}]*(\{.*?\}|).*?)\}/ =~ string

As you see it begins to get ugly now.

Regards,

Brian
--
Brian Schröder
http://ruby.brian-sch...
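
To see where the two patterns in the post above stop working, here is a small sketch that runs them against a flat and a nested interpolation. The constant names `FLAT` and `ONE_LEVEL` and the `nested` sample string are made up for this example; the patterns themselves are copied from the post.

```ruby
# The two patterns from the post, named here only for the example.
FLAT      = /\#\{(.*?)\}/                    # no braces inside #{ ... }
ONE_LEVEL = /\#\{([^\{}]*(\{.*?\}|).*?)\}/   # at most one inner { ... } pair

plain  = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
nested = 'total = #{items.map { |i| i.price }} ok'

p plain[FLAT, 1]        # the whole expression
p nested[FLAT, 1]       # cut short at the first closing brace
p nested[ONE_LEVEL, 1]  # the whole expression, inner block included
```

Each extra level of nesting needs another hand-written layer in the pattern, which is why the approach gets ugly quickly.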

James Gray

10/5/2004 2:04:00 PM

On Oct 5, 2004, at 4:09 AM, Brian Schröder wrote:

> Regular expressions are not able to "count" more than a finite number
> of states, and the number of states is fixed at compile time. That is
> because regular expressions map to finite automata. So it is
> impossible to match opening and closing braces in an unknown
> expression. For this to work always you need a model that can enter
> unbounded many states.

Just for the sake of clarity, you are speaking of Ruby's regular
expressions here. Perl's regex engine has no such limitation. Using
the (?? ... ) construct, Perl regular expressions can parse balanced
delimiters. I miss this feature and would love to see Ruby add
something similar in the future.

James Edward Gray II

James Gray

10/5/2004 2:09:00 PM

On Oct 4, 2004, at 10:27 PM, Richard Kilmer wrote:

> If I had the source for a string:
>
> "name = #{person.first_name+" "+person.last_name} ... Ok?"
>
> And assuming I could find the first and last double quotes, how would I
> parse out the #{ ... } with a regular expression since anything can
> fall
> between the #{ ... } braces in a string?

I would use:

sub(/^(.+?)\#\(.+\}/m, '\1')

Hope that helps.

James Edward Gray II

ts

10/5/2004 2:25:00 PM

>>>>> "J" == James Edward Gray <james@grayproductions.net> writes:

J> expressions here. Perl's regex engine has no such limitation. Using
J> the (?? ... ) construct, Perl regular expressions can parse balanced
        ^^

I've always found the choice of these 2 characters strange ...

J> delimiters. I miss this feature and would love to see Ruby add
J> something similar in the future.

This ?

svg% cat b.rb
#!ruby -rjj
["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
  p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
end
/(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump
svg%

svg% ruby b.rb
"(aaa(bbbc)xxx)"
"(aaa(bb(b)c)xxx)"
Regexp /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/
0 call 2
1 jump 19
2 mem-start-push 1
3 exact1 (
4 push-if-peek-next ) ===> -1
5 null-check-start 0
6 push 13
7 cclass-not (-) (2)
8 push 12
9 cclass-not (-) (2)
10 pop
11 jump 8
12 jump 14
13 call 2
14 null-check-end-memst-push 0
15 jump 4
16 exact1 )
17 mem-end-rec 1
18 return
19 end
Optimize EXACT : (
svg%

Guy Decoux
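
The same recursive idea can be tried on the original `#{ ... }` question. This is a sketch that assumes a Ruby whose regexp engine supports named groups and `\g<name>` subexpression calls (Oniguruma, as mentioned in this thread, or its successors). The names `INTERP` and `interpolation_bodies` are hypothetical, and like any brace counter the pattern is fooled by a stray brace inside a nested string literal.

```ruby
# '#' followed by a balanced { ... } group. \g<braces> re-enters the
# named group, so inner brace pairs may nest to any depth.
INTERP = /\#(?<braces>\{(?:(?>[^{}]+)|\g<braces>)*\})/

# Collect the text between '#{' and its matching '}' for every
# interpolation found in +src+.
def interpolation_bodies(src)
  bodies = []
  src.scan(INTERP) do
    whole = Regexp.last_match[0] # e.g. '#{a + b}'
    bodies << whole[2..-2]       # drop the leading '#{' and trailing '}'
  end
  bodies
end

src = '"name = #{person.first_name+" "+person.last_name} ... Ok?"'
p interpolation_bodies(src)
p interpolation_bodies('a #{x { y { z } } } b')
```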

Brian Schröder

10/5/2004 2:45:00 PM

James Edward Gray II wrote:
>
> On Oct 4, 2004, at 10:27 PM, Richard Kilmer wrote:
>
>> If I had the source for a string:
>>
>> "name = #{person.first_name+" "+person.last_name} ... Ok?"
>>
>> And assuming I could find the first and last double quotes, how would I
>> parse out the #{ ... } with a regular expression since anything can fall
>> between the #{ ... } braces in a string?
>
>
> I would use:
>
> .sub(/^(.+?)\#\(.+\}/m, '\1')

This would be:
sub(/^(.+?)\#\{.+\}/m, '\1')
              ^
Why are you preferring the greedy match? And if I get it right this
substitutes
"name = #{person.first_name+" "+person.last_name} ... Ok?"
to
"name = ... Ok?"

I don't think that is what is asked? Or am I wrong?

regards,

Brian

>
> Hope that helps.
>
> James Edward Gray II
>
>

--
Brian Schröder
http://ruby.brian-sch...

Markus

10/5/2004 3:15:00 PM

On Tue, 2004-10-05 at 07:04, James Edward Gray II wrote:
> On Oct 5, 2004, at 4:09 AM, Brian Schröder wrote:
>
> > Regular expressions are not able to "count" more than a finite number
> > of states, and the number of states is fixed at compile time. That is
> > because regular expressions map to finite automata. So it is
> > impossible to match opening and closing braces in an unknown
> > expression. For this to work always you need a model that can enter
> > unbounded many states.
>
> Just for the sake of clarity, you are speaking of Ruby's regular
> expressions here. Perl's regex engine has no such limitation. Using
> the (?? ... ) construct, Perl regular expressions can parse balanced
> delimiters. I miss this feature and would love to see Ruby add
> something similar in the future.

I think Brian's point is true of regular expressions in general,
not any particular implementation. If the perl idiom you mention can in
fact do general purpose matching of unbounded depth, it doesn't mean
that "regular expressions" can do this, but rather that Larry has
implemented a more powerful parser and (incorrectly) called it "regular
expressions."

If this isn't clear, consider an analogy: if I write a language and
include a trailing-dot-digit idiom, such that 1.6 can be used as an
integer, does it mean that '1.6' is now an integer or that I've
implemented some form of real numbers and mislabeled them 'integers'?

-- Markus

James Gray

10/5/2004 3:28:00 PM

On Oct 5, 2004, at 9:44 AM, Brian Schröder wrote:

> This would be:
> .sub(/^(.+?)\#\{.+\}/m, '\1')
>                ^
> Why are you preferring the greedy match?

If it's known there are no braces in the string save the #{ ... }, I
think it's much preferable. {}s are certainly allowed in Ruby code.

> And if I get it right this substitutes
> "name = #{person.first_name+" "+person.last_name} ... Ok?"
> to
> "name = ... Ok?"
>
> I don't think that is what is asked? Or am I wrong?

Hmm, rereading the original message, I believe you are right. My
apologies.

James Edward Gray II
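
The greedy versus non-greedy trade-off discussed above can be checked with a short sketch. The constants `GREEDY` and `LAZY` and both sample strings are invented for the example; only the `.+` versus `.+?` distinction comes from the thread.

```ruby
GREEDY = /\#\{(.+)\}/m   # runs to the LAST closing brace in the string
LAZY   = /\#\{(.+?)\}/m  # stops at the FIRST closing brace

nested = 'n = #{list.map { |x| x * 2 }} done' # one interpolation, inner braces
two    = 'a = #{x}, b = #{y}'                 # two interpolations, no nesting

p nested[GREEDY, 1] # whole expression, inner block included
p nested[LAZY, 1]   # truncated at the block's closing brace
p two[GREEDY, 1]    # swallows everything between the first '#{' and the last '}'
p two[LAZY, 1]      # just the first expression
```

Neither quantifier is right in general: greedy copes with nested braces only when the string holds a single interpolation, and lazy copes with several interpolations only when none of them contains braces.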

James Gray

10/5/2004 3:38:00 PM

On Oct 5, 2004, at 9:25 AM, ts wrote:

>>>>>> "J" == James Edward Gray <james@grayproductions.net> writes:
> J> delimiters. I miss this feature and would love to see Ruby add
> J> something similar in the future.
>
> This ?
>
> svg% cat b.rb
> #!ruby -rjj
> ["(aaa(bbbc)xxx)", "(aaa(bb(b)c)xxx)"].each do |m|
>   p $& if /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/ =~ m
> end
> /(?<aa>\((?:(?>[^()]+)|\g<aa>)*\))/.dump

Wow. I can't decipher how, but that sure appears to work, though not
in my Ruby. ;) What is this magical "jj" library you loaded?

James Edward Gray II

ts

10/5/2004 3:42:00 PM

>>>>> "J" == James Edward Gray <james@grayproductions.net> writes:

J> Wow. I can't decipher how, but that sure appears to work, though not
J> in my Ruby. ;)

it's Oniguruma (the re engine for 1.9)

J> What is this magical "jj" library you loaded?

jj is like ii: it will only work at moulon :-)

Guy Decoux

comp.lang.ruby
