Asp Forum - [Q] specify start position of Regexp matching

Makoto Kuwata

11/25/2007 3:18:00 PM

Hi, all.

Is it possible to specify start position of Regexp matching?

str = "foo bar baz"
m = /ba/.match(str)
p m.begin(0) #=> 4
m = /ba/.match(str, 5) # is it possible?
p m.begin(0) #=> 8 (if possible)

If it is possible, some kind of parser or scanner can be
implemented easily.
# StringScanner is a little too big, I think.

--
makoto kuwata

6 Answers

Eric I.

11/25/2007 3:40:00 PM

On Nov 25, 10:18 am, makoto kuwata <k...@kuwata-lab.com> wrote:
> Hi, all.
>
> Is it possible to specify start position of Regexp matching?
>
> str = "foo bar baz"
> m = /ba/.match(str)
> p m.begin(0) #=> 4
> m = /ba/.match(str, 5) # is it possible?
> p m.begin(0) #=> 8 (if possible)
>
> If it is possible, some kind of parser or scanner can be
> implemented easily.
> # StringScanner is a little too big, I think.

You could try something like this:

m = /^.{5,}(ba)/.match(str)
p m.begin(1)

In the regular expression, you're saying start at the beginning and
skip at least 5 characters. But then we have to use parens to "note"
the part you're interested in, and then we have to pass 1 rather than
0 to begin, so it reports the location of the first noted match (0
would report where the entire Regexp matched, and that would be the
beginning of the line).
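
One caveat with this pattern, shown as a small sketch: `.{5,}` is greedy, so when several candidates lie past the offset it reports the last one. A lazy quantifier (`.{5,}?`) reports the first one instead. The longer sample string below is made up for illustration.

```ruby
str = "foo bar baz bat"

# Greedy: consumes as much as possible, so the LAST "ba" wins.
greedy = /^.{5,}(ba)/.match(str)
p greedy.begin(1)   #=> 12

# Lazy: consumes as little as possible, so the FIRST "ba" at or after 5 wins.
lazy = /^.{5,}?(ba)/.match(str)
p lazy.begin(1)     #=> 8
```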

An alternative would be to slice the first n characters off the front
of the string and then do the match.
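
A minimal sketch of the slicing alternative, assuming the offset is a character index into the original string (the variable names are only for illustration). The match position is relative to the slice, so the offset has to be added back:

```ruby
str = "foo bar baz"
offset = 5

# Match against the tail of the string only.
m = /ba/.match(str[offset..-1])

# m.begin(0) is relative to the slice; shift it back to the original string.
pos = m && m.begin(0) + offset
p pos   #=> 8
```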

Eric


Robert Klemme

11/25/2007 4:24:00 PM

On 25.11.2007 16:39, Eric I. wrote:
> On Nov 25, 10:18 am, makoto kuwata <k...@kuwata-lab.com> wrote:
>> Hi, all.
>>
>> Is it possible to specify start position of Regexp matching?
>>
>> str = "foo bar baz"
>> m = /ba/.match(str)
>> p m.begin(0) #=> 4
>> m = /ba/.match(str, 5) # is it possible?
>> p m.begin(0) #=> 8 (if possible)
>>
>> If it is possible, some kind of parser or scanner can be
>> implemented easily.
>> # StringScanner is a little too big, I think.
>
> You could try something like this:
>
> m = /^.{5,}(ba)/.match(str)
> p m.begin(1)
>
> In the regular expression, you're saying start at the beginning and
> skip at least 5 characters. But then we have to use parens to "note"
> the part you're interested in, and then we have to pass 1 rather than
> 0 to begin, so it reports the location of the first noted match (0
> would report where the entire Regexp matched, and that would be the
> beginning of the line).
>
> An alternative would be to slice the first n characters off the front
> of the string and then do the match.

Another alternative is to use String#scan - we would have to know what
the OP really wants to parse though to decide whether it's a feasible
solution.
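
As a rough sketch of how String#scan could be used here (assuming the goal is simply to find match positions), the block form sets the last match on each iteration, so the positions can be collected and filtered afterwards:

```ruby
str = "foo bar baz"

# Collect the start index of every match of the pattern.
positions = []
str.scan(/ba/) { positions << Regexp.last_match.begin(0) }
p positions   #=> [4, 8]

# Pick the first match at or after index 5.
first_after = positions.find { |i| i >= 5 }
p first_after #=> 8
```

Note that this walks the whole string, so it suits cases where all matches are wanted anyway.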

Kind regards

robert

Axel Etzold

11/25/2007 4:47:00 PM

-------- Original Message --------
> Date: Mon, 26 Nov 2007 00:20:25 +0900
> From: makoto kuwata <kwa@kuwata-lab.com>
> To: ruby-talk@ruby-lang.org
> Subject: [Q] specify start position of Regexp matching

> Hi, all.
>
> Is it possible to specify start position of Regexp matching?
>
> str = "foo bar baz"
> m = /ba/.match(str)
> p m.begin(0) #=> 4
> m = /ba/.match(str, 5) # is it possible?
> p m.begin(0) #=> 8 (if possible)
>
> If it is possible, some kind of parser or scanner can be
> implemented easily.
> # StringScanner is a little too big, I think.
>
> --
> makoto kuwata
>

Dear Makoto,

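What about:

class Regexp
  # Index in +string+ of the first match at or after +start_pos+, or nil.
  def match_index_offset(string, start_pos)
    temp = string[start_pos..-1]
    ref = temp && match(temp)
    # ref.begin(0) is the offset within temp; shift it back by start_pos.
    ref && ref.begin(0) + start_pos
  end
end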

str = "foo bar baz"
m = /ba/.match_index_offset(str,5)
p m

Best regards,

Axel


Makoto Kuwata

11/25/2007 5:59:00 PM

Thank you, all.

Eric I. wrote:
> You could try something like this:
> m = /^.{5,}(ba)/.match(str)
> p m.begin(1)

In my program, the start position is variable, for example:
  def f(n)
    m = /^.{#{n},}(ba)/.match(str)
    ...
  end
In this case, a new /^.{n,}(ba)/ Regexp is built on every call.
That is not efficient.
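
One possible way around the rebuild cost, as a sketch only: cache one compiled Regexp per offset. The names `SKIP_THEN_BA` and `index_of_ba` are hypothetical, and the pattern uses `\A`, the `m` flag and a lazy quantifier so that it anchors at the start of the string, skips newlines too, and finds the first match.

```ruby
# Build each offset-specific Regexp once and reuse it on later calls.
SKIP_THEN_BA = Hash.new do |cache, n|
  cache[n] = /\A.{#{n},}?(ba)/m
end

# Index of the first "ba" at or after position n, or nil if there is none.
def index_of_ba(str, n)
  m = SKIP_THEN_BA[n].match(str)
  m && m.begin(1)
end

p index_of_ba("foo bar baz", 5)   #=> 8
p index_of_ba("foo bar baz", 9)   #=> nil
```

The cache grows with the number of distinct offsets, so this only helps when a small set of offsets is reused.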

Robert Klemme wrote:
> Another alternative is to use String#scan -

String#scan is useful only when the regexp pattern is fixed:
  input.scan(/FIXED-REGEXP/) do ... end
With String#scan, it is not possible to change the regexp pattern
inside the loop.

Axel Etzold wrote:
> temp=string[start_pos..-1]
> ref=self.match(temp)
> return temp.index(ref[0])+start_pos

In this solution, a temporary substring is created every time.
If the input string is long, that is not efficient.

Thanks for all your advice.
I'm going to propose support for a start position in Regexp#match().

--
makoto kuwata

Robert Klemme

11/25/2007 7:16:00 PM

On 25.11.2007 18:58, makoto kuwata wrote:
> Robert Klemme wrote:
>> Another alternative is to use String#scan -
>
> String#scan is useful only when regexp pattern is fixed.
> input.scan(/FIXED-REGEXP/) do ... end
> Using String#scan, it is not able to change regexp pattern
> in the loop.

But in various situations it is possible to use a unified regexp for
scanning or a regexp that comprises all other patterns.
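
A small sketch of what such a unified regexp might look like, assuming a toy token set (the constant names and the sample input are invented for illustration). Regexp.union combines the individual patterns into one alternation that String#scan can use in a single pass:

```ruby
NUMBER   = /\d+/
WORD     = /[a-z]+/
OPERATOR = /[+\-*\/]/

# One pattern that matches any of the token kinds.
TOKEN = Regexp.union(NUMBER, WORD, OPERATOR)

tokens = "x1 + 42".scan(TOKEN)
p tokens   #=> ["x", "1", "+", "42"]
```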

> Axel Etzold wrote:
>> temp=string[start_pos..-1]
>> ref=self.match(temp)
>> return temp.index(ref[0])+start_pos
>
> In this solution, temp substring is created every time.
> If input string is long, it is not efficient.

This is not true. Creating a substring is fairly cheap because the
character buffer is not copied (copy on write).

> I'm going to propose to support start position in Regexp#match().

For the time being it's faster to use one of the other alternatives.
Also, with the new regexp engine in 1.9 your feature might be present
already.

Kind regards

robert

Makoto Kuwata

11/25/2007 11:25:00 PM

Robert Klemme <shortcut...@googlemail.com> wrote:
> > In this solution, temp substring is created every time.
> > If input string is long, it is not efficient.
>
> This is not true. Creating a substring is fairly cheap because the
> character buffer is not copied (copy on write).

You are right. If the input string is not modified, creating a substring
doesn't copy anything.
Creating a substring may be the solution I wanted.

> > I'm going to propose to support start position in Regexp#match().
>
> For the time being it's faster to use one of the other alternatives.
> Also, with the new regexp engine in 1.9 your feature might be present
> already.

I found that in Ruby 1.9, Regexp#match() can take an optional second
argument which specifies the position at which matching starts. Good news.
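
For reference, a short sketch of that second argument on Ruby 1.9 or later, using the string from the original question. String#index with a Regexp and a start offset is shown as well, since it gives the same position and also sets the last match. The `\G` lines illustrate anchoring a match exactly at the start position, which is the kind of behavior a hand-written scanner usually wants.

```ruby
str = "foo bar baz"

# Regexp#match with a start position: search begins at index 5.
m = /ba/.match(str, 5)
p m.begin(0)                   #=> 8 (index into the original string)

# String#index accepts a start position too and sets Regexp.last_match.
i = str.index(/ba/, 5)
p i                            #=> 8
p Regexp.last_match.begin(0)   #=> 8

# \G anchors the match at the start position itself.
miss = /\Gba/.match(str, 5)    # "ba" does not begin at index 5
hit  = /\Gba/.match(str, 8)    # "ba" begins exactly at index 8
p miss                         #=> nil
p hit.begin(0)                 #=> 8
```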

Thank you, Robert.

--
makoto kuwata
