Regexp Question: Checking for [joe][/joe] pairs

Joe Peck

12/20/2006 8:31:00 PM

Hey, I've got some text in @x and want there to be at least 1 and at
most 3 [joe][/joe] pairs, each having at least one character between the
beginning [joe] and the ending [/joe].

This is what I have now, and it seems to sometimes work, and sometimes
not.

@x.match(/(\[joe\][\s\d\w]+?\[\/joe\]){1,3}/)

--
Posted via http://www.ruby-....

Daniel Finnie

12/20/2006 8:36:00 PM

Why are you doing /[\s\d\w]+?/? Just use /.+?/.

Dan

Joe Peck wrote:
> Hey, I've got some text in @x and want there to be at least 1 and at
> most 3 [joe][/joe] pairs, each having at least one character between the
> beginning [joe] and the ending [/joe].
>
> This is what I have now, and it seems to sometimes work, and sometimes
> not.
>
> @x.match(/(\[joe\][\s\d\w]+?\[\/joe\]){1,3}/)
>

dblack

12/20/2006 8:40:00 PM

Joe Peck

12/20/2006 8:41:00 PM

Daniel Finnie wrote:
> Why are you doing /[\s\d\w]+?/? Just use /.+?/.
>
> Dan

Good point. I was using .+? earlier, but thought that might be part of
my problem. It seems to accept @x even if it contains more than 3
[joe][/joe] pairs.


dblack

12/20/2006 8:42:00 PM

dblack

12/20/2006 8:47:00 PM

William James

12/20/2006 8:50:00 PM

Joe Peck wrote:
> Hey, I've got some text in @x and want there to be at least 1 and at
> most 3 [joe][/joe] pairs, each having at least one character between the
> beginning [joe] and the ending [/joe].
>
> This is what I have now, and it seems to sometimes work, and sometimes
> not.
>
> @x.match(/(\[joe\][\s\d\w]+?\[\/joe\]){1,3}/)
>
> --
> Posted via http://www.ruby-....

@x = "[joe] [/joe] [joe][/joe] [joe] foo [/joe]"
# Count the [joe]...[/joe] pairs that have at least one character inside.
count = @x.scan(/\[joe\](.*?)\[\/joe\]/m).flatten.
  reject { |s| s.empty? }.size
p count

Daniel Finnie

12/20/2006 8:50:00 PM

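The problem is that the Regexp is not anchored to the start and end of
the string, so a match on the first few pairs succeeds no matter what
follows. Anchoring alone is not enough, though: the text inside and
between the pairs must also be kept from swallowing a tag.

/\A(?:(?!\[\/?joe\]).)*(?:\[joe\](?:(?!\[\/?joe\]).)+\[\/joe\](?:(?!\[\/?joe\]).)*){1,3}\z/m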
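To try an anchored pattern against the cases in this thread, here is a
minimal sketch. NO_TAG, PAIRS and valid_pairs? are names made up for
illustration, and the "any character that does not start a tag" piece is
just one possible way to stop a match from running across a tag.

```ruby
# Hypothetical names, not from the thread: NO_TAG, PAIRS, valid_pairs?

# One character that is not the start of [joe] or [/joe].
NO_TAG = /(?:(?!\[\/?joe\]).)/m

# Whole string: filler, then 1 to 3 pairs, each with a non-empty,
# tag-free body and followed by tag-free filler.
PAIRS = /\A#{NO_TAG}*(?:\[joe\]#{NO_TAG}+\[\/joe\]#{NO_TAG}*){1,3}\z/m

def valid_pairs?(text)
  PAIRS.match?(text)
end

p valid_pairs?("hey [joe] it's hot today[/joe]")
p valid_pairs?("[joe] hello [joe] how are [/joe] you")
p valid_pairs?("hey [joe] it's hot today[/joe] where [joe] is the ac")
```

The first call should print true and the other two false, since they
contain a nested opening tag and an unclosed opening tag respectively.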

Joe Peck wrote:
> Daniel Finnie wrote:
>> Why are you doing /[\s\d\w]+?/? Just use /.+?/.
>>
>> Dan
>
> Good point. I was using .+? earlier, but thought that might be part of
> my problem. It seems to accept @x even if it contains more than 3
> [joe][/joe] pairs.
>

Joe Peck

12/21/2006 3:52:00 PM

The problem is I don't want it to accept things like:
"[joe] hello [joe] how are [/joe] you"
where there are two opening tags before a closing tag is reached.
Similarly, I don't want to accept something like:
"hey [joe] it's hot today[/joe] where [joe] is the ac"
where there is one correct pair but then an opening tag without a
closing one.


Arne Brasseur

12/21/2006 4:40:00 PM

Joe Peck wrote:
> The problem is I don't want it to accept things like:
> "[joe] hello [joe] how are [/joe] you"
> where there are two opening tags before a closing tag is reached.
> Similarly, I don't want to accept something like:
> "hey [joe] it's hot today[/joe] where [joe] is the ac"
> where there is one correct pair but then an opening tag without a
> closing one.
>
I missed the beginning of this thread, but if I recall correctly from my
course on formal languages, this sort of thing can't be done with
regular expressions.

Regular expressions can be used to test whether a string belongs to a
certain regular language, which is a subset of all possible languages
(where a language is a set of strings). Regular expressions are
equivalent to finite state automata in this respect. A finite state
automaton can only be in a finite number of states, but you'd like to
match a possibly infinitely large number of [joe][/joe] pairs. The FSA
would need a new state for every extra [joe] it reads to remember it
still needs to consume a matching [/joe] for it.

If this sounds like Chinese, just remember regexps aren't keen on
matching this sort of stuff. Stacks, on the other hand, seem to be
custom designed for these purposes.
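
As a rough illustration of the stack idea, here is a minimal sketch
that only checks that the tags are balanced, nesting allowed. The
method name balanced_joe_tags? is made up for this example, and it does
not enforce the 1-to-3 limit or the non-empty body.

```ruby
# Hypothetical helper: true when every [joe] has a matching [/joe].
def balanced_joe_tags?(text)
  stack = []
  text.scan(/\[\/?joe\]/) do |tag|
    if tag == "[joe]"
      stack.push(tag)               # an opening tag that still needs closing
    else
      return false if stack.empty?  # closing tag with nothing open
      stack.pop
    end
  end
  stack.empty?                      # leftover opening tags mean unbalanced
end

p balanced_joe_tags?("[joe] a [joe] b [/joe] c [/joe]")
p balanced_joe_tags?("[joe] hello [joe] how are [/joe] you")
```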

A.

Joe Peck

12/21/2006 4:48:00 PM

> Regular expressions can be used to test whether a string belongs to a
> certain regular language, which is a subset of all possible languages
> (where a language is a set of strings). Regular expressions are
> equivalent to finite state automata in this respect. A finite state
> automaton can only be in a finite number of states, but you'd like to
> match a possibly infinitely large number of [joe][/joe] pairs. The FSA
> would need a new state for every extra [joe] it reads to remember it
> still needs to consume a matching [/joe] for it.
>
> If this sounds like Chinese, just remember regexps aren't keen on
> matching this sort of stuff. Stacks, on the other hand, seem to be
> custom designed for these purposes.
>
> A.

It doesn't sound like Chinese :)

It wouldn't have to be an infinite number of states. Let's say these
are the states:

State 1 - no open [joe]. If it finds [joe], it goes to state 2. If it
finds [/joe], it fails.
State 2 - [joe] found but no matching [/joe] yet. If it finds [joe]
again in this state, it fails. If it finds [/joe], it increments the
count by 1 and moves back to state 1.

If the count goes above 3, it fails.

But maybe I'll use something besides a regexp, although I thought there
would be a pretty easy way to do it.
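
The two states above can be sketched without a single big regexp. This
is only a rough sketch: valid_joe_pairs? and max_pairs are names made
up here, and it also assumes the original requirements of at least one
pair and at least one character between the tags.

```ruby
# Hypothetical helper implementing the two states described above.
def valid_joe_pairs?(text, max_pairs = 3)
  inside = false      # false = state 1 (no open tag), true = state 2
  count = 0           # completed pairs so far
  body_start = nil    # index just past the most recent [joe]
  pos = 0
  while (m = /\[\/?joe\]/.match(text, pos))
    if m[0] == "[joe]"
      return false if inside                    # second [joe] before a [/joe]
      inside = true
      body_start = m.end(0)
    else
      return false unless inside                # [/joe] with no open [joe]
      return false if m.begin(0) == body_start  # nothing between the tags
      inside = false
      count += 1
      return false if count > max_pairs
    end
    pos = m.end(0)
  end
  !inside && count >= 1   # no dangling [joe], and at least one pair
end

p valid_joe_pairs?("hey [joe] it's hot today[/joe]")
p valid_joe_pairs?("[joe] hello [joe] how are [/joe] you")
p valid_joe_pairs?("hey [joe] it's hot today[/joe] where [joe] is the ac")
```

The first call should print true; the other two are the rejected
examples from earlier in the thread and should print false.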


comp.lang.ruby
