Asp Forum - Regexp riddle; escaping escapes

Phlip

8/17/2007 10:03:00 AM

Rubies:

Someone didn't escape their & in their HTML correctly. Let's fix it.

This regexp correctly does not escape &dude, because we only want to escape
raw & markers:

p "yo &dude".gsub(/&([^a-z])/i, '&amp;\1')

That passed "yo &dude" thru unchanged. (I am aware "dude" has no ; on the
end; we are leaving that optional, for whatever reason...)

Now escape & followed by a non-alphabetic character:

p "yo & dude".gsub(/&([^a-z])/i, '&amp;\1')

That correctly provides: "yo &amp; dude"

Now how to escape "yo && dude"? Note that the ([^a-z]) consumes the second
&, leading to this incorrect output:

"yo &amp;& dude"

The only workaround I can think of is to run the Regexp twice:

x = "yo && dude"
2.times{ x.gsub!(/&([^a-z])/i, '&amp;\1') }
p x

Can someone help my feeb Regexp skills and get a "yo &amp;&amp; dude" in one
line?

--
Phlip
http://www.oreilly.com/catalog/9780...
^ assert_xpath
http://tinyurl.... <-- assert_raise_message

3 Answers

Tim Pease

8/17/2007 2:54:00 PM

On 8/17/07, Phlip <phlip2005@gmail.com> wrote:
> Rubies:
>
> Someone didn't escape their & in their HTML correctly. Let's fix it.
>
> This regexp correctly does not escape &dude, because we only want to escape
> raw & markers:
>
> p "yo &dude".gsub(/&([^a-z])/i, '&amp;\1')
>
> That passed "yo &dude" thru unchanged. (I am aware "dude" has no ; on the
> end; we are leaving that optional, for whatever reason...)
>
> Now escape & followed by a non-alphabetic character:
>
> p "yo & dude".gsub(/&([^a-z])/i, '&amp;\1')
>
> That correctly provides: "yo &amp; dude"
>
> Now how to escape "yo && dude"? Note that the ([^a-z]) consumes the second
> &, leading to this incorrect output:
>
> "yo &amp;& dude"
>
> The only workaround I can think of is to run the Regexp twice:
>
> x = "yo && dude"
> 2.times{ x.gsub!(/&([^a-z])/i, '&amp;\1') }
> p x
>
> Can someone help my feeb Regexp skills and get a "yo &amp;&amp; dude" in one
> line?
>

str = "yo && dude"
str.gsub!( %r/&(?=[^a-z])/i, '&amp;')
p str
=> "yo &amp;&amp; dude"

The regular expression trick here is the (?=re) construct, called a
"zero-width positive lookahead". It must match at that position, but it
does not consume any of the string, so gsub! replaces only the characters
that are NOT inside (?=re).

Blessings,
TwP
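
For readers following along, here is a small sketch contrasting the two patterns from this thread. The method names escape_consuming and escape_lookahead are made up for illustration and appear nowhere in the original posts. The scan calls show how many matches each pattern finds in "yo && dude".

```ruby
# Hypothetical helper names, for illustration only.

# Consuming version: the capture group swallows the character after "&",
# so a second "&" directly behind the first is never tested on its own.
def escape_consuming(text)
  text.gsub(/&([^a-z])/i, '&amp;\1')
end

# Lookahead version: (?=[^a-z]) only peeks at the next character,
# so matching resumes immediately after the "&" that was replaced.
def escape_lookahead(text)
  text.gsub(/&(?=[^a-z])/i, '&amp;')
end

input = "yo && dude"

p input.scan(/&([^a-z])/i)      # => [["&"]]     one match; the second "&" is the capture
p input.scan(/&(?=[^a-z])/i)    # => ["&", "&"]  two matches
p escape_consuming(input)       # => "yo &amp;& dude"
p escape_lookahead(input)       # => "yo &amp;&amp; dude"
p escape_lookahead("yo &dude")  # => "yo &dude"
```

The replacement string for the lookahead version no longer needs \1, because the character after the & is never part of the match.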

Phlip

8/17/2007 3:01:00 PM

Tim Pease wrote:

> str.gsub!( %r/&(?=[^a-z])/i, '&amp;')

Thanks!

> "zero-width positive lookahead"

Man, that was right there, but I was blocking on it. (-;

--
Phlip
http://www.oreilly.com/catalog/9780...
^ assert_xpath
http://tinyurl.... <-- assert_latest Model

Tim Pease

8/17/2007 3:39:00 PM

On 8/17/07, Phlip <phlip2005@gmail.com> wrote:
> Tim Pease wrote:
>
> > str.gsub!( %r/&(?=[^a-z])/i, '&amp;')
>
> Thanks!
>
> > "zero-width positive lookahead"
>
> Man, that was right there, but I was blocking on it. (-;
>

I had to pull my Pickaxe off the shelf and look it up, too. It's on page
327 of the second edition if you're interested in reading about it. It's
in the first edition, too, which is available online.

Blessings,
TwP
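
One edge case the thread does not discuss: the positive lookahead (?=[^a-z]) needs some character after the &, so a & at the very end of the string is left alone. A possible variation, offered only as a sketch, is a negative lookahead, (?![a-z]), which also succeeds when nothing follows. The method name escape_raw_ampersands is hypothetical.

```ruby
# Hypothetical helper, not from the thread.
# (?![a-z]) succeeds when the next character is not a letter,
# and also when there is no next character at all.
def escape_raw_ampersands(text)
  text.gsub(/&(?![a-z])/i, '&amp;')
end

p "fish &".gsub(/&(?=[^a-z])/i, '&amp;')  # => "fish &"      trailing & is missed
p escape_raw_ampersands("fish &")          # => "fish &amp;"
p escape_raw_ampersands("yo && dude")      # => "yo &amp;&amp; dude"
p escape_raw_ampersands("yo &dude")        # => "yo &dude"
```

Both patterns still treat a numeric reference such as &#38; as a raw &, since # is not a letter; handling those would need a stricter pattern than the one asked for here.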

comp.lang.ruby
