
comp.lang.ruby

No regex backreference with four backslashes

gabriel.birke

9/15/2006 10:22:00 PM

Consider the following test case:

require 'test/unit'
class RegexTest < Test::Unit::TestCase
  def test_escaping
    numbers = "12345"
    assert_equal "12345", numbers.gsub(/(2|4)/, '\1')
    assert_equal "12345", numbers.gsub(/(2|4)/, "\\1")
    assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, '\\ \1')
    assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, "\\ \\1")
    assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, '\\\1')
    assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, "\\\\1")
  end
end
require 'test/unit/ui/console/testrunner'
Test::Unit::UI::Console::TestRunner.run(RegexTest)

The last two assertions fail with the message <"1\\23\\45"> expected
but was <"1\\13\\15">. Why?

Is this a bug in the regex implementation or is there something wrong
with my regular expression or substitution string?

7 Answers

Paul Lutus

9/15/2006 10:44:00 PM

gabriel.birke@gmail.com wrote:

> Consider the following test case:
>
> require 'test/unit'
> class RegexTest < Test::Unit::TestCase
> def test_escaping
> numbers = "12345"
> assert_equal "12345", numbers.gsub(/(2|4)/, '\1')
> assert_equal "12345", numbers.gsub(/(2|4)/, "\\1")
> assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, '\\ \1')
> assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, "\\ \\1")
> assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, '\\\1')
> assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, "\\\\1")
> end
> end
> require 'test/unit/ui/console/testrunner'
> Test::Unit::UI::Console::TestRunner.run(RegexTest)
>
> The last two assertions fail (With the message <"1\\23\\45"> expected
> but was <"1\\13\\15">.) - but why?
>
> Is this a bug in the regex implementation or is there something wrong
> with my regular expression or substitution string?

To find out how your strings are being parsed, print them out. Then print
out the result of the regexes directly, rather than relying on an
assertion.

numbers.gsub(/(2|4)/,"\\\1")

"1\\\0013\\\0015"

numbers.gsub(/(2|4)/,"\\\\1")

"1\\13\\15"

The best "test suite" is your eyes.
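
Note that the first call above uses a double-quoted literal, while the failing assertion in the original test used single quotes, and the two quoting styles parse three backslashes differently. A minimal sketch of the difference (the constant names SINGLE and DOUBLE are made up for this illustration):

```ruby
# Three backslashes followed by 1, written with each quoting style.
SINGLE = '\\\1'  # single quotes: \\ becomes \, and \1 is left untouched
DOUBLE = "\\\1"  # double quotes: \\ becomes \, and \1 is an octal escape (byte 1)

p SINGLE.bytes   # => [92, 92, 49]  backslash, backslash, "1"
p DOUBLE.bytes   # => [92, 1]       backslash, control character

numbers = "12345"
# gsub reads \\ in SINGLE as one literal backslash followed by a plain "1".
p numbers.gsub(/(2|4)/, SINGLE) == "1\\13\\15"        # => true
# gsub leaves the backslash and the control character in DOUBLE as they are.
p numbers.gsub(/(2|4)/, DOUBLE) == "1\\\x013\\\x015"  # => true
```

Comparing the bytes rather than the printed form avoids having to guess which layer of escaping the display has added.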

--
Paul Lutus
http://www.ara...

gabriel.birke

9/15/2006 10:57:00 PM


Paul Lutus wrote:

>
> To find out how your strings are being parsed, print them out. Then print
> out the result of the regexes directly, rather than relying on an
> assertion.
>
> numbers.gsub(/(2|4)/,"\\\1")
>
> "1\\\0013\\\0015"
>
> numbers.gsub(/(2|4)/,"\\\\1")
>
> "1\\13\\15"
>
> The best "test suite" is your eyes.

I've done that already, the test was only to show the problem: I could
not escape chars in the numbers string with a backslash.

Anyway, I found the solution: five backslashes instead of four.
That's a bit counter-intuitive; maybe someone can explain it,
especially when these two are compared:

numbers.gsub(/(2|4)/,'\\ \\1')
numbers.gsub(/(2|4)/,'\\\\\1')

I expected that when I removed the space from the first expression, my
characters would get quoted. Instead, the four backslashes get
interpreted as two escaped backslashes and the 1 as a literal
character. Can somebody shed some light on the how and why of this
case? In particular, why doesn't the solution with the five backslashes
yield double backslashes in the result string?

MonkeeSage

9/15/2006 11:16:00 PM


gabriel.birke@gmail.com wrote:
> I expected that when I removed the space from the first expression, my
> characters would get quoted. Instead, the four backslashes get
> interpreted as two escaped backslashes and the 1 as a literal
> character. Can somebody shed some light on the how and why of this
> case? In particular, why doesn't the solution with the five backslashes
> yield double backslashes in the result string?

Two layers of escaping are at work here. First the Ruby parser reads the
string literal: in single quotes, each \\ collapses into one backslash
and \1 is left alone. Then gsub reads the resulting string: there, \1 is
a backreference and \\ is a literal single backslash.

Counting backslashes as typed in a single-quoted literal: '\1' reaches
gsub as \1, a backreference. Three ('\\\1') and four ('\\\\1') both
reach gsub as \\1, which is a literal backslash followed by a plain 1.
Five ('\\\\\1') reaches gsub as \\\1, which is a literal backslash
followed by the backreference. That gives 1\23\45, and since a backslash
is shown escaped when a string is inspected, you see it displayed as
"1\\23\\45". Hope that makes sense.

Regards,
Jordan
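
The backslash counts discussed above can be checked directly. This is only a sketch, assuming single-quoted literals throughout; the name CASES is made up for the illustration:

```ruby
numbers = "12345"

# Pairs of [single-quoted replacement literal, expected gsub result].
CASES = [
  ['\1',     "12345"],      # gsub sees \1   : backreference
  ['\\\1',   "1\\13\\15"],  # gsub sees \\1  : literal backslash, plain "1"
  ['\\\\1',  "1\\13\\15"],  # gsub sees \\1  : the same string as the line above
  ['\\\\\1', "1\\23\\45"],  # gsub sees \\\1 : literal backslash, backreference
]

CASES.each do |replacement, expected|
  result = numbers.gsub(/(2|4)/, replacement)
  # puts shows the strings as gsub receives and returns them, without
  # the extra escaping that inspect would add.
  puts "#{replacement.ljust(4)} -> #{result} (matches expectation: #{result == expected})"
end
```

The second and third entries are the same string once the literal has been parsed, which is why three and four typed backslashes behave identically in single quotes.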

Paul Lutus

9/15/2006 11:17:00 PM

gabriel.birke@gmail.com wrote:

>
> Paul Lutus wrote:
>
>>
>> To find out how your strings are being parsed, print them out.

/ ...

> I've done that already, the test was only to show the problem: I could
> not escape chars in the numbers string with a backslash.
>
> Anyway, I found the solution, it's five backslashes instead of four.
> That's a bit counter-intuitive, maybe someone can explain it.
> Especially when these two are compared:
>
> numbers.gsub(/(2|4)/,'\\ \\1')
> numbers.gsub(/(2|4)/,'\\\\\1')
>
> I expected that when I removed the space from the first expression, my
> characters would get quoted. Instead, the four backslashes get
> interpreted as two escaped backslashes and the 1 as a literal
> character. Can somebody shed some light on the how and why of this
> case?

Sure. Parsing these strings is a trivial exercise. Each adjacent pair of
backslashes collapses into one literal backslash, and any orphan
backslash is associated with the character to its immediate right.

> In particular, why doesn't the solution with the five backslashes
> yield double backslashes in the result string?

To sort out how Ruby is parsing your strings, *print* *them* *out.*

puts '\\\\1'

\\1 # meaning: a backslash and an escaped '1'

puts '\\ \\1'

\ \1 # meaning a backslash, a space, and an escaped '1'

puts '\\\\\1'

\\\1 # meaning two backslashes and an escaped '1'
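
When reading the printed strings, keep in mind that gsub then applies its own escape rules to them. As a rough sketch (describe_replacement is a hypothetical helper, not part of Ruby, and it handles only \\ and a backslash followed by a digit), the three strings above could be labelled like this:

```ruby
# Hypothetical helper: labels the pieces of a replacement string roughly the
# way gsub reads it. Only \\ and \<digit> are handled; Ruby's real rules
# also cover sequences such as \& and \k<name>.
def describe_replacement(str)
  str.scan(/\\\\|\\\d|./m).map do |token|
    case token
    when "\\\\"      then "literal backslash"          # two backslashes
    when /\A\\\d\z/  then "backreference #{token[1]}"  # backslash + digit
    when "\\"        then "lone backslash"             # kept unchanged by gsub
    else                  "literal '#{token}'"
    end
  end
end

puts describe_replacement('\\\\1').join(", ")   # literal backslash, literal '1'
puts describe_replacement('\\ \\1').join(", ")  # lone backslash, literal ' ', backreference 1
puts describe_replacement('\\\\\1').join(", ")  # literal backslash, backreference 1
```

Under these assumptions the five-backslash literal ends up as one literal backslash plus the backreference, which matches the result observed in the thread.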

Oh, by the way. You haven't said what you are trying to accomplish.

--
Paul Lutus
http://www.ara...

gabriel.birke

9/16/2006 9:08:00 AM


Paul Lutus wrote:
> puts '\\\\\1'
>
> \\\1 # meaning two backslashes and an escaped '1'
>
> Oh, by the way. You haven't said what you are trying to accomplish.

I was trying to escape some characters in a string with a backslash.

When printing out '\\\\\1' (resulting in two backslashes and an
escaped '1', like you said), I would expect the result string s
(s = numbers.gsub(/(2|4)/, '\\\\\1')) to contain *two* backslashes and
then the original character. But apparently the replacement string is
interpreted as "one backslash and a backreference (escaped with two
backslashes)."

gabriel.birke

9/16/2006 9:31:00 AM


gabriel.birke@gmail.com wrote:

> But apparently the replacement string is
> interpreted as "one backslash and a backreference (escaped with two
> backslashes)."

After thinking a while about it I realized this is not correct.

Backslashes in a replacement string *must* be double backslashes (four
backslashes in the literal string) because otherwise they would be
interpreted as escaped characters by the regex engine. Right?

Paul Lutus

9/16/2006 4:45:00 PM

gabriel.birke@gmail.com wrote:

/ ...

> After thinking a while about it I realized this is not correct.
>
> Backslashes in a replacement string *must* be double backslashes (four
> backslashes in the literal string) because otherwise they would be
> interpreted as escaped characters by the regex engine. Right?

Yes, if the backslashes are followed by anything other than another
backslash (a backslash used as an escape must be followed by a target
character other than another backslash). This is why it's a good idea to
print the string you intend to use. Printing the string forces it to be
parsed, so you can see what you are getting into.
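
For the stated goal of putting a backslash in front of certain characters, one alternative that sidesteps the replacement-string escapes is the block form of gsub, whose return value is inserted without being scanned for backreferences. A minimal sketch (escape_digits is a name invented for this example):

```ruby
# Prefix every match with a single backslash, without relying on \1.
def escape_digits(str)
  # The block's return value is inserted as is, so "\\" here is just
  # one backslash character followed by the matched text.
  str.gsub(/(2|4)/) { |match| "\\" + match }
end

puts escape_digits("12345")  # prints 1\23\45
```

Only the string-literal layer of escaping remains in this form, so two typed backslashes produce exactly one backslash in the output.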

--
Paul Lutus
http://www.ara...