comp.lang.ruby

gsub pattern substitution and ${...}

Sarah Allen

5/10/2009 11:52:00 PM

I'm trying to escape a URI that is matched by a regular expression with
gsub.

In irb, here's my string:
>> s = "<a href='http://foo.com/one=>two'/...
=> "<a href='http://foo.com/one=>two'/...

Now I want to match href="..." or href='...' and then URI.escape the
characters within the quotes:
>> require 'uri'
=> true

First I tried this:
>> s.gsub(/href=(['"])([^']*)/, 'href=\1#{URI.escape($2)}\3')
=> "<a href='\#{URI.escape($2)}'/>"

Of course, that doesn't work, since #{expr} is only evaluated within a
double-quoted string.

But when it is double quoted, like this:

>> s.gsub(/href=(['"])([^']*)/, "href=\1#{URI.escape($2)}\3")
=> "<a href=\001http://foo.com/...\003'/>

\1 doesn't evaluate to the first match anymore

I would guess there's some basic string or regex syntax that I'm missing
here. I've looked at the gsub and string documentation, and either I
missed it or I should be looking elsewhere.

Can someone give me a clue and help me move forward with my mother's day
hacking session?

Thanks in advance,
Sarah
--
Posted via http://www.ruby-....

10 Answers

7stud --

5/11/2009 12:14:00 AM

Sarah Allen wrote:
>
> Of course, that doesn't work, since #{expr} is only evaluated within a
> double-quoted string.
>
> But when it is double quoted, like this:
>
>>> s.gsub(/href=(['"])([^']*)/, "href=\1#{URI.escape($2)}\3")
> => "<a href=\001http://foo.com/...\003'/>
>
> \1 doesn't evaluate to the first match anymore
>
> I would guess there's some basic string or regex syntax that I'm missing
> here.
>

In double-quoted strings, escaped characters do not have literal
meanings. For instance, in a double-quoted string "\n" is not two
characters; it is one character that represents a newline. Double-quoted
strings interpret all escape sequences, which means that \1 gets
interpreted into something (but who knows what!).

On the other hand, with single quoted strings there are only a couple of
automatic substitutions that take place, and interpreting \1 is not one
of them. So with single quoted strings \1 means \1.

If you need to use double-quoted strings, then you need to literally
have \1 in your string, which requires an additional backslash to
escape the \ in "\1". So try "\\1". With Ruby, if one backslash is
not enough, keep adding more backslashes until whatever you are trying
to accomplish works!
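
To make the quoting rules above concrete, here is a minimal sketch. The sample string s and the HREF= replacement text are made up for illustration and are not from the original post; it only shows which spelling of \1 actually reaches gsub as a backreference.

```ruby
# How Ruby reads \1 under each quoting style.
single  = '\1'    # two characters: a backslash and "1"
double  = "\1"    # one character: an escape sequence for code point 1
escaped = "\\1"   # two characters again, because the backslash is escaped

puts single.length    # 2
puts double.length    # 1
puts escaped.length   # 2

# Illustrative sample string (an assumption, not the one from the post).
s = "<a href='x'/>"

# Only the two-character form is seen by gsub as a backreference.
with_backref    = s.gsub(/href=(['"])/, "HREF=\\1")
without_backref = s.gsub(/href=(['"])/, "HREF=\1")

puts with_backref       # <a HREF='x'/>
p without_backref       # the quote is replaced by a control character
```

Note that this only addresses the backslash. Any #{...} in the replacement string is still evaluated before gsub runs.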


--
Posted via http://www.ruby-....

Sarah Allen

5/11/2009 12:18:00 AM

7stud -- wrote:
> If you need to use double quoted strings, then you need to literally
> have \1 in your string, which requires the use of additional backslashes
> to escape the \ in "\1". So try "\\1". With ruby if one backslash is
> not enough, keep adding more backslashes until whatever you are trying
> to accomplish works!

Eureka!

>> s.gsub(/href=(['"])([^']*)/, "href=\\1#{URI.escape($2)}\\3")
=> "<a href='http://foo.com/one=%3Etwo'/...

Thanks so much for your help.

Sarah



--
Posted via http://www.ruby-....

Robert Klemme

5/11/2009 6:39:00 AM

On 11.05.2009 02:18, Sarah Allen wrote:
> 7stud -- wrote:
>> If you need to use double quoted strings, then you need to literally
>> have \1 in your string, which requires the use of additional backslashes
>> to escape the \ in "\1". So try "\\1". With ruby if one backslash is
>> not enough, keep adding more backslashes until whatever you are trying
>> to accomplish works!
>
> Eureka!
>
>>> s.gsub(/href=(['"])([^']*)/, "href=\\1#{URI.escape($2)}\\3")
> => "<a href='http://foo.com/...'/...
>
> Thanks so much for your help.

That does not work, because 7stud did not mention the most important
point: even with proper escaping, the string interpolation takes place
*before* gsub is invoked, so URI.escape is applied to whatever $2 holds
at that moment, not to the portion matched by this call. In your tests
it probably appeared to work because $2 was still set from a previous
match.

In this case the block form of gsub is needed:

irb(main):007:0> s = "<a href='http://foo.com/o...'/...
=> "<a href='http://foo.com/o...'/...
irb(main):008:0> s.gsub(/href=(["'])([^'"]+)\1/) {
"href=#$1#{URI.escape($2)}#$1" }
=> "<a href='http://foo.com/...'/...

irb(main):009:0> s = "<a href=\"http://foo.com/o...\"/>"
=> "<a href=\"http://foo.com/o...\"/>"
irb(main):010:0> s.gsub(/href=(["'])([^'"]+)\1/) {
"href=#$1#{URI.escape($2)}#$1" }
=> "<a href=\"http://foo.com/...\"/>"

And if quotes differ with my regexp no replacement takes place:

irb(main):011:0> s = "<a href=\"http://foo.com/o...'/...
=> "<a href=\"http://foo.com/o...'/...
irb(main):012:0> s.gsub(/href=(["'])([^'"]+)\1/) {
"href=#$1#{URI.escape($2)}#$1" }
=> "<a href=\"http://foo.com/o...'/...

Whether this is what you want is up to you, but AFAIK mixing quote types
is not allowed here, so we would probably rather not do the replacement
in that case.

Note that my regexp has another weakness: the quote character not used
to quote the URI should be allowed as part of the URI. I did not want
to complicate things too much but if you want to deal with this the
regular expression must be made a bit more complex.
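
For anyone who wants to reproduce the evaluation-order point outside irb, here is a minimal sketch. Two things in it are assumptions: the sample string s, and the helper escape_href, which is a hypothetical stand-in for URI.escape (that method was removed in Ruby 3.0) and only percent-encodes "<", ">" and spaces.

```ruby
# Hypothetical stand-in for URI.escape: percent-encodes only <, > and space.
def escape_href(str)
  str.gsub(/[<> ]/) { |ch| format('%%%02X', ch.ord) }
end

# Illustrative sample string (an assumption).
s = "<a href='http://foo.com/one=>two'/>"

# An unrelated earlier match leaves $2 set to "STALE".
"x-STALE" =~ /(x)-(\w+)/

# String argument: #{...} is evaluated before gsub runs, so it sees the old $2.
stale = s.gsub(/href=(['"])([^'"]+)\1/, "href=\\1#{escape_href($2)}\\1")
puts stale   # <a href='STALE'/>

# Block form: the block runs after each match, so $1 and $2 are current.
fresh = s.gsub(/href=(['"])([^'"]+)\1/) { "href=#{$1}#{escape_href($2)}#{$1}" }
puts fresh   # <a href='http://foo.com/one=%3Etwo'/>
```

The first call does not fail here, which is what makes the mistake easy to miss: it silently inserts text from an earlier match.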

Kind regards

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestprac...

Sebastian Hungerecker

5/11/2009 8:20:00 AM

On Monday, 11 May 2009 02:13:52, 7stud -- wrote:
> Double
> quoted strings interpret all escaped characters, which means that \1
> gets interpreted into something (but who knows what!).

The character with ASCII value 1.
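
This is easy to check in a script. A short sketch follows; the "\101" line is an extra illustration, not from the thread, showing that a backslash followed by digits is read as an octal escape.

```ruby
ctrl = "\1"

p ctrl.ord          # 1
p ctrl == 1.chr     # true
p ctrl.bytes        # [1]

# Single quotes keep the backslash and the digit as two separate bytes.
p '\1'.bytes        # [92, 49]

# Backslash plus digits is an octal escape: octal 101 is 65, i.e. "A".
p "\101" == "A"     # true
```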

Sarah Allen

5/11/2009 12:38:00 PM

Robert Klemme wrote:
> even with proper escaping this won't work as the string interpolation
> takes place *before* gsub is invoked and hence URI.escape will insert
> something but not the matched portion. In your tests it has probably
> worked because $2 was properly set from the previous match.

Really? So the #{...} gets evaluated before the param is passed to gsub,
but the block is passed as code, so it is evaluated afterwards.

Using the previous attempt, with a fresh irb session, I can see the
issue:
>> $1
=> nil
>> $2
=> nil
>> $3
=> nil
>> require 'URI'
=> true
>> s.gsub(/href=(['"])([^']*)/, "href=\\1#{URI.escape($2)}\\3")
NoMethodError: private method `gsub' called for nil:NilClass
from
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:289:in
`escape'
from (irb):8

>> $1
=> nil
>> $2
=> nil
>> $3
=> nil
>> s = "<a href='http://foo.com/one=>two'/...
=> "<a href='http://foo.com/one=>two'/...
>> require 'URI'
=> true
>> s.gsub(/href=(["'])([^'"]+)\1/) {
?> "href=#$1#{URI.escape($2)}#$1" }
=> "<a href='http://foo.com/one=%3Etwo'/...

Nice!

> Note that my regexp has another weakness: the quote character not used
> to quote the URI should be allowed as part of the URI. I did not want
> to complicate things too much but if you want to deal with this the
> regular expression must be made a bit more complex.

Wow, interesting. That would be incorrect HTML that the browser doesn't
deal with well, so I'll not worry about it for this case, but I would be
curious how it might be handled.

Thanks so much,
Sarah
--
Posted via http://www.ruby-....

Robert Klemme

5/11/2009 12:48:00 PM

2009/5/11 Sarah Allen <sarah@ultrasaurus.com>:
> Robert Klemme wrote:
>> even with proper escaping this won't work as the string interpolation
>> takes place *before* gsub is invoked and hence URI.escape will insert
>> something but not the matched portion. In your tests it has probably
>> worked because $2 was properly set from the previous match.
>
> Really? So the #{...} gets evaluated before the param is passed to gsub,

All method parameters are evaluated before method invocation - this is
true for every method invocation in Ruby.

> but the block is passed as code, so then it is evaluated after.

In the case of gsub the block is invoked once for each match.
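
A small counter makes both statements visible. This is only a sketch; the names calls, next_id and text are made up for the illustration.

```ruby
calls = 0
next_id = lambda { calls += 1 }

text = "a a a"

# The replacement string is an ordinary argument: it is built once,
# before gsub is called, so the lambda runs a single time.
once = text.gsub("a", "#{next_id.call}")
puts once                # 1 1 1
calls_after_string = calls
puts calls_after_string  # 1

# The block is invoked by gsub itself, once for each match.
per_match = text.gsub("a") { next_id.call.to_s }
puts per_match           # 2 3 4
puts calls               # 4
```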

> Using the previous attempt, with a fresh irb session, I can see the
> issue:
>>> $1
> => nil
>>> $2
> => nil
>>> $3
> => nil
>>> require 'URI'
> => true
>>> s.gsub(/href=(['"])([^']*)/, "href=\\1#{URI.escape($2)}\\3")
> NoMethodError: private method `gsub' called for nil:NilClass
>  from
> /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:289:in
> `escape'
>  from (irb):8
>
>>> $1
> => nil
>>> $2
> => nil
>>> $3
> => nil
>>> s = "<a href='http://foo.com/one=>two'/...
> => "<a href='http://foo.com/one=>two'/...
>>> require 'URI'
> => true
>>> s.gsub(/href=(["'])([^'"]+)\1/) {
> ?> "href=#$1#{URI.escape($2)}#$1" }
> => "<a href='http://foo.com/one=%3Etwo'/...
>
> Nice!

:-)

>> Note that my regexp has another weakness: the quote character not used
>> to quote the URI should be allowed as part of the URI. I did not want
>> to complicate things too much but if you want to deal with this the
>> regular expression must be made a bit more complex.
>
> Wow, interesting. That would be incorrect HTML that the browser doesn't
> deal with well, so I'll not worry about it for this case, but I would be
> curious how it might be handled.

Basically you need an alternation and more capturing groups, along the
lines of

'([^']+)'|"([^"]+)"
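
One way that alternation might be wired into the block form is sketched below. The method name escape_hrefs and the helper escape_href are hypothetical; escape_href stands in for URI.escape (removed in Ruby 3.0) and only percent-encodes "<", ">" and spaces.

```ruby
# Hypothetical stand-in for URI.escape: percent-encodes only <, > and space.
def escape_href(str)
  str.gsub(/[<> ]/) { |ch| format('%%%02X', ch.ord) }
end

# Each branch of the alternation has its own capture group, so the other
# quote character is allowed inside the value.
def escape_hrefs(html)
  html.gsub(/href=(?:'([^']+)'|"([^"]+)")/) do
    if $1
      "href='#{escape_href($1)}'"      # single-quoted attribute matched
    else
      "href=\"#{escape_href($2)}\""    # double-quoted attribute matched
    end
  end
end

puts escape_hrefs(%q{<a href='a "b" >c'/>})   # <a href='a%20"b"%20%3Ec'/>
puts escape_hrefs(%q{<a href="it's >x"/>})    # <a href="it's%20%3Ex"/>
```

Only one of the two groups is set for any given match, which is why the block has to check which branch matched before rebuilding the attribute.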

> Thanks so much,

You're welcome.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestprac...

Sarah Allen

5/11/2009 1:01:00 PM

Robert Klemme wrote:
> All method parameters are evaluated before method invocation - this is
> true for every method invocation in Ruby.
>
>> but the block is passed as code, so then it is evaluated after.
>
> In the case of gsub the block is invoked once for each match.

These are really important details to understand. Thanks for pointing
them out.

> Basically you need an alternative and more capturing groups along the
> lines of
>
> '([^']+)'|"([^"]+)"

Ah, of course. I knew that, but didn't put it together.

I am so appreciative of the folks on this list.

Thank you 7stud, Sebastian & Robert!

Sarah
--
Posted via http://www.ruby-....
