
comp.lang.ruby

UTF in Regexp

Wido Menhardt

2/3/2007 7:16:00 PM


I am sorrrrrry, but I am banging my head against this, and can't seem to
find the answer!

Text gets displayed in an input field in a web page with “
prepended and ” appended to the string (they need to be inside the
string, otherwise it looks funny). The user edits it, and when the new
string comes back to the (Rails) backend, it may still have these quotes
attached, but now as Unicode (UTF-8) characters.

So the string may start with a UTF-8 “ and may end with
a UTF-8 ”.

I want to remove them with a regexp. Here is what works (but I am embarrassed):

ldquo = '123'; ldquo[0] = 226; ldquo[1] = 128; ldquo[2] = 156
rdquo = '123'; rdquo[0] = 226; rdquo[1] = 128; rdquo[2] = 157
string.gsub!(/(\A#{ldquo}|#{rdquo}\Z)/,'')

There must be a better way.
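The decimal byte values above are the UTF-8 encoding of U+201C (LEFT DOUBLE QUOTATION MARK) and U+201D (RIGHT DOUBLE QUOTATION MARK). A quick check, as a sketch that assumes Ruby 2.0 or later (where `\u` escapes exist and `String#bytes` returns an array) rather than the Ruby 1.8 the code above was written for; the variable names are illustrative:

```ruby
# The same characters written with \u, octal and hex escapes.
ldquo = "\u201C"               # LEFT DOUBLE QUOTATION MARK
rdquo = "\u201D"               # RIGHT DOUBLE QUOTATION MARK
ldquo_octal = "\342\200\234"
ldquo_hex   = "\xe2\x80\x9c"

p ldquo.bytes                  # the decimal values assigned one by one above
p rdquo.bytes
p ldquo == ldquo_octal && ldquo == ldquo_hex  # all spellings give the same string

# The original pattern, built from literals instead of byte assignments.
text = "#{ldquo}Hello#{rdquo}"
p text.gsub(/\A#{ldquo}|#{rdquo}\Z/, '')
```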


Abu Mats al-Nemsi

--
Posted via http://www.ruby-....

1 Answer

Jano Svitok

2/3/2007 8:11:00 PM


On 2/3/07, Wido Menhardt <a@menhardt.com> wrote:
> There must be a better way.

1. It's possible to insert the characters directly, either in octal (226 =
"\342") or in hex (226 = "\xe2"):

string.gsub!(/\A\xe2\x80\x9c|\xe2\x80\x9d\Z/, '')

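2. | has the lowest precedence, so your regex is equivalent to
/(\A#{ldquo})|(#{rdquo}\Z)/: it removes a leading “ or a trailing ”,
which is what you want. The capturing group isn't needed; if you want a
group, use a non-capturing one, (?:...):

string.gsub!(/(?:\A\xe2\x80\x9c|\xe2\x80\x9d\Z)/, '')

Moving the anchors outside the group, as in
/\A(?:\xe2\x80\x9c|\xe2\x80\x9d)\Z/, would instead match only a string
that consists of a single quote character.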

3. There's also the iconv library, which will convert between character
encodings for you.
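To see how `|` precedence interacts with the two anchors, here is a small sketch (assuming Ruby 1.9 or later for the `\u` escapes; the constant names are illustrative): the ungrouped alternation strips a leading “ and a trailing ”, while a pattern that groups the alternation between both anchors matches only a string consisting of a single quote character.

```ruby
LDQUO = "\u201C"  # LEFT DOUBLE QUOTATION MARK
RDQUO = "\u201D"  # RIGHT DOUBLE QUOTATION MARK

# | binds loosest: the alternatives are "\A followed by LDQUO"
# and "RDQUO followed by \z".
ungrouped = /\A#{LDQUO}|#{RDQUO}\z/

# Both anchors apply to the whole group, so the entire string
# must be exactly one quote character.
anchored = /\A(?:#{LDQUO}|#{RDQUO})\z/

text = "#{LDQUO}Hello#{RDQUO}"

p text.gsub(ungrouped, '')   # both quotes stripped
p text.gsub(anchored, '')    # unchanged: the pattern never matches
p RDQUO.gsub(anchored, '')   # a lone quote character is the only match
```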
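Iconv shipped with Ruby 1.8 but was removed from the standard library in Ruby 2.0; in Ruby 1.9 and later, `String#encode` does the same kind of conversion. A minimal sketch, swapping in `String#encode` for iconv (the sample text is illustrative):

```ruby
text = "\u201CHello\u201D"

# Windows-1252 has single-byte codes for the curly quotes (0x93, 0x94).
cp1252 = text.encode("Windows-1252")
p cp1252.bytes

# Converting back to UTF-8 restores the original string.
p cp1252.encode("UTF-8") == text

# US-ASCII has no curly quotes; replace them with a straight quote
# instead of raising Encoding::UndefinedConversionError.
p text.encode("US-ASCII", undef: :replace, replace: '"')
```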