comp.lang.ruby

how to remove strange characters

Li Chen

10/7/2008 4:29:00 PM

Hi all,

I grab some info from a web page. Sometimes I get some strange
characters, as follows (printed with p):
To depart in a hurry; abscond: \342\200\234Your horse
has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

or (by print):
To depart in a hurry; abscond: ââ?¬Å?Your horse has absquatulated!ââ?¬Â
(Robert M. Bird) To die.

Any idea how to get rid of them?


Thanks,

Li
--
Posted via http://www.ruby-....

6 Answers

Li Chen

10/8/2008 2:24:00 PM

Stephen Celis wrote:

> Those are multi-byte characters (curly quotes, in this case). You
> probably don't want to get rid of them, but you can use the iconv
> library to transliterate them back to their ASCII almost-equivalents:
>
>>> string = "To depart in a hurry; abscond: \342\200\234Your horse has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
> => "To depart in a hurry; abscond: \342\200\234Your horse
> has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
>>> require 'iconv'
> => true
>>> puts Iconv.iconv('ascii//translit', 'utf-8', string).to_s
> To depart in a hurry; abscond: "Your horse has
> absquatulated!" (Robert M. Bird) To die.
> => nil
>
> Stephen

Thank you,

Li
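
A note for readers on newer Rubies: Iconv was removed from the standard library in Ruby 2.0, so the call quoted above needs the separate iconv gem there. Below is a minimal sketch of the same idea without Iconv, assuming the input is valid UTF-8. The ASCII_FALLBACKS table and the to_ascii helper are hypothetical names made up for this illustration, and the table is far from exhaustive.

```ruby
# Hypothetical lookup table: a few typographic characters and ASCII stand-ins.
ASCII_FALLBACKS = {
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Replaces every non-ASCII character with its stand-in, or "?" if unknown.
# Assumes +text+ is valid UTF-8; gsub raises ArgumentError otherwise.
def to_ascii(text)
  text.gsub(/[^\x00-\x7F]/) { |ch| ASCII_FALLBACKS.fetch(ch, "?") }
end

puts to_ascii("abscond: \u201CYour horse has absquatulated!\u201D")
```

Unlike //TRANSLIT, this only knows the characters listed in the table; anything else becomes "?".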

Li Chen

10/8/2008 4:14:00 PM

Hi Stephen and others,

Iconv only works for some characters. It doesn't work for the following
string.

Any idea?

Thanks,

Li


C:\Users\Alex>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> string1="Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
=> "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
irb(main):003:0> puts Iconv.iconv('ASCII//TRANSLIT','utf-8',string1).to_s
Iconv::IllegalSequence: "\223Hath some fond "...
        from (irb):3:in `iconv'
        from (irb):3
irb(main):004:0>
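
A likely explanation, offered as an assumption rather than a diagnosis: \223 and \224 are the bytes 0x93 and 0x94, which are the curly double quotes in Windows-1252 but are not valid UTF-8 on their own, while \342\200\246 is a valid UTF-8 ellipsis. The string therefore mixes two encodings, and a UTF-8 decoder rejects it. Here is a minimal sketch for Ruby 2.1 or later (it relies on String#scrub) that treats each invalid byte as Windows-1252; repair_mixed is a hypothetical helper name.

```ruby
# Reinterprets bytes that are invalid as UTF-8 as Windows-1252 characters.
# Assumption: the stray bytes really are Windows-1252; a page in another
# legacy encoding would need a different source encoding here.
def repair_mixed(raw)
  utf8 = raw.dup.force_encoding("UTF-8")
  utf8.scrub do |bad|
    # Bytes that Windows-1252 leaves undefined become U+FFFD.
    bad.dup.force_encoding("Windows-1252").encode("UTF-8", undef: :replace)
  end
end

fixed = repair_mixed("ruin:\x93Hath some fond lover tic'd thee to thy bane?\x94\xE2\x80\xA6")
puts fixed.valid_encoding?
```

Once the string is valid UTF-8, it can be transliterated or filtered like any other.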






Pablo Q.

10/8/2008 4:35:00 PM

What do you think about doing something like this?

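class String
  # Byte values replaced in addition to control and non-ASCII bytes:
  # " # % * + , - / < = > ? [ \ ] ^ ` {
  REPLACED_BYTES = [34, 35, 37, 42, 43, 44, 45, 47, 60, 61, 62, 63,
                    91, 92, 93, 94, 96, 123]

  # Returns a copy in which every byte below 32, above 124, or listed in
  # REPLACED_BYTES is replaced by +replacement+.
  # Uses String#bytes because b[0] returns a one-character String, not an
  # Integer, on Ruby 1.9 and later.
  def remove_nonascii(replacement)
    bytes.map { |b|
      (b < 32 || b > 124 || REPLACED_BYTES.include?(b)) ? replacement : b.chr
    }.join
  end
end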

"Fatal injury or ruin:\223Hath some fond lover tic'd thee to
thybane?\224\342\200\246".remove_nonascii('+')

=> "Fatal injury or ruin:+Hath some fond lover tic'd thee to thybane+++++"

As you can see, it replaced each of those bytes with the character '+'.
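
For comparison, here is a sketch of a similar replacement using the built-in String#encode (Ruby 1.9 or later) instead of a byte loop. The behavior differs in two ways: a valid multi-byte character such as the ellipsis becomes one replacement rather than three, and ASCII punctuation is left alone. replace_non_ascii is a hypothetical name, and the input is assumed to be mostly UTF-8.

```ruby
# Converts to US-ASCII, substituting +replacement+ for anything that is
# either invalid UTF-8 or has no ASCII equivalent.
def replace_non_ascii(text, replacement = "+")
  text.encode("US-ASCII", "UTF-8",
              invalid: :replace, undef: :replace, replace: replacement)
end

p replace_non_ascii("ruin:\x93Hath some fond lover\x94\xE2\x80\xA6")
```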




--
Pablo Q.

Nit Khair

10/9/2008 3:45:00 AM

Li Chen wrote:
> Hi all,
>
> I grab some info from a web page. Sometimes I get some strange
> characters, as follows (printed with p):
> To depart in a hurry; abscond: \342\200\234Your horse
> has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

Here's a quick hack I used recently. The characters were messing up my
display on ncurses, and I did not need them.

dataitem.gsub!(/[^[:space:][:print:]]/,'')

I got this while googling; IIRC, it's used somewhere in RoR.
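
One caveat for newer Rubies, as far as I know: from 1.9 on, POSIX bracket classes such as [:print:] are Unicode-aware on UTF-8 strings, so curly quotes count as printable and are kept, and gsub raises ArgumentError if the string contains invalid byte sequences. Below is a minimal sketch that reproduces byte-level stripping by working on a binary copy of the string (String#b, Ruby 2.0 or later); strip_to_printable_ascii is a hypothetical name.

```ruby
# Keeps printable ASCII (0x20..0x7E) plus ASCII whitespace and drops every
# other byte. Works on a binary copy, so invalid UTF-8 cannot raise.
def strip_to_printable_ascii(text)
  text.b.gsub(/[^\x20-\x7E\s]/, "").force_encoding("US-ASCII")
end

p strip_to_printable_ascii("thy bane?\x94\xE2\x80\xA6\n")
```

This throws information away, so it suits display cleanup like the ncurses case above better than data you need to keep.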

Li Chen

10/9/2008 7:51:00 PM

Nit Khair wrote:
> Here's a quick hack I used recently. The characters were messing up my
> display on ncurses, and I did not need them.
>
> dataitem.gsub!(/[^[:space:][:print:]]/,'')
>
> I got this while googling; IIRC, it's used somewhere in RoR.

It works in the scenario where iconv doesn't. Good job!

Li


Bilyk, Alex

10/10/2008 12:55:00 AM

There is no one-click installer for 1.9 on Windows as far as I can tell.
Downloading and unpacking the zipped binaries didn't get me very far, as
both ruby and irb complain that something is missing. Does the binary
distribution require me to install anything else, such as libraries? If so,
what additional stuff do I need to make 1.9 work, and where can I get it?

Thanks,
Alex