Asp Forum - Re: invalid byte sequence in US-ASCII (ArgumentError)

Yukihiro Matsumoto

2/17/2009 1:17:00 AM

Hi,

In message "Re: invalid byte sequence in US-ASCII (ArgumentError)"
on Mon, 16 Feb 2009 09:15:54 +0900, Luther <lutheroto@gmail.com> writes:

|I'm having some trouble migrating from 1.8 to 1.9.1. I have this line of
|code:
|
|text.gsub! "\C-m", ''
|
|...which generates this error:
|
|/home/luther/bin/dos2gnu:16:in `gsub!': invalid byte sequence in
|US-ASCII (ArgumentError)
|
|The purpose is to strip out any ^M characters from the string.

I feel some smell of a bug. Could you show me the whole code and
reproducing input please?

matz.

2 Answers

ThoML

2/17/2009 7:26:00 AM

> > I feel some smell of a bug. Could you show me the whole code and
> > reproducing input please?
>
> Sure, here you go...

When I recently stumbled over not so different problems (one of which
is described here [1]) it was because the external encoding (see
Encoding.default_external) defaulted to US-ASCII on cygwin because
ruby191RC0 ignored the windows locale and the value of the LANG
variable -- the part with the windows locale was fixed in the
meantime. AFAIK if ruby 191 cannot determine the environment's locale,
it defaults to US-ASCII which causes the described problem if a
character is > 127.

[1] http://groups.google.com/group/ruby-talk-google/browse_frm/thr...
72d8fb808ba/
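
To illustrate the point about the external encoding, here is a minimal sketch of how the same bytes can be valid or invalid depending on the encoding they are labelled with. The sample string is a made-up example (the Latin-1 bytes for "café" followed by CR LF), not the input from the original report.

```ruby
# The encoding Ruby assigns to data read from files and STDIN. It is
# derived from the locale environment when the interpreter starts.
puts "default_external: #{Encoding.default_external}"

# Hypothetical input: "café" in Latin-1 plus CR LF. Byte 0xE9 is > 127.
raw = "caf\xE9\r\n".b

# force_encoding only changes the label; the bytes stay the same.
as_ascii  = raw.dup.force_encoding(Encoding::US_ASCII)
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

puts as_ascii.valid_encoding?   # => false (0xE9 is not a US-ASCII byte)
puts as_latin1.valid_encoding?  # => true  (every byte is valid Latin-1)
```

A string for which `valid_encoding?` is false is the kind of input that can produce an "invalid byte sequence" error in string operations.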

Luther

2/17/2009 1:55:00 PM

Tom Link wrote:
> When I recently stumbled over not so different problems (one of which
> is described here [1]) it was because the external encoding (see
> Encoding.default_external) defaulted to US-ASCII on cygwin because
> ruby191RC0 ignored the windows locale and the value of the LANG
> variable -- the part with the windows locale was fixed in the
> meantime. AFAIK if ruby 191 cannot determine the environment's locale,
> it defaults to US-ASCII which causes the described problem if a
> character is > 127.

Actually, I always set my LANG to C. Since my original post, I found
that I had forgotten to set my LC_CTYPE to en_US.UTF-8, which is
Ubuntu's default. After fixing that, I still got the same error, but
with "UTF-8" instead of "US-ASCII".

I believe the metadata in that text file must be binary code that was
put there by some word processor, because I remember seeing "Helvetica"
somewhere in there.

Luther
--
Posted via http://www.ruby-....
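
If the file really does contain binary data, one possible workaround is to treat the whole input as raw bytes, so that no encoding validation gets in the way. This is only a sketch under that assumption: the helper name `strip_carriage_returns` and the sample data are hypothetical, and it is not the fix adopted in the thread.

```ruby
# Remove every carriage return (^M, byte 0x0D) from raw data without
# interpreting the remaining bytes as characters.
def strip_carriage_returns(data)
  # Relabel a copy as binary (ASCII-8BIT); the bytes are not changed.
  bytes = data.dup.force_encoding(Encoding::BINARY)
  bytes.delete("\r")
end

# Hypothetical sample: text mixed with bytes that are invalid in UTF-8.
sample  = "Helvetica\xFF\xFE\r\nline two\r\n"
cleaned = strip_carriage_returns(sample)

puts cleaned.bytesize  # two bytes shorter than the sample
```

In current Ruby versions, `File.binread(path)` returns a binary string directly, so a file can be read, cleaned, and written back with `File.binwrite` without any encoding conversion.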

comp.lang.ruby
