Asp Forum - Re: REXML::Document could not parse UTF-8 "<name>\302</name>"

Yukihiro Matsumoto

1/5/2008 2:41:00 PM

Hi,

In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
on Sat, 5 Jan 2008 02:40:00 +0900, "Jesse P." <j.prabawa@gmail.com> writes:

|I'm working with some UTF-8 data and basically if I run this:
|
|require 'rexml/document'
|data = "<name>\302</name>"
|doc = REXML::Document.new(data)

"<name>\302</name>" is not a valid UTF-8 byte sequence. Once you
recognize that you are working with non-UTF-8 data, the rest is up to you.
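
The point above can be checked directly. This is a minimal sketch that
assumes a modern Ruby (2.0 or later) with encoding-aware strings, which
did not exist when this thread was written; the variable names are only
illustrative. The byte \302 (0xC2) is a UTF-8 lead byte and needs a
continuation byte after it, so on its own it is invalid.

```ruby
# Build both strings from raw bytes, then tag them as UTF-8.
# String#b returns an unfrozen binary copy, so force_encoding is safe here.
bad  = "<name>\302</name>".b.force_encoding(Encoding::UTF_8)
good = "<name>\302\251</name>".b.force_encoding(Encoding::UTF_8)

# 0xC2 alone is a lead byte with no continuation byte: invalid UTF-8.
bad_is_valid = bad.valid_encoding?

# 0xC2 0xA9 is the complete two-byte sequence for U+00A9 (copyright sign).
good_is_valid = good.valid_encoding?

puts "bad:  #{bad_is_valid}"
puts "good: #{good_is_valid}"
```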

matz.

1 Answer

Jesse P.

1/5/2008 5:57:00 PM

Hi Matz,

Thanks for your help. So I guess my problem is this:
1. I get an XML document that is declared to be valid UTF-8, but
2. when I process some of the values, as you pointed out, some are not
valid UTF-8, and
3. this causes a lot of problems when they are parsed by REXML.

For a string of characters (e.g. some XML file), is there any way I can
detect just the non-UTF-8 bytes and convert them to UTF-8?

This way I can make sure what is processed by REXML is valid UTF-8
without unnecessarily processing characters in the string that are
already valid UTF-8.
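
One possible way to do what is described above, as a sketch only: it
assumes Ruby 2.1 or later (for String#scrub) and it assumes the stray
bytes were originally ISO-8859-1, which nothing in this thread confirms.
The helper name repair_utf8 is hypothetical. Valid UTF-8 sequences are
left untouched; only the invalid byte runs are re-encoded.

```ruby
# Hypothetical helper: keep valid UTF-8 as-is and transcode only the
# invalid byte runs from an assumed fallback encoding (ISO-8859-1 here).
def repair_utf8(raw, fallback = Encoding::ISO_8859_1)
  # Work on a copy of the raw bytes tagged as UTF-8.
  text = raw.b.force_encoding(Encoding::UTF_8)

  # String#scrub yields each invalid byte run to the block; the block's
  # return value replaces it. encode(dst, src) reads the bytes as `src`.
  text.scrub { |bytes| bytes.encode(Encoding::UTF_8, fallback) }
end

repaired  = repair_utf8("<name>\302</name>")
untouched = repair_utf8("<name>caf\303\251</name>")

puts repaired.valid_encoding?
puts untouched
```

The repaired string is valid UTF-8 and could then be handed to
REXML::Document.new. If the real source encoding is something other
than ISO-8859-1, the fallback argument would need to change accordingly.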

Best regards,

Jesse

On Jan 5, 10:41 pm, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:
> Hi,
>
> In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
> on Sat, 5 Jan 2008 02:40:00 +0900, "Jesse P." <j.prab...@gmail.com> writes:
>
> |I'm working with some UTF-8 data and basically if I run this:
> |
> |require 'rexml/document'
> |data = "<name>\302</name>"
> |doc = REXML::Document.new(data)
>
> "<name>\302</name>" is not a valid UTF-8 byte sequence. Once you
> recognize that you are working with non-UTF-8 data, the rest is up to you.
>
> matz.

comp.lang.ruby
