Jesse P.
1/5/2008 5:57:00 PM
Hi Matz,
Thanks for your help. So I guess my problem is this:
1. I get an XML document that is declared to be valid UTF-8, but
2. when I process the values, as you pointed out, some of them are not
valid UTF-8, and
3. this causes a lot of problems when the document is parsed by REXML.
For a string (e.g. the contents of an XML file), is there any way I can
detect just the bytes that are not valid UTF-8 and convert them to UTF-8?
That way I can make sure that what REXML processes is valid UTF-8
without unnecessarily touching the parts of the string that are
already valid UTF-8.
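
One possible approach, as a minimal sketch rather than a definitive answer: on a modern Ruby (2.1 or later, which is an assumption, since String#scrub does not exist in older versions), String#scrub with a block visits only the invalid byte runs and leaves valid UTF-8 untouched. The helper name repair_utf8 is hypothetical, and treating the stray bytes as ISO-8859-1 is a guess about where the data came from; the real source encoding has to be known or assumed.

```ruby
# Hypothetical helper: keep valid UTF-8 sequences as they are and
# re-interpret each invalid byte run in a fallback encoding.
# ISO-8859-1 as the fallback is an assumption about the data source.
def repair_utf8(raw, fallback = Encoding::ISO_8859_1)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?   # nothing to repair

  # scrub yields only the invalid byte sequences to the block
  text.scrub do |bad_bytes|
    # read the stray bytes as `fallback` and transcode them to UTF-8
    bad_bytes.encode(Encoding::UTF_8, fallback)
  end
end

fixed = repair_utf8("<name>\xC2</name>".b)
puts fixed.valid_encoding?
```

The repaired string can then be handed to REXML::Document.new. If the stray bytes should simply be dropped or replaced with a marker instead of converted, scrub also accepts a replacement string, e.g. text.scrub("?").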
Best regards,
Jesse
On Jan 5, 10:41 pm, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:
> Hi,
>
> In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
> on Sat, 5 Jan 2008 02:40:00 +0900, "Jesse P." <j.prab...@gmail.com> writes:
>
> |I'm working with some UTF-8 data and basically if I run this:
> |
> |require 'rexml/document'
> |data = "<name>\302</name>"
> |doc = REXML::Document.new(data)
>
> "<name>\302</name>" is not a valid UTF-8 byte sequence. The rest is
> up to you, once you recognize that you are working with non-UTF-8 data.
>
> matz.
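
To illustrate the point in the quoted reply: \302 (octal) is the byte 0xC2, which in UTF-8 is the lead byte of a two-byte sequence and must be followed by a continuation byte. A small check, assuming Ruby 1.9 or later for String#valid_encoding? and 2.0 or later for String#b (the variable names are only for illustration):

```ruby
# 0xC2 followed directly by "<" is an incomplete two-byte sequence
bytes   = "<name>\xC2</name>".b                      # raw bytes, tagged as binary
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)  # relabel, no conversion
puts as_utf8.valid_encoding?    # prints false

# 0xC2 0xA9 is a complete sequence (U+00A9, the copyright sign)
complete = "<name>\xC2\xA9</name>".b.force_encoding(Encoding::UTF_8)
puts complete.valid_encoding?   # prints true
```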