Asp Forum - Escaping non-ASCII chars for RTF export

mail.dph

11/1/2007 10:49:00 PM

Greetings,

I'm attempting to convert non-ASCII characters to Unicode escape
sequences for export to RTF, and I haven't had much luck finding any
good information searching Google. Does anyone here have any good
resources for this sort of thing?

Thanks!

dan

5 Answers

Mikel Lindsaar

11/2/2007 12:15:00 AM

http://ruby-rtf.ruby...

Ruby RTF library. Creates RTF documents... might be a good start.

mail.dph

11/2/2007 2:58:00 AM

On Nov 1, 5:14 pm, raasd...@gmail.com wrote:
> http://ruby-rtf.ruby...
>
> Ruby RTF library. Creates RTF documents... might be a good start.

Hi, thanks for taking a look at my problem.

I am currently using the Ruby RTF library to generate RTF files. The
trouble I'm running into is with strings like 'g?r'. When you add
that ? character, it doesn't get converted to its Unicode counterpart
and the result is mangled when viewed.

Thanks again for your help,

dan

7stud --

11/2/2007 8:46:00 AM

Dan Herrera wrote:
> On Nov 1, 5:14 pm, raasd...@gmail.com wrote:
>> http://ruby-rtf.ruby...
>>
>> Ruby RTF library. Creates RTF documents... might be a good start.
>
> Hi, thanks for taking a look at my problem.
>
> I am currently using the Ruby RTF library to generate RTF files. The
> trouble I'm running into is with strings like 'g?r'. When you add
> that ? character, it doesn't get converted to its Unicode counterpart
> and the result is mangled when viewed.
>

A Unicode character has to be converted into a character language (called an
'encoding') that your display device can understand before the character
can be displayed. Common character languages (or 'encodings') are ASCII
and UTF-8. It sounds like the string you are starting with is encoded
in a character language that your display device doesn't understand.

Therefore, you need to figure out what character language your display
device does understand. UTF-8 is pretty common, so you can start off
by trying to convert your strings to the UTF-8 character language, and then
see if the strings display correctly. But to convert your strings
to UTF-8, you need to know the character language that the
string is currently written in. If you don't know the current language, you can
start off by trying ISO-8859-15. The characters that make up the
ISO-8859-15 language are listed here:

http://en.wikipedia.org/wiki/I...

To convert from ISO-8859-15 to utf-8, you can do this:

str = "Hell\xf6 w\xf6rld" #\xf6 is 'o' with umlaut in ISO-8859-15
puts str

--output (which my display device shows me):--
Hell? w?rld #I see question marks instead of o's with umlauts

Therefore, my display device does not understand the ISO-8859-15
character language. Since I want my display device to display the o's
with umlauts, I'll try converting the string to the UTF-8 character
language:

require 'iconv' #'Internationalization converter'?

converter = Iconv.new('UTF-8', 'ISO-8859-15')
new_str = converter.iconv(str)
puts new_str

--output:--
Hellö wörld #I see o's with umlauts
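
A note for anyone reading this on a newer Ruby: Iconv was dropped from the standard library in Ruby 2.0, and the built-in String#encode covers the same conversion. A minimal sketch, assuming Ruby 2.0 or later (the variable names are only for illustration):

```ruby
# The raw bytes are not valid UTF-8, so tag them with the encoding they
# are really in before converting. String#b returns an unfrozen binary copy.
latin = "Hell\xf6 w\xf6rld".b.force_encoding('ISO-8859-15')

# Transcode the ISO-8859-15 bytes to UTF-8.
utf8 = latin.encode('UTF-8')

puts utf8
p utf8.bytes.length
```

Here force_encoding only relabels the bytes, while encode actually rewrites them, which is the same split as the two arguments to Iconv.new.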

--
Posted via http://www.ruby-....

mail.dph

11/2/2007 6:19:00 PM

Hi,

This is great information, it's really helped me move in the right
direction. I haven't done enough testing yet, but here is what has
seemed to work.

Using an Iconv solution, where str is the string to convert:

require 'iconv'
converter = Iconv.new('ISO-8859-15', 'UTF-8')
converted_str = converter.iconv(str)

So a little backwards from what we were thinking. Looks like swapping
UTF-8 and ISO-8859-15 did the trick since it appears that the string
was in UTF-8 to begin with.
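
For the direction that ended up working here, UTF-8 to ISO-8859-15, the same idea can be sketched with String#encode on Ruby 1.9 or later. One thing to watch is that ISO-8859-15 only has 256 characters, so anything outside it has to be replaced or escaped. The sample strings below are made up and are not necessarily the strings from this thread:

```ruby
utf8 = "g\u00f6r"                     # sample UTF-8 input, 'o' with umlaut in the middle
latin = utf8.encode('ISO-8859-15')    # same direction as Iconv.new('ISO-8859-15', 'UTF-8')
p latin.bytes                         # [103, 246, 114]

# Characters with no ISO-8859-15 equivalent raise
# Encoding::UndefinedConversionError unless a replacement is requested.
lossy = "g\u4e2dr".encode('ISO-8859-15', undef: :replace, replace: '?')
p lossy.bytes                         # [103, 63, 114]
```

Replacing with '?' loses information, which is why a Unicode escape is usually the better route for RTF output.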

Thanks!

dan

7stud --

11/2/2007 9:21:00 PM

Dan Herrera wrote:
> This is great information, it's really helped me move in the right
> direction.
>
> Thanks!
>

There is one missing piece to the puzzle. This is what happens behind
the scenes when you convert from a string written in UTF-8 format to a
string written in ISO-8859-15 format:

UTF-8 encoded character
|
|
V
Unicode integer
|
|
V
ISO-8859-15 encoded character

If for some reason, you ever need to get the unicode integer, you can do
this:

str = "\xc3\xb6" #'o' with umlaut encoded in utf-8
#'U' gets the unicode from a char encoded in *utf-8* only
arr = str.unpack('U')

p arr #[246] --> unicode in decimal format

Since unicode integers are usually written in hex format, you can do the
following to get the unicode in hex format:

puts "%04x" % arr[0] #00f6
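
Coming back to the original RTF question: in RTF, a non-ASCII character can be written as \uN followed by a fallback character, where N is the UTF-16 code unit written as a signed 16-bit decimal number. A rough sketch of that, assuming Ruby 1.9 or later and a valid UTF-8 input string. rtf_escape is a made-up helper name, not something the Ruby RTF library provides:

```ruby
# Hypothetical helper (not part of the Ruby RTF library).
# Turns a UTF-8 string into ASCII-only text with RTF \uN escapes.
def rtf_escape(utf8_str)
  # Work on UTF-16 code units so characters above U+FFFF come out as
  # surrogate pairs, each written as its own \uN.
  utf8_str.encode('UTF-16BE').unpack('n*').map do |unit|
    case unit
    when 0x5C, 0x7B, 0x7D          # backslash, { and } are RTF syntax
      "\\" + unit.chr
    when 0x20..0x7E                # printable ASCII passes through
      unit.chr
    else
      # Everything else, control characters included, becomes \uN.
      # N is signed 16-bit, so units above 32767 are written as negatives.
      signed = unit > 0x7FFF ? unit - 0x10000 : unit
      "\\u#{signed}?"              # '?' is the fallback for old readers
    end
  end.join
end

puts rtf_escape("g\u00f6r")        # g\u246?r
```

The single '?' fallback relies on the RTF default of one fallback character per escape (\uc1); a document that sets a different \ucN value would need a matching number of fallback characters.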

comp.lang.ruby
