comp.lang.c++

Unicode to characters

KK

10/6/2008 11:37:00 PM

Hello all,
There may have been flavors of this question discussed in the past, but I
could not really make head or tail of them.

I have a bunch of Unicode values stored in a string array and I want to
see the corresponding characters displayed in an Excel file. How could
I go about doing that?

vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc.
(hexadecimal values)
For 0041 (assuming hex) I should see the letter 'A', a 'B' for 0042, ...
and the special character corresponding to 0x410.

I could live with a comma-separated .csv file instead of a .xls to
view it in Excel.

Please advise.





3 Answers

Daniel T.

10/7/2008 10:03:00 AM


KK <pedagani@gmail.com> wrote:

> There could be flavors of this question discussed in the past, but I
> could not really make a head/tail out of it.
>
> I have bunch of unicode values stored in a string array and I want to
> see the corresponding characters displayed in an excel file. How could
> I go about doing that ?
>
> vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc.
> (hexa decimal values)
> for 0041 (assumes hex) I should see alphabet 'A' , a 'B' for 0042 ...
> special character corresponding to 0x410.
>
> I could live with a comma separated .csv file instead of a .xls to
> view it in excel.
>
> Please advice.

It's a pretty complex topic. There are about a half-dozen different ways
to represent Unicode characters in a file (e.g. UTF-8, and UTF-16 in both
LE and BE versions; see the Wikipedia article on Unicode for others).

For what you want though, I think the best bet would be UTF-16LE with a
Byte Order Mark at the beginning.

Based on the description above, I'm assuming you have a vector of
strings where each string is the U+ value of a particular character. If
so, then you simply have to make a function that converts a string into
its UTF-16 byte equivalent. How you do that depends very much on what
kind of environment you are working in. Is it natively big endian or
little endian, for example?

Fundamentally, you have to convert your strings into an array of chars
that you can then send to a file stream.

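// Parse a hex string such as "0410" and store it as two UTF-16LE bytes.
// Only covers code points up to U+FFFF (no surrogate pairs).
void convert( const string& s, char* c )
{
    unsigned long v = strtoul( s.c_str(), 0, 16 ); // needs <cstdlib>
    c[0] = static_cast<char>( v & 0xFF );          // the last (low) byte
    c[1] = static_cast<char>( ( v >> 8 ) & 0xFF ); // the first (high) byte
}

// myFile is assumed to be an ofstream opened with ios::binary.
myFile << '\xFF' << '\xFE'; // Byte Order Mark for UTF-16LE
char c[2];
for ( vector<string>::iterator it = myVec.begin();
      it != myVec.end();
      ++it )
{
    convert( *it, c );
    myFile << c[0];
    myFile << c[1];
}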

I don't have time right now to go more into it, but if you respond, I
will add to the above.
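
For reference, the outline above can be fleshed out roughly as follows. This is only a sketch under some assumptions: every string holds valid hex digits, every code point fits in 16 bits (so no surrogate pairs are needed), and the names appendUtf16le and toUtf16leWithBom are made up for illustration. Because the bytes are extracted with shifts and masks, the result should not depend on whether the host is big or little endian.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Append the two UTF-16LE bytes for one hex code point string such as "0410".
// Assumes valid hex and a code point no larger than U+FFFF.
void appendUtf16le(const std::string& hex, std::string& out)
{
    unsigned long cp = std::strtoul(hex.c_str(), 0, 16);
    out += static_cast<char>(cp & 0xFF);         // low byte first
    out += static_cast<char>((cp >> 8) & 0xFF);  // then high byte
}

// Build the complete file contents: a Byte Order Mark, then each character.
std::string toUtf16leWithBom(const std::vector<std::string>& values)
{
    std::string out("\xFF\xFE", 2);  // U+FEFF written in little-endian order
    for (std::vector<std::string>::const_iterator it = values.begin();
         it != values.end(); ++it)
    {
        appendUtf16le(*it, out);
    }
    return out;
}
```

The returned string can then be written with an ofstream opened in ios::binary mode, using write(bytes.data(), bytes.size()), so that no newline translation alters the bytes.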

pjb

10/7/2008 10:08:00 AM


KK <pedagani@gmail.com> writes:

> Hello all,
> There could be flavors of this question discussed in the past, but I
> could not really make a head/tail out of it.
>
> I have bunch of unicode values stored in a string array and I want to
> see the corresponding characters displayed in an excel file. How could
> I go about doing that ?
>
> vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc.

If you are referring to std::string, then it's a
std::basic_string<char> so you only get bytes.

If, as is most probable, your CHAR_BIT==8, then you can only store
the codes of ISO-8859-1 characters in these strings.


> (hexa decimal values)
> for 0041 (assumes hex) I should see alphabet 'A' , a 'B' for 0042 ...
> special character corresponding to 0x410.

0x410 is not the Unicode value for a special character. It's the Unicode
value for the CYRILLIC_CAPITAL_LETTER_A.


> I could live with a comma separated .csv file instead of a .xls to
> view it in excel.

I would advise you to get a better understanding of characters, codes,
the STL, I/O, files. Start reading:

http://en.wikipedia.org/wi...
http://en.wikipedia.org/...
http://www.cplusplus.com/reference/stri...
http://www.cplusplus.com/reference...

etc...

--
__Pascal Bourguignon__

James Kanze

10/7/2008 11:08:00 AM


On Oct 7, 12:08 pm, p...@informatimago.com (Pascal J. Bourguignon)
wrote:
> KK <pedag...@gmail.com> writes:

> > There could be flavors of this question discussed in the
> > past, but I could not really make a head/tail out of it.

> > I have bunch of unicode values stored in a string array and
> > I want to see the corresponding characters displayed in an
> > excel file. How could I go about doing that ?

> > vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc.

> If you are refering to std::string, then it's a
> std::basic_string<char> so you only get bytes.

> If, as it is most probable, your CHAR_BITS==8, then you can
> only store the codes of ISO-8859-1 characters in these
> strings.

Nonsense. I regularly use char for Unicode (UTF-8) and ISO
8859-15; in other places, other ISO 8859 codes, or JIS are also
used. Not to mention various Windows (and earlier MS-DOS) code
pages, or EBCDIC (which is still used, in 8 bit bytes, on IBM
mainframes).
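
To illustrate the point about UTF-8 in plain char strings, here is a minimal sketch of an encoder that appends a code point to a std::string as one to four bytes. The function name appendUtf8 is hypothetical, and the sketch assumes the input is a valid code point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <string>

// Append one Unicode code point to out as UTF-8.
// Assumes cp is a valid code point in U+0000..U+10FFFF.
void appendUtf8(unsigned long cp, std::string& out)
{
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this, U+0041 becomes the single byte 0x41, while U+0410 becomes the two bytes 0xD0 0x90, both held in an ordinary std::string.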

Still, I don't know what he really has or wants. Some posters
seem to think that he has a textual representation of the
Unicode code values, e.g. strings like "0041". Which seems
weird to me, but who knows.

> > (hexa decimal values)
> > for 0041 (assumes hex) I should see alphabet 'A' , a 'B' for
> > 0042 ... special character corresponding to 0x410.

> 0x410 is not the unicode for a special character. It's the
> unicode for the CYRILLIC_CAPITAL_LETTER_A.

Well, that's a special character to me:-). I certainly don't
use it very often.

> > I could live with a comma separated .csv file instead of a
> > .xls to view it in excel.

> I would advise you to get a better understanding of characters, codes,
> the STL, I/O, files. Start reading:
>
> http://en.wikipedia.org/wi...
> http://en.wikipedia.org/...
> http://www.cplusplus.com/reference/stri...
> http://www.cplusplus.com/reference...

> etc...

The best reference I know about these issues is "Fonts &
Encodings", by Yannis Haralambous. (I've not seen the English
translation---I hope it's better than the translations of
English into French we usually get.) And of course, he'll also
need to find out about Excel. But I'd be very surprised if it
didn't have an option for reading UTF-8, at least in CSV.
(Alternatively, he could use UTF-16LE; I think that's the native
code set under Windows.)
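
As a rough sketch of the UTF-8 CSV route, the following builds the contents of a two-column file (hex code, character) from the hex strings. Several things here are assumptions rather than anything stated in the thread: the names utf8FromBmp and makeCsv are invented, code points are assumed to fit in 16 bits, and the leading UTF-8 Byte Order Mark is included on the assumption that the Excel version in use relies on it to detect UTF-8, which should be checked against that version.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// UTF-8 encode a code point, assuming it is no larger than U+FFFF.
std::string utf8FromBmp(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Build CSV text with one row per value: the hex code, then the character.
std::string makeCsv(const std::vector<std::string>& values)
{
    std::string csv("\xEF\xBB\xBF");  // UTF-8 BOM (assumed to help Excel)
    for (std::vector<std::string>::const_iterator it = values.begin();
         it != values.end(); ++it)
    {
        std::string ch = utf8FromBmp(std::strtoul(it->c_str(), 0, 16));
        if (ch == "\"") {
            ch = "\"\"";  // CSV escapes a double quote by doubling it
        }
        // The character field is always quoted so that a comma stays literal.
        csv += *it + ",\"" + ch + "\"\r\n";
    }
    return csv;
}
```

The result would be written to a .csv file through an ofstream opened in ios::binary mode, so the CRLF line endings and the BOM reach the file unchanged.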

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34