comp.lang.c++

The need of Unicode types in C++0x

Ioannis Vranos

10/1/2008 9:59:00 AM

Hi, I am currently learning QT, a portable C++ framework which comes
with both a commercial and GPL license, and which provides conversion
operations to its various types to/from standard C++ types.

For example its QString type provides a toWString() that returns a
std::wstring with its Unicode contents.

So, since wstring supports the largest character set, why do we need
explicit Unicode types in C++?

I think what is needed is a "unicode" locale or at the most, some
unicode locales.


I don't consider being compatible with C99 as an excuse.
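For reference, a minimal sketch of the conversion in question (my own, not from the post; it assumes Qt 4's QtCore, where QString stores UTF-16 internally, and only the std::wstring round trip is something the post actually mentions):

#include <QString>
#include <string>

int main()
{
    // Build a QString from UTF-8 input, then round-trip through std::wstring.
    QString qs = QString::fromUtf8("\xCE\xB1\xCE\xB2");   // the two characters "αβ"
    std::wstring ws = qs.toStdWString();                  // Unicode contents as wchar_t
    QString back = QString::fromStdWString(ws);
    return back == qs ? 0 : 1;                            // 0 on a successful round trip
}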
29 Answers

Ioannis Vranos

10/1/2008 10:03:00 AM


Correction:


Ioannis Vranos wrote:
> Hi, I am currently learning QT, a portable C++ framework which comes
> with both a commercial and GPL license, and which provides conversion
> operations to its various types to/from standard C++ types.
>
==> For example its QString type provides a toStdWString() that returns a
> std::wstring with its Unicode contents.
>
> So, since wstring supports the largest character set, why do we need
> explicit Unicode types in C++?
>
> I think what is needed is a "unicode" locale or at the most, some
> unicode locales.
>
>
> I don't consider being compatible with C99 as an excuse.

REH

10/1/2008 4:29:00 PM


On Oct 1, 5:59 am, Ioannis Vranos <ivra...@no.spam.nospamfreemail.gr>
wrote:
> Hi, I am currently learning QT, a portable C++ framework which comes
> with both a commercial and GPL license, and which provides conversion
> operations to its various types to/from standard C++ types.
>
> For example its QString type provides a toWString() that returns a
> std::wstring with its Unicode contents.
>
> So, since wstring supports the largest character set, why do we need
> explicit Unicode types in C++?
>
> I think what is needed is a "unicode" locale or at the most, some
> unicode locales.
>
> I don't consider being compatible with C99 as an excuse.

If I understand what you are asking...

wstring in the standard defines neither the character set nor the
encoding. Given that Unicode is currently a 21-bit standard, how can
wstring support the largest character set on a system where wchar_t is
16 bits (assuming a one-character-per-element encoding)? You could
only support the BMP (which is exactly what most systems and languages
that "claim" Unicode support are really capable of).

REH

Ioannis Vranos

10/1/2008 4:57:00 PM


REH wrote:
> On Oct 1, 5:59 am, Ioannis Vranos <ivra...@no.spam.nospamfreemail.gr>
> wrote:
>> Hi, I am currently learning QT, a portable C++ framework which comes
>> with both a commercial and GPL license, and which provides conversion
>> operations to its various types to/from standard C++ types.
>>
>> For example its QString type provides a toWString() that returns a
>> std::wstring with its Unicode contents.
>>
>> So, since wstring supports the largest character set, why do we need
>> explicit Unicode types in C++?
>>
>> I think what is needed is a "unicode" locale or at the most, some
>> unicode locales.
>>
>> I don't consider being compatible with C99 as an excuse.
>
> If I understand what you are asking...
>
> wstring in the standard defines neither the character set, nor the
> encoding. Given that Unicode is currently a 21-bit standard, how can
> wstring support the largest character set on a system where wchar_t is
> 16-bits (assuming a one-character-per-element encoding)? You could
> only support the BMP (which is exactly what most systems and language
> that "claim" Unicode support are really capable of).


I do not know much about encodings, only the stuff necessary for me, but
the question does not sound reasonable to me.

If that system supports Unicode as a system-specific type, why can't
wchar_t be made as wide as that system-specific Unicode type on that
system?

Erik Wikström

10/1/2008 5:30:00 PM


On 2008-10-01 18:57, Ioannis Vranos wrote:
> REH wrote:
>> On Oct 1, 5:59 am, Ioannis Vranos <ivra...@no.spam.nospamfreemail.gr>
>> wrote:
>>> Hi, I am currently learning QT, a portable C++ framework which comes
>>> with both a commercial and GPL license, and which provides conversion
>>> operations to its various types to/from standard C++ types.
>>>
>>> For example its QString type provides a toWString() that returns a
>>> std::wstring with its Unicode contents.
>>>
>>> So, since wstring supports the largest character set, why do we need
>>> explicit Unicode types in C++?
>>>
>>> I think what is needed is a "unicode" locale or at the most, some
>>> unicode locales.
>>>
>>> I don't consider being compatible with C99 as an excuse.
>>
>> If I understand what you are asking...
>>
>> wstring in the standard defines neither the character set, nor the
>> encoding. Given that Unicode is currently a 21-bit standard, how can
>> wstring support the largest character set on a system where wchar_t is
>> 16-bits (assuming a one-character-per-element encoding)? You could
>> only support the BMP (which is exactly what most systems and language
>> that "claim" Unicode support are really capable of).
>
>
> I do not know much about encodings, only the necessary for me stuff, but
> the question does not sound reasonable for me.
>
> If that system supports Unicode as a system-specific type, why can't
> wchar_t be made wide enough as that system-specific Unicode type, in
> that system?

Because it has been too narrow for 5 to 10 years and the compiler vendor
does not want to take any chances with backward compatibility. And since
we will get Unicode types, it is a good idea to use wchar_t for encodings
that are not the same size as the Unicode types.

--
Erik Wikström

Pete Becker

10/1/2008 5:50:00 PM


On 2008-10-01 12:57:27 -0400, Ioannis Vranos
<ivranos@no.spam.nospamfreemail.gr> said:

>
> If that system supports Unicode as a system-specific type, why can't
> wchar_t be made wide enough as that system-specific Unicode type, in
> that system?

It can be. But the language definition doesn't require it to be, and
with many implementations it's not. So if you want to traffic in
Unicode you have basically three options: ensure that your character
type can handle 21 bits, drop down to a subset of Unicode (as REH
mentioned, the BMP fits in 16-bit code points), or use a variable-width
encoding like UTF-8 or UTF-16.

Or you can wait for C++0x, which will provide char16_t and char32_t.
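A sketch of what the new types look like (based on the char16_t/char32_t support as it eventually appeared in C++11; not part of Pete's post):

#include <string>

int main()
{
    // U+10400 is outside the BMP, so it needs a surrogate pair in UTF-16
    // but a single code unit in UTF-32.
    std::u16string s16 = u"\U00010400";   // UTF-16 literal: 2 code units
    std::u32string s32 = U"\U00010400";   // UTF-32 literal: 1 code unit
    return (s16.size() == 2 && s32.size() == 1) ? 0 : 1;
}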

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

James Kanze

10/2/2008 7:38:00 AM


On Oct 1, 11:59 am, Ioannis Vranos <ivra...@no.spam.nospamfreemail.gr>
wrote:
> Hi, I am currently learning QT, a portable C++ framework which
> comes with both a commercial and GPL license, and which
> provides conversion operations to its various types to/from
> standard C++ types.

> For example its QString type provides a toWString() that
> returns a std::wstring with its Unicode contents.

In what encoding format? And what if the "usual" encoding for
wstring isn't Unicode (as is the case on many Unix platforms)?

> So, since wstring supports the largest character set, why do
> we need explicit Unicode types in C++?

Because wstring doesn't guarantee Unicode, and implementers
can't change what it does guarantee in their particular
implementation.

> I think what is needed is a "unicode" locale or at the most,
> some unicode locales.

Well, to begin with, there are only two sizes of character
types; the various Unicode encoding forms come in three sizes,
so you already have a size mismatch. And since wchar_t already
has a meaning, we can't just arbitrarily change it.

> I don't consider being compatible with C99 as an excuse.

How about being compatible with C++03?

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

10/2/2008 7:41:00 AM


On Oct 1, 6:28 pm, REH <spamj...@stny.rr.com> wrote:

[...]
> wstring in the standard defines neither the character set, nor the
> encoding. Given that Unicode is currently a 21-bit standard, how can
> wstring support the largest character set on a system where wchar_t is
> 16-bits (assuming a one-character-per-element encoding)? You could
> only support the BMP (which is exactly what most systems and language
> that "claim" Unicode support are really capable of).

No. Most systems that claim Unicode support on 16 bits use
UTF-16. Granted, it's a multi-element encoding, but if you're
doing anything serious, effectively, so is UTF-32. (In
practice, I find that UTF-8 works fine for a lot of things.)
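A sketch (my own, not from the post) of the "multi-element" case: encoding one code point above U+FFFF as a UTF-16 surrogate pair:

#include <cassert>

// Encode a single code point as UTF-16; returns the number of 16-bit units.
int to_utf16(unsigned long cp, unsigned short out[2])
{
    if (cp <= 0xFFFF) {                                  // BMP: one unit
        out[0] = static_cast<unsigned short>(cp);
        return 1;
    }
    cp -= 0x10000;                                       // supplementary planes
    out[0] = static_cast<unsigned short>(0xD800 + (cp >> 10));    // high surrogate
    out[1] = static_cast<unsigned short>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}

int main()
{
    unsigned short buf[2];
    int n = to_utf16(0x10400UL, buf);    // U+10400, outside the BMP
    assert(n == 2 && buf[0] == 0xD801 && buf[1] == 0xDC00);
    return 0;
}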

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Hendrik Schober

10/2/2008 10:22:00 AM


James Kanze wrote:
> On Oct 1, 11:59 am, Ioannis Vranos <ivra...@no.spam.nospamfreemail.gr>
> wrote:
>> Hi, I am currently learning QT, a portable C++ framework which
>> comes with both a commercial and GPL license, and which
>> provides conversion operations to its various types to/from
>> standard C++ types.
>
>> For example its QString type provides a toWString() that
>> returns a std::wstring with its Unicode contents.
>
> In what encoding format? And what if the "usual" encoding for
> wstring isn't Unicode (the case on many Unix platforms).

<curious>
What are those implementations using for 'wchar_t'?
</curious>

Schobi

Ioannis Vranos

10/2/2008 10:26:00 AM


Erik Wikström wrote:
>
> Because it has been to narrow for 5 to 10 years and the compiler vendor
> does not want to take any chances with backward compatibility,


How will it break backward compatibility if the size of wchar_t changes?



> and since
> we will get Unicode types it is a good idea to use wchar_t for encodings
> not the same size as the Unicode types.


I am talking about not needing those Unicode types since we have wchar_t
and locales.

Ioannis Vranos

10/2/2008 10:34:00 AM


Pete Becker wrote:
> On 2008-10-01 12:57:27 -0400, Ioannis Vranos
> <ivranos@no.spam.nospamfreemail.gr> said:
>
>>
>> If that system supports Unicode as a system-specific type, why can't
>> wchar_t be made wide enough as that system-specific Unicode type, in
>> that system?
>
> It can be. But the language definition doesn't require it to be, and
> with many implementations it's not


C++03 mentions:


"Type wchar_t is a distinct type whose values can represent distinct
codes for all members of the *largest* extended character set specified
among the supported *locales* (22.1.1). Type wchar_t shall have the same
size, signedness, and alignment requirements (3.9) as one of the other
integral types, called its underlying type".
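A quick way to check (my own sketch, assuming WCHAR_MAX is usable in #if, as it is on common implementations) whether a given implementation's wchar_t is wide enough for every Unicode code point:

#include <cwchar>

// The quoted wording ties wchar_t to the implementation's supported locales,
// not to Unicode, so this check can legitimately fail on some platforms.
#if WCHAR_MAX < 0x10FFFF
#error "wchar_t on this implementation cannot hold every Unicode code point"
#endif

int main() { return 0; }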