There is a solution that will please everyone. Your stance against it is that
it breaks the ABI, but haven't we learnt anything from the 2/4-byte int
debacle of several decades ago? Why would you want to go through all of that
again?
You ask why only GCC should change. MSVC++ already uses a 2-byte wchar_t,
Borland C++ Builder has a policy of conforming to MSVC++ and most likely
already uses a 2-byte wchar_t, and Sun Studio will most likely bend to market
reality - which leaves GCC.
My preferred Solution: -
Standardise: - sizeof(char) = 1; sizeof(wchar_t) = 2; and sizeof(long
wchar_t) = 4.
Implement all the string functions: - strcmp(); mbscmp(); wcscmp(); and
lcscmp().
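As a sketch of what the missing fourth function might look like: the proposed long wchar_t does not exist in GCC today, so a 32-bit typedef (lcschar, a hypothetical name) stands in for it here, and lcscmp() simply mirrors strcmp()/wcscmp() over 4-byte units.

```cpp
#include <cstdint>

// Hypothetical stand-in for the proposed "long wchar_t" (4-byte unit).
typedef std::uint32_t lcschar;

// Sketch of the proposed lcscmp(): lexicographic compare of two
// zero-terminated strings of 4-byte units, same contract as strcmp().
int lcscmp(const lcschar *a, const lcschar *b) {
    // Advance past equal, non-terminating units.
    while (*a && *a == *b) { ++a; ++b; }
    if (*a < *b) return -1;
    if (*a > *b) return 1;
    return 0;
}
```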
In ASCII C++ source files: -
"String" returns type char
L"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-8 C++ source files: -
"String" returns type unsigned char
L"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-16 C++ source files: -
A"String" returns type unsigned char
"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-32 C++ source files: -
A"String" returns type unsigned char
L"String" returns type wchar_t
"String" returns type long wchar_t
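For illustration only - GCC has no A or LL prefixes, and on GCC/Linux wchar_t is 4 bytes rather than the proposed 2 - the three unit widths the tables above ask for can be approximated with types and literals that were later standardised in C++11 (char32_t and the U"" prefix playing the role of the proposed long wchar_t and LL""):

```cpp
// Approximation of the proposed literal/type mapping using existing C++11
// features; this is not the proposal itself, just the same three widths.
const char     *narrow = "String";   // 1-byte units
const wchar_t  *wide   = L"String";  // wchar_t (2 bytes proposed; 4 in GCC)
const char32_t *longw  = U"String";  // 4-byte units, akin to LL"String"
```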
In this solution there is something for everyone. The Chinese can write their
source code in visible Mandarin in UTF-16 or UTF-32 rather than in escaped
hexadecimal ASCII. The Europeans can save a few bytes by writing in UTF-8. We
can all process files in any of the Unicode text formats from any OS. No one
needs to implement dodgy string conversion routines that allocate memory and
never release it. And we can use constant strings in function parameters -
such as strcmp(string,"answer") - rather than allocating and initialising
vectors every time.
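To make the last point concrete, here is a minimal example (is_answer is a hypothetical helper) of passing a constant string literal straight to strcmp(): the literal lives in static storage, so no per-call allocation or vector initialisation occurs.

```cpp
#include <cstring>

// Compare an input string against a constant literal; the literal "answer"
// is baked into static storage, so this call allocates nothing.
bool is_answer(const char *s) {
    return std::strcmp(s, "answer") == 0;
}
```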
Why not support all three Unicode formats? If it breaks the ABI, then the ABI
needs to be broken. We are all responsible for our own actions, and if we let
someone else make bad decisions for us, we are just as liable as if we had
made those decisions ourselves.
Dallas.
http://www.ekkySoftware.com/