There is a solution that will please everyone. Your stance against it is that
it breaks the ABI, but haven't we learnt anything from the 2/4-byte int
debacle of several decades ago? Why would you want to go through all of that
again?
You ask why only GCC should change. MSVC++ already uses a 2-byte wchar_t,
Borland C++ Builder has a policy of conforming to MSVC++ and most likely
already uses a 2-byte wchar_t, and Sun Studio will most likely bend to market
reality - which leaves GCC.
My preferred Solution: -
Standardise: - sizeof(char) = 1; sizeof(wchar_t) = 2; and sizeof(long
wchar_t) = 4.
Implement all the string functions: - strcmp(); mbscmp(); wcscmp(); and
lcscmp().
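As a sketch of what the missing fourth function might look like: the proposed long wchar_t does not exist in GCC today, so a 32-bit typedef (lcschar, a hypothetical name) stands in for it here, and lcscmp() simply mirrors strcmp()/wcscmp() over 4-byte units.

```cpp
#include <cstdint>

// Hypothetical stand-in for the proposed "long wchar_t" (4-byte unit).
typedef std::uint32_t lcschar;

// Sketch of the proposed lcscmp(): lexicographic compare of two
// zero-terminated strings of 4-byte units, same contract as strcmp().
int lcscmp(const lcschar *a, const lcschar *b) {
    // Advance past equal, non-terminating units.
    while (*a && *a == *b) { ++a; ++b; }
    if (*a < *b) return -1;
    if (*a > *b) return 1;
    return 0;
}
```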
In ASCII C++ source files: -
"String" returns type char
L"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-8 C++ source files: -
"String" returns type unsigned char
L"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-16 C++ source files: -
A"String" returns type unsigned char
"String" returns type wchar_t
LL"String" returns type long wchar_t
In UTF-32 C++ source files: -
A"String" returns type unsigned char
L"String" returns type wchar_t
"String" returns type long wchar_t
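For illustration only - GCC has no A or LL prefixes, and on GCC/Linux wchar_t is 4 bytes rather than the proposed 2 - the three unit widths the tables above ask for can be approximated with types and literals that were later standardised in C++11 (char32_t and the U"" prefix playing the role of the proposed long wchar_t and LL""):

```cpp
// Approximation of the proposed literal/type mapping using existing C++11
// features; this is not the proposal itself, just the same three widths.
const char     *narrow = "String";   // 1-byte units
const wchar_t  *wide   = L"String";  // wchar_t (2 bytes proposed; 4 in GCC)
const char32_t *longw  = U"String";  // 4-byte units, akin to LL"String"
```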
In this solution there is something for everyone. The Chinese can write their
source code in visible Mandarin in UTF-16 or UTF-32 rather than in escaped
hexadecimal ASCII. The Europeans can save a few bytes by writing in UTF-8. We
can all process files in any of the Unicode text formats from any OS. No one
needs to implement dodgy string conversion routines that allocate memory and
never release it. And we can use constant strings in function parameters -
such as strcmp(string,"answer") - rather than allocating and initialising
vectors every time.
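To make the last point concrete, here is a minimal example (is_answer is a hypothetical helper) of passing a constant string literal straight to strcmp(): the literal lives in static storage, so no per-call allocation or vector initialisation occurs.

```cpp
#include <cstring>

// Compare an input string against a constant literal; the literal "answer"
// is baked into static storage, so this call allocates nothing.
bool is_answer(const char *s) {
    return std::strcmp(s, "answer") == 0;
}
```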
Why not support all three Unicode formats? If it breaks the ABI, then the ABI
needs to be broken. We are all responsible for our own actions, and if we let
someone else make bad decisions for us, we are just as liable as if we had
made those decisions ourselves.
Dallas.
http://www.ekkySoftware.com/