Hello Corey and Scott
At the risk of sounding repetitive, the problems are:-
1) using wchar_t for 4-byte strings stuffs up function overloading - for
example, problems with shared libraries written with different 2-byte string
types (i.e., short, unsigned short, struct UTF16{ typedef uint16_t Type; Type
mEncodingUnit;};, etc.)
2) casting errors from declaring strings as unsigned short string[] =
{'H','e','l','l','o',' ','W','o','r','l','d',0} or unsigned short *string =
(unsigned short*)"H\0e\0l\0l\0o\0 \0W\0o\0r\0l\0d\0\0";
3) pointer arithmetic bugs and other portability issues consuming time,
money and resources.
4) no standard library for string functions, creating different behaviours
from different implementations. And the standard C-Library people will not
implement the string routines until there is a standard type for 16-bit
strings offered by the compiler.
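
To make problem 2 concrete, here is a minimal sketch of why the byte-string
cast is fragile. The helper names (unit_from_cast, unit_little_endian,
host_is_little_endian) are made up for this example. It uses memcpy where the
real code would use a pointer cast, so that the sketch itself has no alignment
or aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the 16-bit value the host sees at position `index`
// when the byte string is reinterpreted as an array of 16-bit units
// (the same value the (unsigned short*) cast would yield).
inline std::uint16_t unit_from_cast(const char* bytes, std::size_t index) {
    std::uint16_t unit;
    std::memcpy(&unit, bytes + 2 * index, sizeof unit);
    return unit;
}

// Hypothetical helper: decodes the same two bytes explicitly as
// little-endian, so the result does not depend on the host byte order.
inline std::uint16_t unit_little_endian(const char* bytes, std::size_t index) {
    const unsigned char lo = static_cast<unsigned char>(bytes[2 * index]);
    const unsigned char hi = static_cast<unsigned char>(bytes[2 * index + 1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}

// True when the host stores the low byte of a 16-bit value first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// The cast trick only yields 'H' for the first unit on a little-endian host.
inline void check_cast_trick() {
    const char bytes[] = "H\0e\0l\0l\0o\0\0";
    assert((unit_from_cast(bytes, 0) == 'H') == host_is_little_endian());
}
```

On a big-endian host the first unit comes out as 0x4800 rather than 'H', so
the same source line produces a different string depending on the platform.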
The full set of MS Common Controls no longer supports -D _MBCS, which
means I must compile with -D UNICODE and -D _UNICODE, and that makes all the
standard WINAPI calls use 16-bit Unicode strings as well. Rather than
constantly converting between UTF-8 and 16-bit Unicode, I am moving totally to
16-bit Unicode. Why is MS doing this? Probably because they know you're not
supporting 16-bit Unicode and that will force people like me to drop plans
to port to Linux/Solaris because it is just too hard.
Once again, there are no legacy issues because no one is currently using
16-bit Unicode in GCC; it does not exist. Adding such support will not break
anything. I am not arguing to stop support for 32-bit Unicode. Secondly,
object code does not use the label "wchar_t", meaning the change would only
force people to do a global search and replace of "wchar_t" with
"long wchar_t" before their next compile. Quite a simple change compared to
what I must do to support 16-bit strings in GCC.
It would be nice to substitute "long wchar_t" for "wchar_t" as it would not
only be consistent with MS VC++, but also with the definitions of double and
long double, long and long long, and integer literals such as 123456789012LL.
Using S"String" or U"String" would be too confusing with signed and unsigned.
The confusion between the Unicode Transformation Formats (UTF) 8, 16 and 32
is not only mine; as pointed out earlier, they are constantly changing. The
16-bit string is a format I am forced to deal with and there is no support
from GCC at all. I can't tell you whether MS Unicode is the older-style fixed
16-bit or the newer multibyte type similar to the UTF-8 definition.
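
For reference, the difference between the two possibilities can be sketched
as below. This assumes the "newer multibyte type" means UTF-16 as defined by
the Unicode standard. The function name encode_utf16 is made up for the
example, and it assumes the input is a valid code point (no range or
surrogate checks). It says nothing about which of the two MS actually uses.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Assumes code_point <= 0x10FFFF and is not itself a surrogate value.
inline std::vector<std::uint16_t> encode_utf16(std::uint32_t code_point) {
    std::vector<std::uint16_t> units;
    if (code_point < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, identical to what a
        // fixed 16-bit encoding would store.
        units.push_back(static_cast<std::uint16_t>(code_point));
    } else {
        // Above U+FFFF: split the 20-bit offset into a surrogate pair.
        const std::uint32_t offset = code_point - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

With this scheme U+0048 ('H') is the single unit 0x0048, while U+1D11E takes
the two units 0xD834 0xDD1E. A fixed 16-bit format cannot represent the
second case at all, which is why the distinction matters for pointer
arithmetic and string length.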
And in case you don't already know, MS VC++ compiles source code written in
16-bit Unicode, allowing function names, variables and strings to be written
in 16-bit Unicode. This means that more and more files and data are going to
be 16-bit Unicode. Developers like myself are going to have to deal with the
format, whether we like it or not.
And to answer Scott's question, why must we follow the thousand-pound
gorilla that is Microsoft? For the same reason that rain falls down: because
that is just how the world is. I have spent the last three weeks migrating
all our products to 16-bit Unicode, I probably still have another two weeks
to go, then 2-4 weeks of testing after that. Do I like it? No, I just have
to keep our products competitive in a market environment. And how does
Microsoft deal with issues of "mountains of legacy code"? As with this
move to Unicode, they just stop fully supporting UTF-8 and if you don't move
you become a dinosaur. (They call it "deprecation"; I think they meant
depreciation, because they lower it, not strangle it.)
So I have to ask - what are your arguments for not providing support for all
three, 8-bit, 16-bit and 32-bit Unicode strings?
Regards,
Dallas.
http://www.ekkySoftware.com
P.S. I suggest that strings default to the same type as the underlying
file format; otherwise it can be overridden by explicitly stating:-
A"String", L"String", LL"String".