Re: UTF-8, UTF-16 and UTF-32

Hello Corey and Scott

At the risk of sounding repetitive, the problems are:

1) Using wchar_t for 4-byte strings stuffs up function overloading. For example, there are problems with shared libraries written with different 2-byte string types (i.e., short, unsigned short, struct UTF16 { typedef uint16_t Type; Type mEncodingUnit; };, etc.).
2) Casting errors from declaring strings as
   unsigned short string[] = {'H','e','l','l','o',' ','W','o','r','l','d',0};
   or
   unsigned short *string = (unsigned short*)"H\0e\0l\0l\0o\0 \0W\0o\0r\0l\0d\0\0";
3) Pointer arithmetic bugs and other portability issues that consume time, money and resources.
4) No standard library of string functions, so different implementations behave differently. And the standard C library people will not implement the string routines until the compiler offers a standard type for 16-bit strings.
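To illustrate the second problem, here is a minimal sketch of why the byte-literal cast is fragile: the same bytes give different 16-bit values depending on the byte order they are read in. The helper names (units_le, units_be, check_byte_literal_trick) are made up for this example, and the short string "Hi" stands in for "Hello World".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read raw bytes as 16-bit units, low byte first (little-endian order).
std::vector<std::uint16_t> units_le(const unsigned char* bytes, std::size_t n) {
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i + 1 < n; i += 2)
        out.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    return out;
}

// Read the same bytes high byte first (big-endian order).
std::vector<std::uint16_t> units_be(const unsigned char* bytes, std::size_t n) {
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i + 1 < n; i += 2)
        out.push_back(static_cast<std::uint16_t>((bytes[i] << 8) | bytes[i + 1]));
    return out;
}

// Hypothetical check: the byte-literal trick only matches the array form
// when the bytes are read in little-endian order.
void check_byte_literal_trick() {
    // The bytes of "H\0i\0\0" plus the implicit terminating NUL.
    const unsigned char raw[] = {'H', 0, 'i', 0, 0, 0};
    // What unsigned short s[] = {'H','i',0} holds on any platform.
    const std::vector<std::uint16_t> expected = {'H', 'i', 0};
    assert(units_le(raw, sizeof raw) == expected);
    assert(units_be(raw, sizeof raw) != expected);
}
```

On a big-endian target the cast would therefore yield 0x4800 rather than 0x0048 for 'H', and the cast may also violate alignment requirements, which the array form avoids.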

The full set of MS Common Controls no longer supports -D _MBCS, which means I must compile with -D UNICODE and -D _UNICODE, and that makes all the standard WINAPI calls use 16-bit Unicode strings as well. Rather than constantly converting between UTF-8 and 16-bit Unicode, I am moving totally to 16-bit Unicode. Why is MS doing this? Probably because they know you're not supporting 16-bit Unicode, and that will force people like me to drop plans to port to Linux/Solaris because it is just too hard.

Once again, there are no legacy issues, because no one is currently using 16-bit Unicode in GCC; it does not exist. Adding such support will not break anything. I am not arguing to stop support for 32-bit Unicode. Secondly, object code does not use the label "wchar_t", so the change would only force people to do a global search and replace of "wchar_t" with "long wchar_t" before their next compile. That is quite a simple change compared to what I must do to support 16-bit strings in GCC.

It would be nice to substitute "long wchar_t" for "wchar_t", as it would be consistent not only with MS VC++, but also with the definitions of double and long double, long and long long, and integer literals such as 123456789012LL. Using S"String" or U"String" would be too easily confused with signed and unsigned.

The confusion between Unicode Transformation Format (UTF) 8, 16 and 32 is not only mine; as pointed out earlier, the definitions are constantly changing. The 16-bit string is a format I am forced to deal with, and there is no support for it from GCC at all. I can't tell you whether MS Unicode is the older style fixed 16-bit or the newer multibyte type similar to the UTF-8 definition.
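For reference, the difference between the two 16-bit styles can be sketched as follows. In the fixed-width style every character is one 16-bit unit; in UTF-16 as the Unicode standard defines it, code points above U+FFFF are split into a surrogate pair. The function names are made up for this example, and the sketch says nothing about which style any particular Microsoft API expects.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    if (cp < 0x10000) {
        // Fits in one unit; identical to the fixed 16-bit style.
        return {static_cast<std::uint16_t>(cp)};
    }
    // Above U+FFFF: split the 20 remaining bits across two units.
    cp -= 0x10000;
    return {static_cast<std::uint16_t>(0xD800 + (cp >> 10)),     // high surrogate
            static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))};  // low surrogate
}

// Hypothetical check: 'A' is one unit, U+1D11E (musical G clef) is two.
void check_encode_utf16() {
    assert(encode_utf16(0x41).size() == 1);
    assert(encode_utf16(0x1D11E).size() == 2);
}
```

Code that assumes one unit per character (for example, indexing or pointer arithmetic by character count) only holds for the first case.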

And in case you don't already know, MS VC++ compiles source code written in 16-bit Unicode, allowing function names, variables and strings to be written in 16-bit Unicode. This means that more and more files and data are going to be 16-bit Unicode. Developers like myself are going to have to deal with the format, whether we like it or not.

And to answer Scott's question: why must we follow the thousand-pound gorilla that is Microsoft? For the same reason that rain falls down, because that is just how the world is. I have spent the last three weeks migrating all our products to 16-bit Unicode, I probably still have another two weeks to go, then 2-4 weeks of testing after that. Do I like it? No, I just have to keep our products competitive in a market environment. And also, how does Microsoft deal with issues of "mountains of legacy code"? Like with this move to Unicode, they just stop fully supporting UTF-8, and if you don't move you become a dinosaur. (They call it "deprecation"; I think they meant depreciation, because they lower it, not strangle it.)

So I have to ask - what are your arguments for not providing support for all three, 8-bit, 16-bit and 32-bit Unicode strings?

Regards,
Dallas.
http://www.ekkySoftware.com


P.S. I suggest that strings default to the same type as the underlying file format; otherwise the type can be overridden by explicitly stating A"String", L"String" or LL"String".


