Re: UTF-8, UTF-16 and UTF-32

Hello Scott,

I guess that ASCII would be char, UTF-8 would be unsigned char, UTF-16 would be wchar_t and UTF-32 would be long wchar_t. But it is more appropriate just to have the three sizes of strings, i.e. 8-bit, 16-bit and 32-bit, and the ability to have const 16-bit strings.

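#include <cstddef>  // NULL

// Mutable overload: returns a pointer to the first wide character equal to
// chr, or NULL if the terminator is reached first.
wchar_t* strchr(wchar_t *string, wchar_t chr){
   while(*string != L'\0' && *string != chr) ++string;
   if(*string == chr) return string;
   return NULL;
}

// Const overload: same search, but the caller cannot write through the result.
const wchar_t* strchr(const wchar_t *string, wchar_t chr){
   while(*string != L'\0' && *string != chr) ++string;
   if(*string == chr) return string;
   return NULL;
}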
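
The overloads above compare one 16-bit unit at a time, so they can only find characters that fit in a single UTF-16 code unit. A rough sketch of a search that also handles surrogate pairs might take the character as a 32-bit code point instead. Everything here is an assumption for illustration: the name utf16_find is hypothetical, the C++11 types char16_t and char32_t stand in for the 16-bit and 32-bit character types discussed in this thread, and the input is assumed to be well-formed, NUL-terminated UTF-16.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a standard function): find the first occurrence
// of code point cp in a NUL-terminated, well-formed UTF-16 string.
// Returns a pointer to the first code unit of the match, or nullptr.
const char16_t* utf16_find(const char16_t* s, char32_t cp) {
    assert(s != nullptr);

    // Lone surrogates and values above U+10FFFF are not valid code points.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return nullptr;

    if (cp < 0x10000) {
        // Basic Multilingual Plane: the code point is a single code unit.
        const char16_t unit = static_cast<char16_t>(cp);
        for (;; ++s) {
            if (*s == unit) return s;        // also matches the terminator when cp == 0
            if (*s == u'\0') return nullptr;
        }
    }

    // Supplementary plane: encode cp as a high/low surrogate pair.
    const char32_t v = cp - 0x10000;
    const char16_t hi = static_cast<char16_t>(0xD800 + (v >> 10));
    const char16_t lo = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    for (; *s != u'\0'; ++s) {
        // s[1] is safe to read here: s[0] is not the terminator.
        if (s[0] == hi && s[1] == lo) return s;
    }
    return nullptr;
}
```

With this shape, searching u"a\U0001D11Eb" for 0x1D11E returns a pointer to the second code unit, where the surrogate pair starts. A non-const overload would follow the same pattern as the two functions above.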

Cheers,
Dallas.
http://www.ekkySoftware.com/

----- Original Message ----- From: "me22" <me22.ca@xxxxxxxxx>
To: "Dallas Clarke" <DClarke@xxxxxxxxxxxxxx>
Cc: "Eljay Love-Jensen" <eljay@xxxxxxxxx>; "GCC-help" <gcc-help@xxxxxxxxxxx>
Sent: Saturday, August 23, 2008 12:12 PM
Subject: Re: UTF-8, UTF-16 and UTF-32


On Fri, Aug 22, 2008 at 21:37, Dallas Clarke <DClarke@xxxxxxxxxxxxxx> wrote:

> Standardise: - sizeof(char) = 1; sizeof(wchar_t) = 2; and sizeof(long
> wchar_t) = 4.


Do you mean "standardize char as UTF-8, wchar_t as UTF-16, and long
wchar_t as UTF-32"?  Because that's not what you said, even if (on
POSIX, but not necessarily C or C++) the sizes would be appropriate.

> Implement all the string functions: - strcmp(); mbscmp(); wcscmp(); and
> lcscmp().


How exactly do you plan on implementing strchr for UTF-16?
Specifically, what would its signature be?

~ Scott


