On Wednesday 20 July 2005 17:52, Sergei Haller wrote:
> On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
> > Klaus Schmidinger wrote:
> > > [...]
> > > To me, a character is an entity that's always the same size (preferably
> > > one byte). UTF-8 breaks with this, so if you have a string that has,
> > > e.g. a strlen() of 10, you can't be sure that this will be really 10
> > > printing
> > > characters because there might be some "escaped" characters.
>
> I think the confusion comes from the assumption that a character is
> exactly one byte long.
>
> strlen counts bytes not characters.
>
> in utf-8 a character can be up to 4 (or was it 8) bytes long.

Correct; the limit is 4 bytes. 7-bit ASCII characters take one byte each,
and everything else is encoded as a multi-byte sequence, e.g. German
umlauts are 2 bytes each.

> IIRC, there are new functions to count characters (wstrlen, wstrcmp,
> etc.)

Wrong. Those functions (their actual names are wcslen, wcscmp, etc.) are
for wide characters, where every character uses a fixed 2 or 4 bytes,
depending on the platform.

In fact, if you want to support Unicode in an application, you are better
off using wide characters (wchar_t) inside the application and UTF-8 on
all external interfaces (e.g. file input/output).

Using UTF-8 inside an application gets tricky, because you cannot use
strlen to count the characters, for example.

Kind regards,
Stefan
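
To illustrate the point that strlen counts bytes rather than characters, here is a minimal sketch in C. The helper name utf8_codepoints is hypothetical (it is not part of the C library or of VDR), and it assumes the input is already valid UTF-8. It counts code points, which can still differ from the number of printing characters when combining marks are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated
 * UTF-8 string. Assumes valid UTF-8 input. Continuation bytes have
 * the bit pattern 10xxxxxx; every other byte starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);

    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Count only bytes that are not continuation bytes. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the German word "für" encoded as UTF-8 (written in C as "f\xc3\xbcr"), strlen reports 4 bytes, while utf8_codepoints reports 3 code points. For plain ASCII strings both give the same result.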