vdr-1.3.27 and UTF-8

taferner at kde.org (Stefan Taferner) · Thu Jul 21 07:16:30 2005

On Wednesday 20 July 2005 17:52, Sergei Haller wrote:
> On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
> > Klaus Schmidinger wrote:
> > > [...]
> > > To me, a character is an entity that's always the same size (preferably
> > > one byte). UTF-8 breaks with this, so if you have a string that has,
> > > e.g. a strlen() of 10, you can't be sure that this will be really 10
> > > printing characters because there might be some "escaped" characters.
>
> I think the confusion comes from the assumption that a character is
> exactly one byte long.
>
> strlen counts bytes not characters.
>
> in utf-8 a character can be up to 4 (or was it 8) bytes long.

Correct, and the limit is 4 bytes. 7-bit ASCII characters take one byte
each; everything else is encoded as a multi-byte sequence, e.g. German
umlauts take 2 bytes each.
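
To make the byte/character difference concrete, here is a minimal
sketch of a character counter. The name utf8_strlen is made up for this
example (it is not a libc function), and the code assumes the input is
already valid UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper, not part of libc: count the characters (code
 * points) in a UTF-8 string. Every byte of the form 10xxxxxx is a
 * continuation byte and belongs to the character started before it,
 * so only the other bytes are counted.
 * Assumption: s is valid UTF-8; nothing is validated here. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the single umlaut "\xC3\xA4" (a-umlaut), strlen() reports 2 while
this counter reports 1.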

> IIRC, there are new functions to count characters (wstrlen, wstrcmp,
> etc.)

Wrong. Those functions (the real names are wcslen(), wcscmp(), etc.) are
for wide characters (wchar_t), where every character uses 2 or 4 bytes,
depending on the platform.

In fact, IF you want to support Unicode in an application, you are
better off using wide characters (wchar_t) internally and UTF-8 on all
external interfaces (e.g. file input/output).
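
As a rough sketch of that boundary, the following decodes UTF-8 input
into a wchar_t buffer by hand. The name utf8_to_wide is invented for
this example. It assumes a wchar_t of at least 32 bits (as on Linux);
with a 16-bit wchar_t, characters above U+FFFF would not fit. It also
does not reject overlong encodings, so treat it as an illustration
only:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode the UTF-8 string src into dst, which has
 * room for dst_len wide characters including the terminator.
 * Returns the number of wide characters written (without the
 * terminator), or (size_t)-1 on malformed input or a too-small buffer.
 * Assumption: wchar_t can hold any code point (32 bits or more). */
size_t utf8_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    if (dst_len == 0)
        return (size_t)-1;

    while (*p != '\0') {
        unsigned long cp;
        int extra;   /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (n + 1 >= dst_len)
            return (size_t)-1;        /* no room for char + terminator */
        dst[n++] = (wchar_t)cp;
    }
    dst[n] = L'\0';
    return n;
}
```

After the conversion every character occupies exactly one array
element, so wcslen() on the result gives the character count.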

Using UTF-8 inside an application gets tricky, as you cannot
use strlen to count the characters, for example.
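
Another place where it bites is truncation: cutting a string at a fixed
byte offset can split a character in half. A small sketch of a safe cut
point follows; utf8_safe_cut is a made-up name, and the input is
assumed to be valid UTF-8:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the largest byte length <= max_bytes at
 * which s can be cut without splitting a multi-byte sequence.
 * Assumption: s is valid UTF-8. */
size_t utf8_safe_cut(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut;

    if (len <= max_bytes)
        return len;

    cut = max_bytes;
    /* If the first byte that would be dropped is a continuation byte
     * (10xxxxxx), the cut is inside a character: back up to its start. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

With "Gr\xC3\xBC\xC3\x9F" "e" (7 bytes, 5 characters) and a limit of 3
bytes, the result is 2, so only "Gr" is kept instead of "Gr" plus half
an umlaut.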

Kind regards,
Stefan