RE: vfat: Broken case-insensitive support for UTF-8

From: Pali Rohár
> Sent: 20 January 2020 15:20
...
> This is not possible. There is 1:1 mapping between UTF-8 sequence and
> Unicode code point. wchar_t in kernel represent either one Unicode code
> point (limited up to U+FFFF in NLS framework functions) or 2bytes in
> UTF-16 sequence (only in utf8s_to_utf16s() and utf16s_to_utf8s()
> functions).

Unfortunately there is neither a 1:1 mapping of all possible byte sequences
to wchar_t (or Unicode code points), nor a 1:1 mapping of all possible
wchar_t values to UTF-8.
Really, both need to be defined, even for otherwise 'invalid' sequences.

Even unpaired surrogates (the 16-bit values 0xd800 to 0xdfff) can appear
on their own in Windows filesystems (according to Wikipedia).

It is all too easy to get sequences of values that cannot be converted
to/from UTF-8.
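
As a rough userspace illustration of both directions (not kernel code; the
helper names below are made up for this example, and Python's strict codecs
stand in for whatever conversion the filesystem would actually use):

```python
from typing import Sequence


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if `raw` is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf16_units_to_utf8_ok(units: Sequence[int]) -> bool:
    """Return True if a run of 16-bit code units converts cleanly to UTF-8.

    The units are packed as little-endian UTF-16 and decoded strictly,
    so an unpaired surrogate is rejected rather than passed through.
    """
    try:
        packed = b"".join(u.to_bytes(2, "little") for u in units)
        packed.decode("utf-16-le").encode("utf-8")
        return True
    except UnicodeError:
        return False


# Byte sequences that are not valid UTF-8 (so no code point to map to).
print(decodes_as_utf8(b"caf\xc3\xa9"))    # well-formed
print(decodes_as_utf8(b"\xff"))           # 0xff never appears in UTF-8
print(decodes_as_utf8(b"\xc0\xaf"))       # overlong encoding of '/'
print(decodes_as_utf8(b"\xed\xa0\x80"))   # UTF-8-style encoding of 0xd800

# 16-bit values that have no valid UTF-8 form on their own.
print(utf16_units_to_utf8_ok([0x0041]))           # plain 'A'
print(utf16_units_to_utf8_ok([0xD83D, 0xDE00]))   # valid surrogate pair
print(utf16_units_to_utf8_ok([0xD800]))           # unpaired high surrogate
print(utf16_units_to_utf8_ok([0xDC00, 0x0041]))   # unpaired low surrogate
```

Any name made of such values needs some defined behaviour (reject, replace,
or escape), since a strict converter simply fails on it.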

	David




