From: Al Viro
> Sent: 20 January 2020 16:12
>
> > From: Pali Rohár
> > > Sent: 20 January 2020 15:20
> > ...
> > > This is not possible. There is 1:1 mapping between UTF-8 sequence and
> > > Unicode code point. wchar_t in kernel represent either one Unicode code
> > > point (limited up to U+FFFF in NLS framework functions) or 2bytes in
> > > UTF-16 sequence (only in utf8s_to_utf16s() and utf16s_to_utf8s()
> > > functions).
> >
> > Unfortunately there is neither a 1:1 mapping of all possible byte sequences
> > to wchar_t (or unicode code points), nor a 1:1 mapping of all possible
> > wchar_t values to UTF-8.
> > Really both need to be defined - even for otherwise 'invalid' sequences.
>
> Who. Cares?
>
> Filename is a sequence of octets, not codepoints. Its interpretation is
> entirely up to the userland.

For filesystems that really ought to be true.
It saves a lot of problems in the kernel.

I guess the FAT driver has to do something to convert the UTF-16
on-disk filenames to/from a sequence of octets.

Even Microsoft have made it much easier to have case-dependent NTFS
filesystems in Windows 10.
(Ever watched the number of different cases in the list of
c:/windows/system32/drivers/*.sys filenames output when Windows boots?
They are nearly all different!)
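The quoted point that neither direction of the byte/wchar_t mapping is total can be made concrete with a small sketch. The helpers `utf8_decode()` and `utf8_encode()` below are hypothetical strict converters written for illustration only; they are not the kernel's NLS functions and make no claim about how those behave. They show a byte (0xFF) and an overlong form (0xC0 0x80) that decode to no code point, and a lone surrogate (0xD800, which a 16-bit wchar_t holding one UTF-16 unit can contain) that has no valid UTF-8 encoding.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical strict UTF-8 decoder (illustration only).
 * Returns the number of bytes consumed and stores the code point in *cp,
 * or returns -1 if s does not start with a well-formed UTF-8 sequence. */
static int utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point legitimately encoded with n bytes */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    int n, i;

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {                      /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        c = s[0] & 0x1F; n = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        c = s[0] & 0x0F; n = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        c = s[0] & 0x07; n = 4;
    } else {
        return -1;                          /* stray continuation byte or 0xF8..0xFF */
    }
    if ((size_t)n > len)
        return -1;                          /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                      /* missing continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, encoded surrogates and values above U+10FFFF */
    if (c < min_cp[n] || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        return -1;
    *cp = c;
    return n;
}

/* Hypothetical strict UTF-8 encoder (illustration only).
 * Writes up to 4 bytes to out and returns the count, or returns -1 if cp
 * has no valid UTF-8 encoding (a lone surrogate or a value > U+10FFFF). */
static int utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return -1;
}
```

A driver that converts on-disk UTF-16 names (as FAT does) has to pick some policy for every `-1` case above. A filesystem that treats names as plain octets never reaches those cases.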