Re: vfat: Broken case-insensitive support for UTF-8

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Sun, 19 Jan 2020 23:08:09 +0000

On Sun, Jan 19, 2020 at 11:14:55PM +0100, Pali Rohár wrote:

> So when UTF-8 on VFS for VFAT is enabled, then for VFS <--> VFAT
> conversion are used utf16s_to_utf8s() and utf8s_to_utf16s() functions.
> But in fat_name_match(), vfat_hashi() and vfat_cmpi() functions is used
> NLS table (default iso8859-1) with nls_strnicmp() and nls_tolower().
> 
> Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> broken for vfat in UTF-8 mode.
> 
> I was thinking how to fix it, and the only possible way is to write a
> uni_tolower() function which takes one Unicode code point and returns
> lowercase of input's Unicode code point. We cannot do any Unicode
> normalization as VFAT specification does not say anything about it and
> MS reference fastfat.sys implementation does not do it neither.

Then how can that possibly be broken?  If it matches the native behaviour,
that's it.

> As you can see lowercase 'd' and uppercase 'D' are same, but lowercase
> 'č' and uppercase 'Č' are not same. This is because 'č' is two bytes
> 0xc4 0x8d sequence and comparing is done by Latin1 table. 0xc4 is in
> Latin 'Ä' which is already in uppercase. 0x8d is control char so is not
> changed by tolower/toupper function.

Again, who the hell cares?  Does the behaviour match how Windows handles
that thing?  "Case" is not something well-defined; the only definition
is "whatever weird crap does the native implementation choose to do".
That's the only reason to support that garbage at all...