On Monday 06 January 2020 14:46:33 Gabriel Krisman Bertazi wrote:
> Pali Rohár <pali.rohar@xxxxxxxxx> writes:
>
> > What do you think what should kernel's exfat driver do in this case?
> >
> > To prevent such thing we need to use some kind of Unicode normalization
> > form here.
> >
> > CCing Gabriel as he was implementing some Unicode normalization for ext4
> > driver and maybe should bring some light to new exfat driver too.
>
> We have an in-kernel implementation of the canonical decomposition
> normalization (NFD) in fs/unicode, which is what we use for f2fs and
> ext4. It is heated argument what is the best form for filesystem usage,
> and from what I researched, every proprietary filesystem does a
> different (and crazy in their unique way) thing.
>
> For exfat, even though the specification is quite liberal, I think the
> reasonable answer is to follow closely whatever behavior the Windows
> implementation has, whether it does normalization at all or not. Even if
> it is just an in-memory format used internally for lookups, assuming a
> different format or treating differently invalid file names can result
> in awkward results in a filesystem created on another operating system,
> like filename collisions or false misses in lookups.
>

Hi Gabriel! Thank you for your input.

AFAIK the Windows exfat implementation does not do any Unicode
normalization. It allows storing any sequence of 16-bit numbers,
excluding some "bad chars", as a filename (so this also includes an
unpaired half of a UTF-16 surrogate pair), as long as the upper-cased
filename (according to the upcase table stored in the FS) does not
conflict with another upper-cased filename already stored in the
directory.

So based on your suggestion, I understand that we should not do any
Unicode normalization, not even just for comparing filenames to check
whether one already exists.

-- 
Pali Rohár
pali.rohar@xxxxxxxxx
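
For illustration only, the matching rule described above can be sketched as a
toy model in Python. This is not kernel code and not the Windows
implementation. The names TOY_UPCASE, to_units, upcase_key and names_conflict
are hypothetical, and the tiny upcase table is an assumption standing in for
the table stored on a real volume. The point is that names are compared as
raw 16-bit code units after per-unit upcasing, so precomposed and decomposed
spellings of the same visible name do not collide, and an unpaired surrogate
is just another code unit.

```python
# Toy model: names are raw sequences of 16-bit code units, compared after
# per-unit upcasing, with no Unicode normalization applied anywhere.
# TOY_UPCASE is a hypothetical stand-in (ASCII a-z plus U+00E9 -> U+00C9),
# not the upcase table of a real exFAT volume.
TOY_UPCASE = {cu: cu - 0x20 for cu in range(ord("a"), ord("z") + 1)}
TOY_UPCASE[0x00E9] = 0x00C9  # e-acute -> E-acute


def to_units(name):
    """Encode a str as UTF-16 code units, letting lone surrogates through."""
    raw = name.encode("utf-16-le", errors="surrogatepass")
    return tuple(
        int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)
    )


def upcase_key(units):
    """Map each code unit through the toy upcase table, unit by unit."""
    return tuple(TOY_UPCASE.get(cu, cu) for cu in units)


def names_conflict(a, b):
    """Two names conflict only if their upcased code units are identical."""
    return upcase_key(a) == upcase_key(b)


precomposed = to_units("caf\u00e9")    # e-acute as a single code unit
decomposed = to_units("cafe\u0301")    # 'e' followed by combining acute
lone_surrogate = to_units("a\ud800")   # unpaired high surrogate

print(names_conflict(to_units("README"), to_units("readme")))
print(names_conflict(precomposed, decomposed))
print([hex(cu) for cu in lone_surrogate])
```

Under this model a driver that normalized names to NFD before comparing would
report a conflict between the two "café" spellings that the model above treats
as distinct names, which is the kind of mismatch Gabriel warns about.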