Re: [PATCH] libfs: Attempt exact-match comparison first during casefold lookup

On Wed, Jan 17, 2024 at 04:40:17PM -0800, Linus Torvalds wrote:
> Note that the whole "malformed utf-8 is an error" is actually wrong anyway.
> 
> Yes, if you *output* utf-8, and your output is malformed, then that's
> an error that needs fixing.
> 
> But honestly, "malformed utf-8" on input is almost always just "oh, it
> wasn't utf-8 to begin with, and somebody is still using Latin-1 or
> Shift-JIS or whatever".
> 
> And then treating that as some kind of hard error is actually really
> really wrong and annoying, and may end up meaning that the user cannot
> *fix* it, because they can't access the data at all.

A file system which supports casefolding can support a "strict" mode
(not the default) in which any attempt to create, hard link, or rename
a file using a name that contains invalid UTF-8 is rejected with an
error, before the name ever reaches the directory.
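
To make the strict/non-strict distinction concrete, here is a rough
userspace sketch in Python.  It is not the kernel code: the names
is_valid_utf8() and check_create() are made up for illustration, the
choice of EINVAL as the error is an assumption, and Python's decoder
only checks that the bytes are well-formed UTF-8, which may not match
exactly what a given file system validates.

```python
import errno


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as well-formed UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def check_create(name: bytes, strict: bool) -> int:
    """Hypothetical name check done before a create, link, or rename.

    Returns 0 if the name is acceptable, or a negative errno otherwise.
    EINVAL is an assumed error code, chosen only for illustration.
    """
    if strict and not is_valid_utf8(name):
        # Strict mode: refuse the name up front, so no such file ever exists.
        return -errno.EINVAL
    # Non-strict (default) mode: accept the bytes as given.
    return 0
```

With this sketch, check_create(b"caf\xe9", strict=True) is refused
because the bytes are Latin-1 rather than UTF-8, while the same name
with strict=False is accepted.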

This is what MacOS does, by the way.  If you try to rsync a file from
a Linux box where the file was created by unpacking a Windows Zip file
created by downloading a directory hierarchy from a Microsoft
Sharepoint, and then you try to scp or rsync it over to MacOS, MacOS
will refuse to allow the file to be created if it contains
invalid UTF-8 characters, and rsync or scp will report an error.  I
just ran into this earlier today...

So we don't need to worry about the user not being able to fix it,
because they won't have been able to create the file in the first
place.  This is not the default, since we know there are a bunch of
users who might be creating files using the unofficial "Klingon"
characters (for example) that are not officially part of Unicode since
Unicode will only allow characters used by human languages, and
Klingon doesn't qualify.  I believe though that Android has elected to
enable casefolding in strict mode, which is fine as far as I'm concerned.

> I find libraries that just error out on "malformed utf-8" to be
> actively harmful.

I admit that when I discovered that MacOS errored out on illegal utf-8
characters it was mildly annoying, but it wasn't that hard to fix it
on the Linux side and then I retried the rsync.  It also turned out
that if I unpacked the zip file on MacOS, the filename was created
without the illegal utf-8 characters, so there may have been something
funky going on with the zip userspace program on Linux.  I haven't
cared enough to try to debug it...
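
For anyone who wants to find such names on the Linux side before
retrying a copy, something along these lines should work.  This is
only a sketch: find_non_utf8_names() is a made-up helper, and it
reports names that are not well-formed UTF-8, which is an assumption
about what the receiving side will object to.

```python
import os


def find_non_utf8_names(root: bytes) -> list:
    """Walk a tree and return paths whose final component is not valid UTF-8.

    The root is passed as bytes so that os.walk() yields raw byte names
    instead of trying to decode them.
    """
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling find_non_utf8_names(b"/path/to/unpacked/tree") returns the
offending paths as bytes, which can then be renamed by hand or with
os.rename().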

                                        - Ted