Re: [PATCH] libfs: Attempt exact-match comparison first during casefold lookup


 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> On Wed, 17 Jan 2024 at 18:06, Theodore Ts'o <tytso@xxxxxxx> wrote:
>> So we don't need to worry about the user not being able to fix it,
>> because they won't have been able to create the file in the first
>> place.
>
> Yeah, that's a fine argument, until you have a bug or subtle bit flip
> data corruption, and now instead of having something you can recover,
> the system actively says "Nope".

I know this is not your point, but I should add that, in case of a
bug or bit flip, we support "fixing" the "bad utf8" string through fsck.

>> I admit that when I discovered that MacOS errored out on illegal utf-8
>> characters it was mildly annoying,
>
> We may have to be able to interoperate with shit, but let's call it what it is.
>
> Nobody pretends FAT is a great filesystem that made great design
> decisions. That doesn't mean that we can't interoperate with it just
> fine.
>
> But we don't need to take those idiotic and bad design decisions to
> heart, and we don't need to hide the fact that they are horrendous
> design mistakes.

There is a correctness issue with accepting the creation of invalid
UTF-8 names that justifies the existence of strict mode.  Code points
that are currently unassigned can become a casefold match for some other
file in a later Unicode version. When you update your Unicode version,
or even copy the file to a volume with a different version, the lookup
might yield a different file, making one of them inaccessible or
overwriting the wrong file.

Obviously, not all corruptions would yield a "valid" unassigned
code point.  But those are possible.
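The versioning hazard can be made concrete with a toy model in C. This is not the kernel's fs/unicode tables. Names are arrays of code points, and each "Unicode version" is just a casefold function. U+0378 is treated as unassigned in version 1. fold_v2, in which a later version hypothetically assigns U+0378 and folds it to 'a', is invented for illustration. Real casefolding can also change a name's length, which this 1:1 model ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: each "Unicode version" is a casefold function
 * mapping one code point to one code point. */
typedef uint32_t (*fold_fn)(uint32_t cp);

/* Version 1: only ASCII letters fold. U+0378 is unassigned here,
 * so like any other unknown code point it folds to itself. */
static uint32_t fold_v1(uint32_t cp)
{
	if (cp >= 'A' && cp <= 'Z')
		return cp + ('a' - 'A');
	return cp;
}

/* Version 2 (hypothetical): a later version assigns U+0378 and
 * defines it to casefold to 'a'. */
static uint32_t fold_v2(uint32_t cp)
{
	if (cp == 0x0378)
		return 'a';
	return fold_v1(cp);
}

/* Casefolded comparison of two names under a given version. */
static bool casefold_eq(const uint32_t *a, size_t alen,
			const uint32_t *b, size_t blen, fold_fn fold)
{
	if (alen != blen)
		return false;
	for (size_t i = 0; i < alen; i++)
		if (fold(a[i]) != fold(b[i]))
			return false;
	return true;
}
```

Under fold_v1, a name beginning with U+0378 and the name "abc" are distinct files. Under fold_v2 they casefold to the same name, so a lookup could land on either one.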

We currently don't care much, since mkfs will create the volume with a
fixed, never-changed Unicode version. That is, unless the user goes out
of their way to shoot themselves in the foot.

Strict mode is an easy way to prevent this class of issues (aside from
corruptions).

> So "strict" mode should mean that you can't *create* a misformed UTF-8
> filename.
>
> It's that same "be conservative in what you do".
>
> But *dammit*, if "strict" mode means that you can't even read other
> peoples mistakes because your "->lookup()" function refuses to even
> look at it, then "strict" mode is GARBAGE.
>
> That's the "be liberal in what you accept" part. Do it, or be damned.
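The policy Linus describes (strict on create, liberal on lookup) could look roughly like the sketch below. utf8_well_formed() and check_new_name() are hypothetical helpers, not the kernel's utf8_validate(). The check covers only UTF-8 well-formedness as defined in RFC 3629. It does not check whether a code point is assigned in a given Unicode version. Returning -EINVAL is also an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s[0..len) is well-formed UTF-8 (RFC 3629):
 * no stray continuation bytes, no truncated sequences, no overlong
 * forms, no UTF-16 surrogates, nothing above U+10FFFF. */
static bool utf8_well_formed(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned int cp;
		size_t n; /* number of continuation bytes */

		if (c < 0x80) {
			i++;
			continue;
		}
		if (c >= 0xC2 && c <= 0xDF) {
			n = 1; cp = c & 0x1F;
		} else if (c >= 0xE0 && c <= 0xEF) {
			n = 2; cp = c & 0x0F;
		} else if (c >= 0xF0 && c <= 0xF4) {
			n = 3; cp = c & 0x07;
		} else {
			return false; /* continuation byte, C0/C1 or F5..FF */
		}
		if (len - i <= n)
			return false; /* truncated sequence */
		for (size_t k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
			return false; /* overlong encoding */
		if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return false; /* surrogate or out of range */
		i += n + 1;
	}
	return true;
}

/* Hypothetical create-time hook: in strict mode, refuse to create a
 * name that is not well-formed UTF-8. Only the create path calls it. */
static int check_new_name(const unsigned char *name, size_t len, bool strict)
{
	if (strict && !utf8_well_formed(name, len))
		return -EINVAL;
	return 0;
}
```

The design point is that validation sits only on the create path. A lookup that follows this policy would accept whatever bytes are already on disk.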

Yes, we could be more liberal in the lookup while restricting the
creation of invalid UTF-8 sequences.  But the only case where it would
matter is a corrupted volume, where a file name suddenly changed to
something invalid.  On ext4 and f2fs, the on-disk dirent hash (which is
hash(casefolded(filename))) would not have been corrupted in a matching
way, so an exact-match lookup of the invalid name might still fail.

This would create an even more inconsistent semantics, where small,
non-hashed directories can find these files, but larger, hashed
directories might not.  And that is even more confusing to users,
since it exposes internal filesystem details.
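The hashed-versus-linear inconsistency can be illustrated with another toy model. Each entry keeps the hash computed when it was created, and a corrupted name keeps that old hash. struct toy_dirent and the lookup functions are hypothetical, as are the ASCII-only fold and FNV-1a. All of them stand in for ext4's htree and f2fs's hashing. Filtering on hash equality is a simplification of how a hashed directory narrows the search to one bucket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy directory entry: the name plus the hash stored at creation. */
struct toy_dirent {
	char name[32];
	uint32_t hash;
};

/* Stand-in for hash(casefolded(filename)): ASCII tolower + FNV-1a. */
static uint32_t toy_fold_hash(const char *name)
{
	uint32_t h = 2166136261u; /* FNV-1a offset basis */

	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		unsigned char c = *p;

		if (c >= 'A' && c <= 'Z')
			c = (unsigned char)(c + ('a' - 'A'));
		h ^= c;
		h *= 16777619u; /* FNV-1a prime */
	}
	return h;
}

/* Small, linear directory: scan every entry and compare names. */
static const struct toy_dirent *
lookup_linear(const struct toy_dirent *d, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(d[i].name, name) == 0)
			return &d[i];
	return NULL;
}

/* Large, hashed directory: an entry is only considered if its stored
 * hash matches the hash of the name being looked up. */
static const struct toy_dirent *
lookup_hashed(const struct toy_dirent *d, size_t n, const char *name)
{
	uint32_t h = toy_fold_hash(name);

	for (size_t i = 0; i < n; i++)
		if (d[i].hash == h && strcmp(d[i].name, name) == 0)
			return &d[i];
	return NULL;
}
```

Suppose a single bit flips in a stored name, for example 'r' (0x72) becoming 0xf2, which leaves the name invalid UTF-8. lookup_linear() still finds the entry by its new name. lookup_hashed() misses it, because the hash of the corrupted name no longer matches the hash stored on disk.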

I get the point about how annoying the current semantics is.  But I
still think this is the sanest approach to a fundamentally insane
feature.

-- 
Gabriel Krisman Bertazi



