On Sat, Aug 12, 2023 at 07:06:47PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 11, 2023 at 06:59:15PM -0700, Eric Biggers wrote:
> >
> > To be honest I've always been confused about why the ->s_encoding check is
> > there. It looks like Ted added it in 6456ca6520ab ("ext4: fix kernel oops
> > caused by spurious casefold flag") to address a fuzzing report for a filesystem
> > that had a casefolded directory but didn't have the casefold feature flag set.
> > It seems like an unnecessarily complex fix, though. The filesystem should just
> > reject the inode earlier, in __ext4_iget(). And likewise for f2fs. Then no
> > other code has to worry about this problem.
>
> the casefold flag can get set *after* the inode has been fetched, but before
> you try to use it. This can happen because syzbot has opened the block device
> for writing, and edits the superblock while it is mounted.

I don't see how that is relevant here.

I think the actual problem you're hinting at is that checking the casefold
feature after the filesystem has been mounted is not guaranteed to work
properly, as ->s_encoding will be NULL if the casefold feature was not present
at mount time. If we'd like to be robust in the event of the casefold feature
being concurrently enabled by a write to the block device, then all we need to
do is avoid checking the casefold feature after mount time, and instead check
->s_encoding. I believe __ext4_iget() is still the only place it's needed.

> One could say that this is an insane threat model, but the syzbot team
> thinks that this can be used to break out of a kernel lockdown after a
> UEFI secure boot. Which is fine, except I don't think I've been able
> to get any company (including Google) to pay for headcount to fix
> problems like this, and the unremitting stream of these sorts of
> syzbot reports have already caused one major file system developer to
> burn out and step down.
>
> So problems like this get fixed on my own time, and when I have some
> free time. And if we "simplify" the code, it will inevitably cause
> more syzbot reports, which I will then have to ignore, and the syzbot
> team will write more "kernel security disaster" slide deck
> presentations to senior VP's, although I'll note this has never
> resulted in my getting any additional SWE's to help me fix the
> problem...
>
> > So just __ext4_iget() needs to be fixed. I think we should consider doing that
> > before further entrenching all the extra ->s_encoding checks.
>
> If we can get an upstream kernel consensus that syzbot reports caused
> by writing to a mounted file system aren't important, and we can
> publish this somewhere where hopefully the syzbot team will pay
> attention to it, sure...

But, more generally, I think it's clear that concurrent writes to the block
device's page cache is not something that filesystems can be robust against. I
think this needs to be solved by providing an option to forbid this, as Jan
Kara's patchset "block: Add config option to not allow writing to mounted
devices" does, and then transitioning legacy use cases to new APIs.

Yes, "transitioning legacy use cases" will be a lot of work. And if The Linux
Filesystem Maintainers(TM) do not have time for it, that's the way it is.
Someone who cares about it (such as someone who actually cares about the
potential impact on the Lockdown feature) will need to come along and do it.
But I think that should be the plan, and The Linux Filesystem Maintainers(TM)
do not need to try to play whack-a-mole with "fixing" filesystem code to be
consistently revalidating already-validated cached metadata.

- Eric
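
For illustration only, the check proposed above (validate a casefolded inode
against ->s_encoding once, at inode load time, rather than against the on-disk
feature flag) might look roughly like the userspace model below. This is a
sketch, not kernel code and not a patch: every name in it (the model_* structs,
MODEL_CASEFOLD_FL, MODEL_EFSCORRUPTED, model_iget_check()) is a hypothetical
stand-in, and it assumes that s_encoding is set once at mount time and never
changes afterwards.

```c
/*
 * Userspace model, NOT kernel code. All names are hypothetical stand-ins
 * for the ext4 structures discussed in this thread.
 */
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CASEFOLD_FL   0x40000000u /* stand-in per-inode casefold flag */
#define MODEL_EFSCORRUPTED  117         /* stand-in "fs corrupted" errno */

struct model_encoding {
	int version;
};

struct model_sb {
	/* Mirrors the on-disk feature bit; may change under a mounted fs. */
	bool feature_casefold_on_disk;
	/* Assumed to be set once at mount time; NULL if casefold was off. */
	const struct model_encoding *s_encoding;
};

struct model_inode {
	unsigned int flags;
};

/*
 * Analogue of a check in __ext4_iget(): reject a casefolded inode when no
 * encoding was loaded at mount time. The on-disk feature bit is
 * deliberately not consulted here.
 */
static int model_iget_check(const struct model_sb *sb,
			    const struct model_inode *inode)
{
	if ((inode->flags & MODEL_CASEFOLD_FL) && sb->s_encoding == NULL)
		return -MODEL_EFSCORRUPTED;
	return 0;
}
```

In this model, a superblock whose feature_casefold_on_disk bit is flipped to
true after mount still has s_encoding == NULL, so a casefolded inode is
rejected at load time and later code never sees it.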