On 11/25/24 08:30, Theodore Ts'o wrote:
On Sun, Nov 24, 2024 at 08:10:09PM +0000, Al Viro wrote:
What happens there is that on a badly corrupt image we have an on-disk
inode with link count below the actual number of links. And after
unlinks remove enough of those to drive the link count to 0, inode
is freed. After that point, all remaining links are pointing to a freed
on-disk inode, which is discovered when they need to decrement a link
count that is already 0. Which does deserve a warning, probably without
a stack trace.
There's nothing the kernel can do about that, short of scanning the entire
filesystem at mount time and verifying that link counts are accurate...
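
For illustration, the failure sequence described above can be modelled in a few lines. This is only a toy sketch, not kernel code: the ToyFS class and everything in it are hypothetical names made up for the example. It assumes a corrupt image where three names point at one inode whose stored link count is 2.

```python
# Toy model only: not kernel code. All names here are hypothetical.
from collections import Counter


class ToyFS:
    def __init__(self, entries, stored_nlink):
        # entries: name -> inode number (the directory entries on "disk")
        # stored_nlink: inode number -> link count recorded in the inode
        self.entries = dict(entries)
        self.nlink = dict(stored_nlink)
        self.freed = set()
        self.warnings = []

    def unlink(self, name):
        ino = self.entries.pop(name)
        if self.nlink[ino] == 0:
            # The entry pointed at an inode whose count was already 0.
            self.warnings.append(f"inode {ino}: link count already 0")
            return
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:
            self.freed.add(ino)  # the on-disk inode is now free for reuse

    def dangling(self):
        # Names that still point at a freed inode.
        return sorted(n for n, ino in self.entries.items() if ino in self.freed)

    def scan_mismatches(self):
        # The expensive check: walk every entry, compare with stored counts.
        actual = Counter(self.entries.values())
        return {ino: (self.nlink[ino], actual[ino])
                for ino in self.nlink if self.nlink[ino] != actual[ino]}


# Corrupt image: three names point at inode 12, but the inode says nlink == 2.
fs = ToyFS({"a": 12, "b": 12, "c": 12}, {12: 2})
mismatches_at_mount = fs.scan_mismatches()  # only a full scan sees this
fs.unlink("a")
fs.unlink("b")                       # count reaches 0, inode 12 is freed
dangling_after_free = fs.dangling()  # "c" now points at a freed inode
fs.unlink("c")                       # decrement of a count that is already 0
```

In this model the mismatch is visible up front only through scan_mismatches(), which has to walk every directory entry; without it, the problem surfaces only at the final unlink.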
Theoretically we could check if there's an associated dentry at the time of
decrement-to-0 and refuse to do that decrement in such case, marking the
in-core inode so that no extra dentries would be associated with it
from that point on. Not sure if that'd make for a good mitigation strategy,
though - and it wouldn't help in case of extra links we hadn't seen by
that point; they would become dangling pointers and reuse of on-disk inode
would still be possible...
Yeah, what we do with ext4 in that case is that we mark the file
system as corrupted, and print an ext4_error() message, but we don't
call WARN_ON. At this point, you can either (a) force a reboot, so
that it can get fixed up at fsck time --- this might be appropriate if
you have a failover setup, where bringing the system *down* lets the
backup system do its thing without further corrupting user data,
(b) remount the file system read-only, so that you don't actually do
any further damage to the system, or (c) if the file system is marked
"don't worry, be happy, continue running" because some silly security
policy says that bringing the system down is a denial of service
attack and we can't have that (**sigh**), it might be a good idea to
mark the block group as "corrupted" and refuse to do any further block
or inode allocations out of that block group until the file system can
be properly checked.
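
As a rough illustration of the three responses listed above, here is a toy sketch. It is not ext4 code: ToyVolume, OnError and their methods are hypothetical names. The policy values are borrowed from ext4's errors= mount option (continue, remount-ro, panic), and the mapping of (a), (b) and (c) onto them is an assumption of this sketch.

```python
# Toy model of the three responses; hypothetical names, not ext4 code.
from enum import Enum


class OnError(Enum):
    PANIC = "panic"            # (a) bring the system down, fix at fsck time
    REMOUNT_RO = "remount-ro"  # (b) stop all further writes
    CONTINUE = "continue"      # (c) keep running


class ToyVolume:
    def __init__(self, policy, free_inodes):
        self.policy = policy
        self.read_only = False
        self.needs_fsck = False
        self.corrupt_groups = set()
        self.free_inodes = dict(free_inodes)  # block group -> free inodes

    def report_corruption(self, group, msg):
        self.needs_fsck = True  # mark the file system as corrupted
        print(f"toy error: group {group}: {msg}")  # a message, no stack trace
        if self.policy is OnError.PANIC:
            raise SystemExit("panic: reboot and run fsck")
        if self.policy is OnError.REMOUNT_RO:
            self.read_only = True
        else:
            # Keep running, but stop allocating from the suspect group.
            self.corrupt_groups.add(group)

    def alloc_inode(self):
        if self.read_only:
            raise PermissionError("read-only file system")
        for group, free in self.free_inodes.items():
            if group in self.corrupt_groups or free == 0:
                continue
            self.free_inodes[group] -= 1
            return group
        raise OSError("no usable block group")


vol = ToyVolume(OnError.CONTINUE, {0: 4, 1: 4})
vol.report_corruption(0, "link count already 0")
chosen_group = vol.alloc_inode()  # skips group 0 until the volume is checked
```

The point of the CONTINUE branch is that a freed-but-still-referenced inode in the suspect group cannot be handed out again, which limits further damage until a proper check runs.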
Anyway, this is why I now ignore any syzkaller report that involves a
badly corrupted file system being mounted. That's not something I
consider a valid threat model, and if someone wants to pay an engineer
to work through all of those issues, *great*, but I don't have the
time to deal with what I consider a super-low-priority issue.
- Ted
Thank you for the insight, Ted. I understand the challenges of
addressing issues caused by badly corrupted filesystems, especially when
they fall outside typical threat models. I appreciate your perspective
and time!