Re: [syzbot] [f2fs?] WARNING in rcu_sync_dtor

Jan Kara <jack@xxxxxxx> · Mon, 29 Jul 2024 15:27:21 +0200

On Mon 29-07-24 11:10:09, Christian Brauner wrote:
> On Fri, Jul 26, 2024 at 08:23:02AM GMT, syzbot wrote:
> > syzbot has bisected this issue to:
> > 
> > commit b62e71be2110d8b52bf5faf3c3ed7ca1a0c113a5
> > Author: Chao Yu <chao@xxxxxxxxxx>
> > Date:   Sun Apr 23 15:49:15 2023 +0000
> > 
> >     f2fs: support errors=remount-ro|continue|panic mountoption
> > 
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=119745f1980000
> > start commit:   1722389b0d86 Merge tag 'net-6.11-rc1' of git://git.kernel...
> > git tree:       upstream
> > final oops:     https://syzkaller.appspot.com/x/report.txt?x=139745f1980000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=159745f1980000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b698a1b2fcd7ef5f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1237a1f1980000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=115edac9980000
> > 
> > Reported-by: syzbot+20d7e439f76bbbd863a7@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: b62e71be2110 ("f2fs: support errors=remount-ro|continue|panic mountoption")
> > 
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 
> Thanks to Paul and Oleg for point me in the right direction and
> explaining that rcu sync warning.
> 
> That patch here is remounting a superblock read-only directly by raising
> SB_RDONLY without the involvement of the VFS at all. That's pretty
> broken and is likely to cause trouble if done wrong. The rough order of
> operations to transition rw->ro usualy include checking that the
> filsystem is unfrozen, and marking all mounts read-only, then calling
> into the filesystem so it can do whatever it wants to do.

Yeah, this way of handling filesystem errors dates back to days when the
world was much simpler :) It has been always a bit of a hack (but when you
try to limit damage from corrupted on-disk data structures, a bit of
hackiness is acceptable) but it is doubly so these days.

> In any case, all of this requires holding sb->s_umount. Not holding
> sb->s_umount will end up confusing freeze_super() (Thanks to Oleg for
> noticing!). When freeze_super() is called on a non-ro filesystem it will
> acquire
> percpu_down_write(SB_FREEZE_WRITE+SB_FREEZE_PAGEFAULT+SB_FREEZE_FS) and
> thaw_super() needs to call
> sb_freeze_unlock(SB_FREEZE_FS+SB_FREEZE_PAGEFAULT+SB_FREEZE_WRITE) but
> because you just raise SB_RDONLY you end up causing thaw_super() to skip
> that step causing the bug in rcu_sync_dtor() to be noticed.

Yeah, good spotting.

> Btw, ext4 has similar logic where it raises SB_RDONLY without checking
> whether the filesystem is frozen.
> 
> So I guess, this is technically ok as long as that emergency SB_RDONLY raising
> in sb->s_flags is not done while the fs is already frozen. I think ext4 can
> probably never do that. Jan?

You'd wish (or maybe I'd wish ;) No, ext4 can hit it in the same way f2fs
can. All it takes is for ext4 to hit some metadata corruption on read from
disk while the filesystem is frozen.

> My guess is that something in f2fs can end up raising SB_RDONLY after
> the filesystem is frozen and so it causes this bug. I suspect this is coming
> from the gc_thread() which might issue a f2fs_stop_checkpoint() while the fs is
> already about to be frozen but before the gc thread is stopped as part of the
> freeze.

So in ext4 we have EXT4_FLAGS_SHUTDOWN flag which we now use internally
instead of SB_RDONLY flag for checking whether the filesystem was shutdown
(because otherwise races between remount and hitting fs error were really
messy). However we still *also* set SB_RDONLY so that VFS bails early from
some paths which generally results in less error noise in kernel logs and
also out of caution of not breaking something in this path. That being said
we also support EXT4_IOC_SHUTDOWN ioctl for several years and in that path
we set EXT4_FLAGS_SHUTDOWN without setting SB_RDONLY and nothing seems to
have blown up. So I'm inclined to belive we could remove setting of
SB_RDONLY from ext4 error handling. Ted, what do you think?

Also as the "filesystem shutdown" is spreading across multiple filesystems,
I'm playing with the idea that maybe we could lift a flag like this to VFS
so that we can check it in VFS paths and abort some operations early.  But
so far I'm not convinced the gain is worth the need to iron out various
subtle semantical differences of "shutdown" among filesystems.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR