On Wed, Jun 07, 2023 at 11:22:43AM +0200, Jan Kara wrote: > On Tue 06-06-23 10:19:56, Darrick J. Wong wrote: > > On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote: > > > On Mon 22-05-23 16:42:00, Darrick J. Wong wrote: > > > > How about this as an alternative patch? Kernel and userspace freeze > > > > state are stored in s_writers; each type cannot block the other (though > > > > you still can't have nested kernel or userspace freezes); and the freeze > > > > is maintained until /both/ freeze types are dropped. > > > > > > > > AFAICT this should work for the two other usecases (quiescing pagefaults > > > > for fsdax pmem pre-removal; and freezing fses during suspend) besides > > > > online fsck for xfs. > > > > > > > > --D > > > > > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze > > > > > > > > Userspace can freeze a filesystem using the FIFREEZE ioctl or by > > > > suspending the block device; this state persists until userspace thaws > > > > the filesystem with the FITHAW ioctl or resuming the block device. > > > > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for > > > > the fsfreeze ioctl") we only allow the first freeze command to succeed. > > > > > > > > The kernel may decide that it is necessary to freeze a filesystem for > > > > its own internal purposes, such as suspends in progress, filesystem fsck > > > > activities, or quiescing a device prior to removal. Userspace thaw > > > > commands must never break a kernel freeze, and kernel thaw commands > > > > shouldn't undo userspace's freeze command. > > > > > > > > Introduce a couple of freeze holder flags and wire it into the > > > > sb_writers state. One kernel and one userspace freeze are allowed to > > > > coexist at the same time; the filesystem will not thaw until both are > > > > lifted. > > > > > > > > Inspired-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > Yes, this is exactly how I'd imagine it. Thanks for writing the patch! > > > > > > I'd just note that this would need rebasing on top of Luis' patches 1 and > > > 2. Also: > > > > I started doing that, but I noticed that after patch 1, freeze_super no > > longer leaves s_active elevated if the freeze is successful. The > > callers drop the s_active ref that they themselves obtained, which > > means that we've now changed that behavior, right? ioctl_fsfreeze now > > does: > > > > if (!get_active_super(sb->s_bdev)) > > return -ENOTTY; > > > > (Increase ref) > > > > /* Freeze */ > > if (sb->s_op->freeze_super) > > ret = sb->s_op->freeze_super(sb); > > ret = freeze_super(sb); > > > > (Not sure why we can do both here?) > > > > deactivate_locked_super(sb); > > > > (Decrease ref; net change to s_active is zero) > > > > return ret; > > > > Luis hasn't responded to my question, so I stopped. > > Right. I kind of like how he's moved the locking out of freeze_super() / > thaw_super() but I agree this semantic change is problematic and needs much > more thought - e.g. with Luis' version the fs could be unmounted while > frozen which is going to spectacularly deadlock. So yeah let's just ignore > patch 1 for now. Agreed, I like moving the locking out of freeze_super too. I'm less enthused about patch 2's helpers since there are those intermediate freezer states whose existence are only hinted at if you get to the point of asking yourself "Why would there be both _is_frozen and _is_unfrozen predicates?". > Longer term I believe if the sb is frozen by userspace, we should refuse to > unmount it but that's a separate discussion for some other time. > > > > BTW, when reading this code, I've spotted attached cleanup opportunity but > > > I'll queue that separately so that is JFYI. > > > > > > > +#define FREEZE_HOLDER_USERSPACE (1U << 1) /* userspace froze fs */ > > > > +#define FREEZE_HOLDER_KERNEL (1U << 2) /* kernel froze fs */ > > > > > > Why not start from 1U << 0? And bonus points for using BIT() macro :). > > > > I didn't think filesystem code was supposed to be using stuff from > > vdso.h... > > Hum, so BIT() macro is quite widely used in include/linux/ (from generic > headers e.g. in trace.h). include/linux/bits.h seems to be the right > include to use but I'm pretty sure include/linux/fs.h already gets this > header through something. Ok, will do. --D > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR