On Sat 04-11-23 15:00:13, Christian Brauner wrote: > Before [1] freezing a filesystems through the block layer only worked > for the main block device as the owning superblock of additional block > devices could not be found. Any filesystem that made use of multiple > block devices would only be freezable via it's main block device. > > For example, consider xfs over device mapper with /dev/dm-0 as main > block device and /dev/dm-1 as external log device. Two freeze requests > before [1]: > > (1) dmsetup suspend /dev/dm-0 on the main block device > > bdev_freeze(dm-0) > -> dm-0->bd_fsfreeze_count++ > -> freeze_super(xfs-sb) > > The owning superblock is found and the filesystem gets frozen. > Returns 0. > > (2) dmsetup suspend /dev/dm-1 on the log device > > bdev_freeze(dm-1) > -> dm-1->bd_fsfreeze_count++ > > The owning superblock isn't found and only the block device freeze > count is incremented. Returns 0. > > Two freeze requests after [1]: > > (1') dmsetup suspend /dev/dm-0 on the main block device > > bdev_freeze(dm-0) > -> dm-0->bd_fsfreeze_count++ > -> freeze_super(xfs-sb) > > The owning superblock is found and the filesystem gets frozen. > Returns 0. > > (2') dmsetup suspend /dev/dm-1 on the log device > > bdev_freeze(dm-0) > -> dm-0->bd_fsfreeze_count++ > -> freeze_super(xfs-sb) > > The owning superblock is found and the filesystem gets frozen. > Returns -EBUSY. > > When (2') is called we initiate a freeze from another block device of > the same superblock. So we increment the bd_fsfreeze_count for that > additional block device. But we now also find the owning superblock for > additional block devices and call freeze_super() again which reports > -EBUSY. > > This can be reproduced through xfstests via: > > mkfs.xfs -f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g,logdev=/dev/nvme1n1p4 /dev/nvme1n1p3 > mkfs.xfs -f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g,logdev=/dev/nvme1n1p6 /dev/nvme1n1p5 > > FSTYP=xfs > export TEST_DEV=/dev/nvme1n1p3 > export TEST_DIR=/mnt/test > export TEST_LOGDEV=/dev/nvme1n1p4 > export SCRATCH_DEV=/dev/nvme1n1p5 > export SCRATCH_MNT=/mnt/scratch > export SCRATCH_LOGDEV=/dev/nvme1n1p6 > export USE_EXTERNAL=yes > > sudo ./check generic/311 > > Current semantics allow two concurrent freezers: one initiated from > userspace via FREEZE_HOLDER_USERSPACE and one initiated from the kernel > via FREEZE_HOLDER_KERNEL. If there are multiple concurrent freeze > requests from either FREEZE_HOLDER_USERSPACE or FREEZE_HOLDER_KERNEL > -EBUSY is returned. > > We need to preserve these semantics because as they are uapi via > FIFREEZE and FITHAW ioctl()s. IOW, freezes don't nest for FIFREEZE and > FITHAW. Other kernels consumers rely on non-nesting freezes as well. > > With freezes initiated from the block layer freezes need to nest if the > same superblock is frozen via multiple devices. So we need to start > counting the number of freeze requests. > > If FREEZE_HOLDER_BDEV is passed alongside FREEZE_HOLDER_KERNEL or > FREEZE_HOLDER_USERSPACE we allow the caller to nest freeze calls. FREEZE_HOLDER_BDEV should be FREEZE_MAY_NEST I guess. > To accommodate the old semantics we split the freeze counter into two > counting kernel initiated and userspace initiated freezes separately. We > can then also stop recording FREEZE_HOLDER_* in struct sb_writers. > > We also simplify freezing by making all concurrent freezers share a > single active superblock reference count instead of having separate > references for kernel and userspace. I don't see why we would need two > active reference counts. Neither FREEZE_HOLDER_KERNEL nor > FREEZE_HOLDER_USERSPACE can put the active reference as long as they are > concurrent freezers anwyay. That was already true before we allowed > nesting freezes. > > Survives various fstests runs with different options including the > reproducer, online scrub, and online repair, fsfreze, and so on. Also > survives blktests. > > Reported-by: Chandan Babu R <chandanbabu@xxxxxxxxxx> > Link: https://lore.kernel.org/linux-block/87bkccnwxc.fsf@debian-BULLSEYE-live-builder-AMD64 > Fixes: [1]: bfac4176f2c4 ("bdev: implement freeze and thaw holder operations") # no backport needed > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> Just one more typo fix below. Feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> > @@ -1930,6 +2008,13 @@ static int wait_for_partially_frozen(struct super_block *sb) > * userspace can both hold a filesystem frozen. The filesystem remains frozen > * until there are no kernel or userspace freezes in effect. > * > + * A filesystem may hold multiple devices and thus a filesystems may be > + * frozen through the block layer via multiple block devices. In this > + * case the request is marked as being allowed to nest passig ^^ passing Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR