On Wed, Sep 06, 2023 at 03:26:21PM +0200, Mikulas Patocka wrote: > lvm may suspend any logical volume anytime. If lvm suspend races with > unmount, it may be possible that the kernel writes to the filesystem after > unmount successfully returned. The problem can be demonstrated with this > script: > > #!/bin/sh -ex > modprobe brd rd_size=4194304 > vgcreate vg /dev/ram0 > lvcreate -L 16M -n lv vg > mkfs.ext4 /dev/vg/lv > mount -t ext4 /dev/vg/lv /mnt/test > dmsetup suspend /dev/vg/lv > (sleep 1; dmsetup resume /dev/vg/lv) & > umount /mnt/test > md5sum /dev/vg/lv > md5sum /dev/vg/lv > dmsetup remove_all > rmmod brd > > The script unmounts the filesystem and runs md5sum twice, the result is > that these two invocations return different hash. > > What happens: > * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it > increments sb->s_active > * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls > deactivate_super, deactivate_super sees that sb->s_active is 2, so it > decreases it to 1 and does nothing - the umount syscall returns > successfully > * now we have a mounted filesystem despite the fact that umount returned > * we call md5sum, this waits for the block device being unblocked > * dmsetup resume unblocks the block device and calls thaw_bdev, that calls > thaw_super and thaw_super_locked > * thaw_super_locked calls deactivate_locked_super, this actually drops the > refcount and performs the unmount. The unmount races with md5sum. md5sum > wins the race and it returns the hash of the filesystem before it was > unmounted > * the second md5sum returns the hash after the filesystem was unmounted > > In order to fix this bug, this patch introduces a new function > wait_and_deactivate_super that will wait if the filesystem is frozen and > perform deactivate_locked_super only after that. > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > > --- > fs/namespace.c | 2 +- > fs/super.c | 20 ++++++++++++++++++++ > include/linux/fs.h | 1 + > 3 files changed, 22 insertions(+), 1 deletion(-) > > Index: linux-2.6/fs/namespace.c > =================================================================== > --- linux-2.6.orig/fs/namespace.c 2023-09-06 09:45:54.000000000 +0200 > +++ linux-2.6/fs/namespace.c 2023-09-06 09:47:15.000000000 +0200 > @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn > } > fsnotify_vfsmount_delete(&mnt->mnt); > dput(mnt->mnt.mnt_root); > - deactivate_super(mnt->mnt.mnt_sb); > + wait_and_deactivate_super(mnt->mnt.mnt_sb); > mnt_free_id(mnt); > call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt); > } > Index: linux-2.6/fs/super.c > =================================================================== > --- linux-2.6.orig/fs/super.c 2023-09-05 21:09:16.000000000 +0200 > +++ linux-2.6/fs/super.c 2023-09-06 09:52:20.000000000 +0200 > @@ -36,6 +36,7 @@ > #include <linux/lockdep.h> > #include <linux/user_namespace.h> > #include <linux/fs_context.h> > +#include <linux/delay.h> > #include <uapi/linux/mount.h> > #include "internal.h" > > @@ -365,6 +366,25 @@ void deactivate_super(struct super_block > EXPORT_SYMBOL(deactivate_super); > > /** > + * wait_and_deactivate_super - wait for unfreeze and drop a reference > + * @s: superblock to deactivate > + * > + * Variant of deactivate_super(), except that we wait if the filesystem is > + * frozen. This is required on unmount, to make sure that the filesystem is > + * really unmounted when this function returns. > + */ > +void wait_and_deactivate_super(struct super_block *s) > +{ > + down_write(&s->s_umount); > + while (s->s_writers.frozen != SB_UNFROZEN) { > + up_write(&s->s_umount); > + msleep(1); > + down_write(&s->s_umount); > + } Instead of msleep, could you use wait_var_event_killable like wait_for_partially_frozen() does? --D > + deactivate_locked_super(s); > +} > + > +/** > * grab_super - acquire an active reference > * @s: reference we are trying to make active > * > Index: linux-2.6/include/linux/fs.h > =================================================================== > --- linux-2.6.orig/include/linux/fs.h 2023-09-05 21:09:16.000000000 +0200 > +++ linux-2.6/include/linux/fs.h 2023-09-06 09:46:56.000000000 +0200 > @@ -2262,6 +2262,7 @@ void kill_anon_super(struct super_block > void kill_litter_super(struct super_block *sb); > void deactivate_super(struct super_block *sb); > void deactivate_locked_super(struct super_block *sb); > +void wait_and_deactivate_super(struct super_block *sb); > int set_anon_super(struct super_block *s, void *data); > int set_anon_super_fc(struct super_block *s, struct fs_context *fc); > int get_anon_bdev(dev_t *); >