Re: [PATCH] fix writing to the filesystem after unmount

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 06, 2023 at 08:22:45AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 06, 2023 at 03:26:21PM +0200, Mikulas Patocka wrote:
> > lvm may suspend any logical volume anytime. If lvm suspend races with
> > unmount, it may be possible that the kernel writes to the filesystem after
> > unmount successfully returned. The problem can be demonstrated with this
> > script:
> > 
> > #!/bin/sh -ex
> > modprobe brd rd_size=4194304
> > vgcreate vg /dev/ram0
> > lvcreate -L 16M -n lv vg
> > mkfs.ext4 /dev/vg/lv
> > mount -t ext4 /dev/vg/lv /mnt/test
> > dmsetup suspend /dev/vg/lv
> > (sleep 1; dmsetup resume /dev/vg/lv) &
> > umount /mnt/test
> > md5sum /dev/vg/lv
> > md5sum /dev/vg/lv
> > dmsetup remove_all
> > rmmod brd
> > 
> > The script unmounts the filesystem and runs md5sum twice, the result is
> > that these two invocations return different hash.
> > 
> > What happens:
> > * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it
> >   increments sb->s_active
> > * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls
> >   deactivate_super, deactivate_super sees that sb->s_active is 2, so it
> >   decreases it to 1 and does nothing - the umount syscall returns
> >   successfully
> > * now we have a mounted filesystem despite the fact that umount returned
> > * we call md5sum, this waits for the block device being unblocked
> > * dmsetup resume unblocks the block device and calls thaw_bdev, that calls
> >   thaw_super and thaw_super_locked
> > * thaw_super_locked calls deactivate_locked_super, this actually drops the
> >   refcount and performs the unmount. The unmount races with md5sum. md5sum
> >   wins the race and it returns the hash of the filesystem before it was
> >   unmounted
> > * the second md5sum returns the hash after the filesystem was unmounted
> > 
> > In order to fix this bug, this patch introduces a new function
> > wait_and_deactivate_super that will wait if the filesystem is frozen and
> > perform deactivate_locked_super only after that.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > 
> > ---
> >  fs/namespace.c     |    2 +-
> >  fs/super.c         |   20 ++++++++++++++++++++
> >  include/linux/fs.h |    1 +
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/fs/namespace.c
> > ===================================================================
> > --- linux-2.6.orig/fs/namespace.c	2023-09-06 09:45:54.000000000 +0200
> > +++ linux-2.6/fs/namespace.c	2023-09-06 09:47:15.000000000 +0200
> > @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn
> >  	}
> >  	fsnotify_vfsmount_delete(&mnt->mnt);
> >  	dput(mnt->mnt.mnt_root);
> > -	deactivate_super(mnt->mnt.mnt_sb);
> > +	wait_and_deactivate_super(mnt->mnt.mnt_sb);
> >  	mnt_free_id(mnt);
> >  	call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
> >  }
> > Index: linux-2.6/fs/super.c
> > ===================================================================
> > --- linux-2.6.orig/fs/super.c	2023-09-05 21:09:16.000000000 +0200
> > +++ linux-2.6/fs/super.c	2023-09-06 09:52:20.000000000 +0200
> > @@ -36,6 +36,7 @@
> >  #include <linux/lockdep.h>
> >  #include <linux/user_namespace.h>
> >  #include <linux/fs_context.h>
> > +#include <linux/delay.h>
> >  #include <uapi/linux/mount.h>
> >  #include "internal.h"
> >  
> > @@ -365,6 +366,25 @@ void deactivate_super(struct super_block
> >  EXPORT_SYMBOL(deactivate_super);
> >  
> >  /**
> > + *	wait_and_deactivate_super	-	wait for unfreeze and drop a reference
> > + *	@s: superblock to deactivate
> > + *
> > + *	Variant of deactivate_super(), except that we wait if the filesystem is
> > + *	frozen. This is required on unmount, to make sure that the filesystem is
> > + *	really unmounted when this function returns.
> > + */
> > +void wait_and_deactivate_super(struct super_block *s)
> > +{
> > +	down_write(&s->s_umount);
> > +	while (s->s_writers.frozen != SB_UNFROZEN) {
> > +		up_write(&s->s_umount);
> > +		msleep(1);
> > +		down_write(&s->s_umount);
> > +	}
> 
> Instead of msleep, could you use wait_var_event_killable like
> wait_for_partially_frozen() does?

I said the same thing but I think that the patch in this way isn't a
good idea and technically also uapi breakage. Anyway, can you take a
look at my third response here?

https://lore.kernel.org/lkml/20230906-aufheben-hagel-9925501b7822@brauner

(I forgot you worked on freezing as well.
I'm currently moving the freezing bits to fs_holder_ops
https://gitlab.com/brauner/linux/-/commits/vfs.super.freeze)



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux