On Wed, 6 Sep 2023, Christian Brauner wrote: > > What happens: > > * dmsetup suspend calls freeze_bdev, that goes to freeze_super and it > > increments sb->s_active > > * then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls > > deactivate_super, deactivate_super sees that sb->s_active is 2, so it > > decreases it to 1 and does nothing - the umount syscall returns > > successfully > > * now we have a mounted filesystem despite the fact that umount returned > > That can happen for any number of reasons. Other codepaths might very > well still hold active references to the superblock. The same thing can > happen when you have your filesystem pinned in another mount namespace. > > If you really want to be absolutely sure that the superblock is > destroyed you must use a mechanism like fanotify which allows you to get > notified on superblock destruction. If the administrator runs a script that performs unmount and then back-up of the underlying block device, it may read corrupted data. I think that this is a problem. > > @@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn > > } > > fsnotify_vfsmount_delete(&mnt->mnt); > > dput(mnt->mnt.mnt_root); > > - deactivate_super(mnt->mnt.mnt_sb); > > + wait_and_deactivate_super(mnt->mnt.mnt_sb); > > Your patch means that we hang on any umount when the filesystem is > frozen. Currently, if we freeze a filesystem with "fsfreeze" and unmount it, the mount point is removed, but the filesystem stays active and it is leaked. You can't unfreeze it with "fsfreeze --unfreeze" because the mount point is gone. (the only way how to recover it is "echo j>/proc/sysrq-trigger"). > IOW, you'd also hang on any umount of a bind-mount. IOW, every > single container making use of this filesystems via bind-mounts would > hang on umount and shutdown. bind-mount doesn't modify "s->s_writers.frozen", so the patch does nothing in this case. I tried unmounting bind-mounts and there was no deadlock. > You'd effectively build a deadlock trap for userspace when the > filesystem is frozen. And nothing can make progress until that thing is > thawed. Umount can't block if the block device is frozen. unmounting a filesystem frozen with "fsfreeze" doesn't work in the current kernel. We can say that the administrator shouldn't do it. But reading the block device after umount finishes is something that the administrator may do. BTW. what do you think that unmount of a frozen filesystem should properly do? Fail with -EBUSY? Or, unfreeze the filesystem and unmount it? Or something else? > That msleep(1) alone is a pretty nasty hack. We should definitely not > spin in code like this. That superblock could stay frozen for a long > time without s_umount held. So this is spinning. > > Even if we wanted to do this it would need to use a similar wait > mechanism for the filesystem to be thawed like we do in > thaw_super_locked(). Yes, it may be possible to rework it using a wait queue. Mikulas