On Wed, Sep 06, 2023 at 06:01:06PM +0200, Mikulas Patocka wrote:
>
>
> On Wed, 6 Sep 2023, Christian Brauner wrote:
>
> > > > IOW, you'd also hang on any umount of a bind-mount. IOW, every
> > > > single container making use of this filesystems via bind-mounts would
> > > > hang on umount and shutdown.
> > >
> > > bind-mount doesn't modify "s->s_writers.frozen", so the patch does nothing
> > > in this case. I tried unmounting bind-mounts and there was no deadlock.
> >
> > With your patch what happens if you do the following?
> >
> > #!/bin/sh -ex
> > modprobe brd rd_size=4194304
> > vgcreate vg /dev/ram0
> > lvcreate -L 16M -n lv vg
> > mkfs.ext4 /dev/vg/lv
> >
> > mount -t ext4 /dev/vg/lv /mnt/test
> > mount --bind /mnt/test /opt
> > mount --make-private /opt
> >
> > dmsetup suspend /dev/vg/lv
> > (sleep 1; dmsetup resume /dev/vg/lv) &
> >
> > umount /opt # I'd expect this to hang
> >
> > md5sum /dev/vg/lv
> > md5sum /dev/vg/lv
> > dmsetup remove_all
> > rmmod brd
>
> "umount /opt" doesn't hang. It waits one second (until dmsetup resume is
> called) and then proceeds.

So unless I'm really misreading the code - entirely possible - the
umount of the bind-mount now waits until the filesystem is resumed with
your patch. And if that's the case that's a bug.

If at all, then only the last umount, the one that destroys the
superblock, should wait for the filesystem to become unfrozen.

A bind-mount shouldn't as there are still active mounts of the
filesystem (e.g., /mnt/test).

So you should see this with (unless I really misread things):

#!/bin/sh -ex
modprobe brd rd_size=4194304
vgcreate vg /dev/ram0
lvcreate -L 16M -n lv vg
mkfs.ext4 /dev/vg/lv

mount -t ext4 /dev/vg/lv /mnt/test
mount --bind /mnt/test /opt
mount --make-private /opt

dmsetup suspend /dev/vg/lv

umount /opt # This will hang with your patch?

>
> Then, it fails with "rmmod: ERROR: Module brd is in use" because the
> script didn't unmount /mnt/test.
>
> > > BTW. what do you think that unmount of a frozen filesystem should properly
> > > do? Fail with -EBUSY? Or, unfreeze the filesystem and unmount it? Or
> > > something else?
> >
> > In my opinion we should refuse to unmount frozen filesystems and log an
> > error that the filesystem is frozen. Waiting forever isn't a good idea
> > in my opinion.
>
> But lvm may freeze filesystems anytime - so we'd get randomly returned
> errors then.

So? Or you might hang at anytime.

>
> > But this is a significant uapi change afaict so this would need to be
> > hidden behind a config option, a sysctl, or it would have to be a new
> > flag to umount2() MNT_UNFROZEN which would allow an administrator to use
> > this flag to not unmount a frozen filesystems.
>
> The kernel currently distinguishes between kernel-initiated freeze (that
> is used by the XFS scrub) and userspace-initiated freeze (that is used by
> the FIFREEZE ioctl and by device-mapper initiated freeze through
> freeze_bdev).

Yes, I'm aware.

>
> Perhaps we could distinguish between FIFREEZE-initiated freezes and
> device-mapper initiated freezes as well. And we could change the logic to
> return -EBUSY if the freeze was initiated by FIFREEZE and to wait for
> unfreeze if it was initiated by the device-mapper.

For device mapper initiated freezes you can unfreeze independent of any
filesystem mountpoint via dm ioctls.
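
To make the semantics discussed above easier to compare, here is a rough,
purely illustrative sketch of the decision table as a shell function. It is
not kernel code and does not touch any mount. It combines two separate ideas
from this thread on the assumption that they could coexist: only the last
umount (the one that destroys the superblock) is affected by a freeze, and
the outcome depends on who initiated the freeze. The function name
umount_action and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the umount semantics discussed in this thread.
# Nothing here is an existing kernel or util-linux interface.
#
# usage: umount_action FREEZE_ORIGIN REMAINING_MOUNTS
#   FREEZE_ORIGIN     none | fifreeze | dm
#   REMAINING_MOUNTS  mounts of the same superblock still active
#                     after this umount (e.g. 1 when /opt is a
#                     bind-mount of /mnt/test)
umount_action() {
    origin=$1
    remaining=$2

    # Reject anything outside the three origins modelled here.
    case $origin in
        none|fifreeze|dm) ;;
        *) echo "unknown freeze origin: $origin" >&2; return 1 ;;
    esac

    # Not frozen, or other mounts keep the superblock alive:
    # the umount should neither wait nor fail.
    if [ "$origin" = none ] || [ "$remaining" -gt 0 ]; then
        echo proceed
        return 0
    fi

    # Last umount of a frozen filesystem: the proposal quoted above
    # fails for FIFREEZE and waits for device-mapper freezes.
    if [ "$origin" = fifreeze ]; then
        echo EBUSY
    else
        echo wait
    fi
}
```

Under this model, "umount_action dm 1" corresponds to the bind-mount case in
the script above and prints "proceed", while "umount_action dm 0" prints
"wait" and "umount_action fifreeze 0" prints "EBUSY".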