On Thu, May 04, 2023 at 09:59:52AM -0600, Keith Busch wrote:
> On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > Hello Guys,
> >
> > I got one report in which buffered write IO hangs in balance_dirty_pages,
> > after one nvme block device is unplugged physically, then umount can't
> > succeed.
> >
> > Turns out it is one long-term issue, and it can be triggered at least
> > since v5.14 until the latest v6.3.
> >
> > And the issue can be reproduced reliably in KVM guest:
> >
> > 1) run the following script inside guest:
> >
> > mkfs.ext4 -F /dev/nvme0n1
> > mount /dev/nvme0n1 /mnt
> > dd if=/dev/zero of=/mnt/z.img&
> > sleep 10
> > echo 1 > /sys/block/nvme0n1/device/device/remove
> >
> > 2) dd hang is observed and /dev/nvme0n1 is gone actually
>
> Sorry to jump in so late.
>
> For an ungraceful nvme removal, like a surprise hot unplug, the driver
> sets the capacity to 0 and that effectively ends all dirty page writers
> that could stall forward progress on the removal. And that 0 capacity
> should also cause 'dd' to exit.

Actually the nvme device is already gone, and the hang happens in
balance_dirty_pages() called from generic_perform_write().

The issue can be triggered on any kind of disk that can be hot-unplugged,
and it is easy to reproduce on both ublk and nvme.

> But this is not an ungraceful removal, so we're not getting that forced
> behavior. Could we use the same capacity trick here after flushing any
> outstanding dirty pages?

set_capacity(0) is already called in del_gendisk() after fsync_bdev() &
__invalidate_device(), but I understand the FS code only makes a best
effort to flush dirty pages. Once the bdev is gone, the remaining
un-flushed dirty pages need to be cleaned up, otherwise they can't be
used any more.

Thanks,
Ming
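
P.S. In case it is easier to play with, below is a rough standalone C
version of the reproducer script quoted above. It is only an untested
sketch; the device, mount point and sysfs remove path are copied from the
script, so adjust them for your setup.

/*
 * Standalone sketch of the reproducer: format and mount the disk, keep a
 * child doing buffered writes, then remove the device via sysfs.  The
 * child is expected to hang in balance_dirty_pages() afterwards.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define DEV	"/dev/nvme0n1"
#define MNT	"/mnt"
#define REMOVE	"/sys/block/nvme0n1/device/device/remove"

int main(void)
{
	pid_t pid;
	int fd;

	/* mkfs.ext4 -F /dev/nvme0n1; mount /dev/nvme0n1 /mnt */
	if (system("mkfs.ext4 -F " DEV) || system("mount " DEV " " MNT))
		return 1;

	/* dd if=/dev/zero of=/mnt/z.img & : keep dirtying page cache */
	pid = fork();
	if (pid == 0) {
		static char buf[1 << 20];	/* zero-filled 1MB buffer */

		fd = open(MNT "/z.img", O_WRONLY | O_CREAT, 0644);
		if (fd < 0)
			exit(1);
		for (;;) {
			if (write(fd, buf, sizeof(buf)) < 0)
				exit(1);
		}
	}

	sleep(10);

	/* echo 1 > /sys/block/nvme0n1/device/device/remove */
	fd = open(REMOVE, O_WRONLY);
	if (fd < 0 || write(fd, "1", 1) != 1)
		return 1;
	close(fd);

	/* waits forever if the writer hangs in balance_dirty_pages() */
	waitpid(pid, NULL, 0);
	return 0;
}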