On Thu, May 04, 2023 at 09:59:52AM -0600, Keith Busch wrote:
> On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > Hello Guys,
> >
> > I got one report in which buffered write IO hangs in balance_dirty_pages,
> > after one nvme block device is unplugged physically, then umount can't
> > succeed.
> >
> > Turns out it is one long-term issue, and it can be triggered at least
> > since v5.14 until the latest v6.3.
> >
> > And the issue can be reproduced reliably in KVM guest:
> >
> > 1) run the following script inside guest:
> >
> > mkfs.ext4 -F /dev/nvme0n1
> > mount /dev/nvme0n1 /mnt
> > dd if=/dev/zero of=/mnt/z.img&
> > sleep 10
> > echo 1 > /sys/block/nvme0n1/device/device/remove
> >
> > 2) dd hang is observed and /dev/nvme0n1 is gone actually
>
> Sorry to jump in so late.
>
> For an ungraceful nvme removal, like a surprise hot unplug, the driver
> sets the capacity to 0 and that effectively ends all dirty page writers
> that could stall forward progress on the removal. And that 0 capacity
> should also cause 'dd' to exit.
>
> But this is not an ungraceful removal, so we're not getting that forced
> behavior. Could we use the same capacity trick here after flushing any
> outstanding dirty pages?

There's a filesystem mounted on that block device, though. I don't
think the filesystem is going to notice the underlying block device
capacity change and break out of any of these functions.
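
For anyone who wants to watch the capacity from userspace while running
the reproducer, here is a minimal sketch. The helper name
capacity_is_zero is hypothetical, and it assumes the usual sysfs "size"
attribute (capacity in 512-byte sectors) is still readable when it is
called. It only reports what the block layer exposes; it says nothing
about whether the filesystem reacts to the change.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): check whether a
# block device reports zero capacity via its sysfs "size" attribute.
#   $1: path to the size file, e.g. /sys/block/nvme0n1/size
# Returns 0 if the capacity is 0 sectors, 1 if it is non-zero, and
# 2 if the file is missing or unreadable (e.g. the disk is already gone).
capacity_is_zero() {
    size_file=$1
    [ -r "$size_file" ] || return 2
    sectors=$(cat "$size_file")
    [ "$sectors" -eq 0 ]
}

# Example use (assumes the device name from the reproducer above):
#   capacity_is_zero /sys/block/nvme0n1/size; echo "rc=$?"
```

While dd is hung, reading /proc/<pid>/stack for the dd process as root
(on kernels built with CONFIG_STACKTRACE) should show whether it is
still sitting in balance_dirty_pages.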