Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

Dave Chinner <dchinner@xxxxxxxxxx> · Fri, 28 Apr 2023 09:33:20 +1000

On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> Hello Guys,
> 
> I got one report in which buffered write IO hangs in balance_dirty_pages,
> after one nvme block device is unplugged physically, then umount can't
> succeed.

The bug here is that the device unplug code has not told the
filesystem that it's gone away permanently.

This is the same problem we've been having for the past 15 years -
when a block device goes away permanently it leaves the filesystem and
everything else dependent on the block device completely unaware
that they are unable to function anymore. IOWs, the block
device remove path is relying on -unreliable side effects- of
filesystem IO error handling to produce what we'd call "correct
behaviour".

The block device needs to be shutting down the filesystem when it
has some sort of fatal, unrecoverable error like this (e.g. hot
unplug). We have the XFS_IOC_GOINGDOWN ioctl for telling the
filesystem it can't function anymore. This ioctl
(_IOR('X',125,__u32)) has also been replicated into ext4, f2fs and
CIFS and it gets exercised heavily by fstests. Hence this isn't XFS
specific functionality, nor is it untested functionality.
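
For reference, here is a minimal userspace sketch of what issuing that
ioctl looks like. The ioctl number is the one quoted above; the
MODEL_* names and the fs_goingdown() helper are made up for this
example, and the flag values are my recollection of the
XFS_FSOP_GOING_FLAGS_* definitions in xfs_fs.h, so check them against
your headers. uint32_t stands in for __u32 (same size).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Same encoding as the ioctl quoted in the mail: _IOR('X', 125, __u32). */
#define MODEL_IOC_GOINGDOWN _IOR('X', 125, uint32_t)

/* Assumed to mirror XFS_FSOP_GOING_FLAGS_*; verify against xfs_fs.h. */
#define MODEL_GOING_FLAGS_DEFAULT    0x0 /* plain shutdown */
#define MODEL_GOING_FLAGS_LOGFLUSH   0x1 /* flush log, not data */
#define MODEL_GOING_FLAGS_NOLOGFLUSH 0x2 /* flush neither log nor data */

/*
 * Hypothetical helper: ask the filesystem mounted at 'mountpoint' to
 * shut down. Returns 0 on success, -1 with errno set on failure.
 * Needs CAP_SYS_ADMIN, and the filesystem is unusable afterwards.
 */
static int fs_goingdown(const char *mountpoint, uint32_t flags)
{
    assert(mountpoint != NULL);

    int fd = open(mountpoint, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, MODEL_IOC_GOINGDOWN, &flags);
    int saved_errno = errno;   /* close() may clobber errno */

    close(fd);
    errno = saved_errno;
    return ret;
}
```

Only run fs_goingdown() against a scratch filesystem; after a
successful call, further IO to that mount is expected to fail until it
is unmounted and mounted again.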

The ioctl should be lifted to the VFS as FS_IOC_SHUTDOWN and a
super_operations method added to trigger a filesystem shutdown.
That way the block device removal code could simply call
sb->s_op->shutdown(sb, REASON) if it exists rather than
sync_filesystem(sb) if there's a superblock associated with the
block device. Then all these 

That way we won't have to spend another two decades listening to
people complain about how applications and filesystems hang when they
pull the storage device out from under them and the filesystem
didn't do something that made it notice before the system hung....
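
To make the proposed dispatch concrete, here is a rough userspace
model of it. This is not kernel code: the struct layouts, the
shutdown_reason enum and every model_* function are hypothetical
stand-ins, and the ->shutdown method is only the proposal described
above, not an existing interface. The ops pointer is called s_op here,
as in the kernel's struct super_block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason codes for the proposed method. */
enum shutdown_reason { SHUTDOWN_DEVICE_REMOVED = 1 };

struct super_block;

/* Stand-in for super_operations with only the proposed method. */
struct super_operations {
    /* Optional: NULL if the filesystem does not implement it. */
    void (*shutdown)(struct super_block *sb, enum shutdown_reason why);
};

/* Stand-in for super_block; the counters exist only for the model. */
struct super_block {
    const struct super_operations *s_op;
    int is_shutdown;
    int sync_calls;
};

/* Models sync_filesystem(): here it only records that it was called. */
static int model_sync_filesystem(struct super_block *sb)
{
    assert(sb != NULL);
    sb->sync_calls++;
    return 0;
}

/* Models a filesystem's shutdown method: mark the fs as shut down. */
static void model_fs_shutdown(struct super_block *sb, enum shutdown_reason why)
{
    assert(sb != NULL);
    (void)why;
    sb->is_shutdown = 1;
}

/*
 * Models the block device removal path: if a superblock is attached,
 * prefer ->shutdown() and fall back to syncing when it is absent.
 */
static void model_bdev_removed(struct super_block *sb)
{
    if (!sb)
        return;  /* no filesystem on this device */

    if (sb->s_op && sb->s_op->shutdown)
        sb->s_op->shutdown(sb, SHUTDOWN_DEVICE_REMOVED);
    else
        model_sync_filesystem(sb);
}
```

The point of the fallback branch is that filesystems without the
method keep today's behaviour, while those that implement it are told
explicitly that the device is gone.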

> So far only observed on ext4 FS, not see it on XFS.

Pure dumb luck - a journal IO failed on XFS (probably during the
sync_filesystem() call) and that shut the filesystem down.

> I guess it isn't
> related with disk type, and not tried such test on other type of disks yet,
> but will do.

It can happen on any block device based storage that gets pulled
from under any filesystem without warning.

> Seems like dirty pages aren't cleaned after ext4 bio is failed in this
> situation?

Yes, because the filesystem wasn't shut down on device removal to
tell it that it's allowed to toss away dirty pages as they cannot be
cleaned via the IO path....

-Dave.
-- 
Dave Chinner
dchinner@xxxxxxxxxx