On Wed, Nov 25, 2015 at 1:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > On Wed, Nov 25, 2015 at 10:37:24AM -0800, Dan Williams wrote: >> Set SB_I_BDIDEAD when a block device is no longer available to service >> requests. This will be used in several places where an fs should give >> up early because the block device is gone. Letting the fs continue on >> as if the block device is still present can lead to long latencies >> waiting for an fs to detect the loss of its backing device, trigger >> crashes, or generate misleasing warnings. >> >> Cc: Jan Kara <jack@xxxxxxxx> >> Cc: Jens Axboe <axboe@xxxxxx> >> Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > > This isn't what I suggested. :/ > > ..... > >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 1dd416bf72f7..d0233d643d33 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1795,6 +1795,23 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty) >> } >> EXPORT_SYMBOL(__invalidate_device); >> >> +void kill_bdev_super(struct gendisk *disk, int partno) >> +{ >> + struct block_device *bdev = bdget_disk(disk, partno); >> + struct super_block *sb; >> + >> + if (!bdev) >> + return; >> + sb = get_super(bdev); >> + if (!sb) >> + goto out; >> + >> + sb->s_iflags |= SB_I_BDI_DEAD; >> + drop_super(sb); >> + out: >> + bdput(bdev); >> +} > > That's not a notification to the filesystem - that's a status flag > the filesystem has to explicitly check for *on every operation*. We > already have checks like these throughout filesystems, but they are > filesystem specific and need to propagate into fs-specific > subsystems that have knowledge of VFS level superblocks. > > To that end, what I actually suggested what a callback - something > like a function in the super operations structure so that the > filesystem can take *immediate action* when the block device is > dying. i.e. > > void kill_bdev_super(struct gendisk *disk, int partno) > { > struct block_device *bdev = bdget_disk(disk, partno); > struct super_block *sb; > > if (!bdev) > return; > sb = get_super(bdev); > if (!sb) > goto out; > > if (sb->s_ops->shutdown) > sb->s_ops->shutdown(sb); > > drop_super(sb); > out: > bdput(bdev); > } > > and then we implement ->shutdown somthing like this in XFS: > > xfs_fs_shutdown(struct superblock *sb) > { > xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ); > } > > and so we immediately notify the entire filesystem that a shutdown > state has been entered and the appropriate actions are immediately > taken. > That sounds good in theory. However, in the case of xfs it is already calling xfs_force_shutdown(), but that does not prevent it from continuing to wait indefinitely at umount. For the ext4 the mark_inode_dirty() warning we're triggering the error in generic code. None of this fixes the problem of dax mappings leaking past block device remove. That can be done generically without needing per-fs code. Solving the "zombie filesystem after block device down" problem is incremental to this patch set. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html