On Wed, Nov 25, 2015 at 02:09:34PM -0800, Dan Williams wrote: > On Wed, Nov 25, 2015 at 1:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Wed, Nov 25, 2015 at 10:37:24AM -0800, Dan Williams wrote: > >> Set SB_I_BDIDEAD when a block device is no longer available to service > >> requests. This will be used in several places where an fs should give > >> up early because the block device is gone. Letting the fs continue on > >> as if the block device is still present can lead to long latencies > >> waiting for an fs to detect the loss of its backing device, trigger > >> crashes, or generate misleasing warnings. > >> > >> Cc: Jan Kara <jack@xxxxxxxx> > >> Cc: Jens Axboe <axboe@xxxxxx> > >> Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > > > > This isn't what I suggested. :/ > > > > ..... > > > >> diff --git a/fs/block_dev.c b/fs/block_dev.c > >> index 1dd416bf72f7..d0233d643d33 100644 > >> --- a/fs/block_dev.c > >> +++ b/fs/block_dev.c > >> @@ -1795,6 +1795,23 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty) > >> } > >> EXPORT_SYMBOL(__invalidate_device); > >> > >> +void kill_bdev_super(struct gendisk *disk, int partno) > >> +{ > >> + struct block_device *bdev = bdget_disk(disk, partno); > >> + struct super_block *sb; > >> + > >> + if (!bdev) > >> + return; > >> + sb = get_super(bdev); > >> + if (!sb) > >> + goto out; > >> + > >> + sb->s_iflags |= SB_I_BDI_DEAD; > >> + drop_super(sb); > >> + out: > >> + bdput(bdev); > >> +} > > > > That's not a notification to the filesystem - that's a status flag > > the filesystem has to explicitly check for *on every operation*. We > > already have checks like these throughout filesystems, but they are > > filesystem specific and need to propagate into fs-specific > > subsystems that have knowledge of VFS level superblocks. > > > > To that end, what I actually suggested what a callback - something > > like a function in the super operations structure so that the > > filesystem can take *immediate action* when the block device is > > dying. i.e. > > > > void kill_bdev_super(struct gendisk *disk, int partno) > > { > > struct block_device *bdev = bdget_disk(disk, partno); > > struct super_block *sb; > > > > if (!bdev) > > return; > > sb = get_super(bdev); > > if (!sb) > > goto out; > > > > if (sb->s_ops->shutdown) > > sb->s_ops->shutdown(sb); > > > > drop_super(sb); > > out: > > bdput(bdev); > > } > > > > and then we implement ->shutdown somthing like this in XFS: > > > > xfs_fs_shutdown(struct superblock *sb) > > { > > xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ); > > } > > > > and so we immediately notify the entire filesystem that a shutdown > > state has been entered and the appropriate actions are immediately > > taken. > > > > That sounds good in theory. However, in the case of xfs it is already > calling xfs_force_shutdown(), Where? If XFS does not do any metadata IO, then it won't shut the filesystem down. We almost always allocate/map blocks without doing any IO, which means we cannot guarantee erroring out page faults during get_blocks until a shutdown has be triggered by other means.... > but that does not prevent it from > continuing to wait indefinitely at umount. Which is a bug we need to fix - I don't see how a shutdown implementation problem is at all relevant to triggering shutdowns in a prompt manner... > For the ext4 the > mark_inode_dirty() warning we're triggering the error in generic code. So? XFS doesn't use that generic code, but we have filesystem specific issues that we need to handle sanely. > None of this fixes the problem of dax mappings leaking past block > device remove. That can be done generically without needing per-fs > code. No, the shutdown is intended to close the race between the device being removed, the mappings being invalidated and the filesytem racing creating new mappings during page faults because it doesn't know the device has been unplugged until it does IO that gets some error in an unrecoverable path... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html