Re: [PATCH v2 5/7] fs: notify superblocks of backing-device death

Dan Williams <dan.j.williams@xxxxxxxxx> · Wed, 25 Nov 2015 14:09:34 -0800

On Wed, Nov 25, 2015 at 1:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Nov 25, 2015 at 10:37:24AM -0800, Dan Williams wrote:
>> Set SB_I_BDIDEAD when a block device is no longer available to service
>> requests.  This will be used in several places where an fs should give
>> up early because the block device is gone.  Letting the fs continue on
>> as if the block device is still present can lead to long latencies
>> waiting for an fs to detect the loss of its backing device, trigger
>> crashes, or generate misleasing warnings.
>>
>> Cc: Jan Kara <jack@xxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxx>
>> Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
>
> This isn't what I suggested. :/
>
> .....
>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 1dd416bf72f7..d0233d643d33 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1795,6 +1795,23 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
>>  }
>>  EXPORT_SYMBOL(__invalidate_device);
>>
>> +void kill_bdev_super(struct gendisk *disk, int partno)
>> +{
>> +     struct block_device *bdev = bdget_disk(disk, partno);
>> +     struct super_block *sb;
>> +
>> +     if (!bdev)
>> +             return;
>> +     sb = get_super(bdev);
>> +     if (!sb)
>> +             goto out;
>> +
>> +     sb->s_iflags |= SB_I_BDI_DEAD;
>> +     drop_super(sb);
>> + out:
>> +     bdput(bdev);
>> +}
>
> That's not a notification to the filesystem - that's a status flag
> the filesystem has to explicitly check for *on every operation*. We
> already have checks like these throughout filesystems, but they are
> filesystem specific and need to propagate into fs-specific
> subsystems that have knowledge of VFS level superblocks.
>
> To that end, what I actually suggested what a callback - something
> like a function in the super operations structure so that the
> filesystem can take *immediate action* when the block device is
> dying. i.e.
>
> void kill_bdev_super(struct gendisk *disk, int partno)
> {
>         struct block_device *bdev = bdget_disk(disk, partno);
>         struct super_block *sb;
>
>         if (!bdev)
>                 return;
>         sb = get_super(bdev);
>         if (!sb)
>                 goto out;
>
>         if (sb->s_ops->shutdown)
>                 sb->s_ops->shutdown(sb);
>
>         drop_super(sb);
>  out:
>         bdput(bdev);
> }
>
> and then we implement ->shutdown somthing like this in XFS:
>
> xfs_fs_shutdown(struct superblock *sb)
> {
>         xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REQ);
> }
>
> and so we immediately notify the entire filesystem that a shutdown
> state has been entered and the appropriate actions are immediately
> taken.
>

That sounds good in theory.  However, in the case of xfs it is already
calling xfs_force_shutdown(), but that does not prevent it from
continuing to wait indefinitely at umount.  For the ext4 the
mark_inode_dirty() warning we're triggering the error in generic code.

None of this fixes the problem of dax mappings leaking past block
device remove.  That can be done generically without needing per-fs
code.

Solving the "zombie filesystem after block device down" problem is
incremental to this patch set.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html