Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

Bob Liu <bob.liu@xxxxxxxxxx> · Thu, 28 Feb 2019 22:22:02 +0800

On 2/19/19 5:31 AM, Dave Chinner wrote:
> On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:
>> Motivation:
>> When fs data/metadata checksum mismatch, lower block devices may have other
>> correct copies. e.g. If XFS successfully reads a metadata buffer off a raid1 but
>> decides that the metadata is garbage, today it will shut down the entire
>> filesystem without trying any of the other mirrors.  This is a severe
>> loss of service, and we propose these patches to have XFS try harder to
>> avoid failure.
>>
>> This patch prototype this mirror retry idea by:
>> * Adding @nr_mirrors to struct request_queue which is similar as
>>   blk_queue_nonrot(), filesystem can grab device request queue and check max
>>   mirrors this block device has.
>>   Helper functions were also added to get/set the nr_mirrors.
>>
>> * Introducing bi_rd_hint just like bi_write_hint, but bi_rd_hint is a long bitmap
>> in order to support stacked layer case.
>>
>> * Modify md/raid1 to support this retry feature.
>>
>> * Adapter xfs to use this feature.
>>   If the read verify fails, we loop over the available mirrors and retry the read.
> 
> Why does the filesystem have to iterate every single posible
> combination of devices that are underneath it?
> 
> Wouldn't it be much simpler to be able to attach a verifier
> function to the bio, and have each layer that gets called iterate
> over all it's copies internally until the verfier function passes
> or all copies are exhausted?
> 
> This works for stacked mirrors - it can pass the higher layer
> verifier down as far as necessary. It can work for RAID5/6, too, by
> having that layer supply it's own verifier for reads that verifies
> parity and can reconstruct of failure, then when it's reconstructed
> a valid stripe it can run the verifier that was supplied to it from
> above, etc.
> 
> i.e. I dont see why only filesystems should drive retries or have to
> be aware of the underlying storage stacking. ISTM that each
> layer of the storage stack should be able to verify what has been
> returned to it is valid independently of the higher layer
> requirements. The only difference from a caller point of view should
> be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func);
> 

We already have bio->bi_end_io(), how about do the verification inside bi_end_io()?

Then the whole sequence would like:
bio_endio()
    > 1.bio->bi_end_io()
        > xfs_buf_bio_end_io()
            > verify, set bio->bi_status = "please retry" if verify fail

    > 2.if found bio->bi_status = retry
    > 3.resubmit bio

Is it fine to resubmit a bio inside bio_endio()?

- Thanks, Bob.