On 2/19/19 5:31 AM, Dave Chinner wrote: > On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote: >> Motivation: >> When fs data/metadata checksum mismatch, lower block devices may have other >> correct copies. e.g. If XFS successfully reads a metadata buffer off a raid1 but >> decides that the metadata is garbage, today it will shut down the entire >> filesystem without trying any of the other mirrors. This is a severe >> loss of service, and we propose these patches to have XFS try harder to >> avoid failure. >> >> This patch prototype this mirror retry idea by: >> * Adding @nr_mirrors to struct request_queue which is similar as >> blk_queue_nonrot(), filesystem can grab device request queue and check max >> mirrors this block device has. >> Helper functions were also added to get/set the nr_mirrors. >> >> * Introducing bi_rd_hint just like bi_write_hint, but bi_rd_hint is a long bitmap >> in order to support stacked layer case. >> >> * Modify md/raid1 to support this retry feature. >> >> * Adapter xfs to use this feature. >> If the read verify fails, we loop over the available mirrors and retry the read. > > Why does the filesystem have to iterate every single posible > combination of devices that are underneath it? > > Wouldn't it be much simpler to be able to attach a verifier > function to the bio, and have each layer that gets called iterate > over all it's copies internally until the verfier function passes > or all copies are exhausted? > > This works for stacked mirrors - it can pass the higher layer > verifier down as far as necessary. It can work for RAID5/6, too, by > having that layer supply it's own verifier for reads that verifies > parity and can reconstruct of failure, then when it's reconstructed > a valid stripe it can run the verifier that was supplied to it from > above, etc. > > i.e. I dont see why only filesystems should drive retries or have to > be aware of the underlying storage stacking. ISTM that each > layer of the storage stack should be able to verify what has been > returned to it is valid independently of the higher layer > requirements. The only difference from a caller point of view should > be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func); > We already have bio->bi_end_io(), how about do the verification inside bi_end_io()? Then the whole sequence would like: bio_endio() > 1.bio->bi_end_io() > xfs_buf_bio_end_io() > verify, set bio->bi_status = "please retry" if verify fail > 2.if found bio->bi_status = retry > 3.resubmit bio Is it fine to resubmit a bio inside bio_endio()? - Thanks, Bob.