On 2/18/19 4:08 PM, jianchao.wang wrote:
> Hi Bob
>
> On 2/13/19 5:50 PM, Bob Liu wrote:
>> Motivation:
>> When fs data/metadata checksums mismatch, lower block devices may have other
>> correct copies. e.g. If XFS successfully reads a metadata buffer off a raid1 but
>> decides that the metadata is garbage, today it will shut down the entire
>> filesystem without trying any of the other mirrors. This is a severe
>> loss of service, and we propose these patches to have XFS try harder to
>> avoid failure.
>>
>> This patch prototypes this mirror retry idea by:
>> * Adding @nr_mirrors to struct request_queue, which is similar to
>>   blk_queue_nonrot(); the filesystem can grab the device request queue and
>>   check the max mirrors this block device has.
>>   Helper functions were also added to get/set the nr_mirrors.
>>
>> * Introducing bi_rd_hint just like bi_write_hint, but bi_rd_hint is a long bitmap
>>   in order to support the stacked layer case.
>
> Why do we need a bitmap to know which underlying device has been tried?
> For example, the following scenario,
>
>        md8
>     /   |   \
>   sda  sdb  sdc
>
> If the raid reads the data from sda and the fs checks it and finds the data
> is corrupted, then we may just need to let raid1 know that the data is from
> sda. Then, based on this hint, raid1 could handle it with handle_read_error
> to try another replica and fix the error.

This doesn't work.
The md raid1 can only see IO success or failure, so fix_read_error won't fix this.

Sorry for the noise.

Thanks
Jianchao

>
> If this is feasible, we just need to modify the bio as follows and needn't add any
> bytes to it.
>
> struct bio {
> ...
>     union {
>         unsigned short bi_write_hint;
>         unsigned short bi_read_hint;
>     };
> ...
> };
>
> Thanks
> Jianchao
>>
>> * Modify md/raid1 to support this retry feature.
>>
>> * Adapt xfs to use this feature.
>>   If the read verify fails, we loop over the available mirrors and retry the read.
>>
>> * Rewrite retried read
>>   When the read verification fails, but the retry succeeds,
>>   write the buffer back to correct the bad mirror.
>>
>> * Add tracepoints and logging to alternate device retry.
>>   This patch adds new log entries and trace points to the alternate device retry
>>   error path.
>>
>> Changes v2:
>> - No longer reuse bi_write_hint
>> - Stacked layer support (see patch 4/9)
>> - Other feedback fixes
>>
>> Allison Henderson (5):
>>   Add b_alt_retry to xfs_buf
>>   xfs: Add b_rd_hint to xfs_buf
>>   xfs: Add device retry
>>   xfs: Rewrite retried read
>>   xfs: Add tracepoints and logging to alternate device retry
>>
>> Bob Liu (4):
>>   block: add nr_mirrors to request_queue
>>   block: add rd_hint to bio and request
>>   md:raid1: set mirrors correctly
>>   md:raid1: rd_hint support and consider stacked layer case
>>
>>  Documentation/block/biodoc.txt |   3 +
>>  block/bio.c                    |   1 +
>>  block/blk-core.c               |   4 ++
>>  block/blk-merge.c              |   6 ++
>>  block/blk-settings.c           |  24 +++++++
>>  block/bounce.c                 |   1 +
>>  drivers/md/raid1.c             | 123 ++++++++++++++++++++++++++++++++-
>>  fs/xfs/xfs_buf.c               |  58 +++++++++++++++-
>>  fs/xfs/xfs_buf.h               |  14 ++++
>>  fs/xfs/xfs_trace.h             |   6 +-
>>  include/linux/blk_types.h      |   1 +
>>  include/linux/blkdev.h         |   4 ++
>>  include/linux/types.h          |   3 +
>>  13 files changed, 244 insertions(+), 4 deletions(-)
>>
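
For anyone following the thread, the verify/retry/rewrite flow described in
the cover letter can be sketched in plain userspace C. This is only a toy
model, not the code from the patches: struct toy_mirror_dev, toy_verify(),
toy_read() and toy_read_verify_retry() are hypothetical names, and the
assumption that each bit of the read hint selects one mirror is mine; the
actual bi_rd_hint encoding in the series may differ.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_MIRRORS 8 /* hypothetical cap, well below the bits in a long */

/* Hypothetical stand-in for a mirrored block device holding one block. */
struct toy_mirror_dev {
    unsigned int nr_mirrors;      /* plays the role of the queue's nr_mirrors */
    int copies[TOY_MAX_MIRRORS];  /* the block as stored on each mirror */
};

/* Toy verifier standing in for a filesystem checksum check. */
static bool toy_verify(int data)
{
    return data >= 0;
}

/*
 * Read from the first mirror whose bit is set in rd_hint (assumed encoding).
 * Returns the mirror index used, or -1 if the hint selects no mirror.
 */
static int toy_read(const struct toy_mirror_dev *dev, unsigned long rd_hint,
                    int *out)
{
    for (unsigned int i = 0; i < dev->nr_mirrors; i++) {
        if (rd_hint & (1UL << i)) {
            *out = dev->copies[i];
            return (int)i;
        }
    }
    return -1;
}

/*
 * Try each mirror in turn until verification passes, then write the good
 * data back over every mirror that returned bad data.
 * Returns the index of the mirror that verified, or -1 if none did.
 */
static int toy_read_verify_retry(struct toy_mirror_dev *dev, int *out)
{
    unsigned long bad = 0; /* bitmap of mirrors that failed verification */

    assert(dev->nr_mirrors >= 1 && dev->nr_mirrors <= TOY_MAX_MIRRORS);

    for (unsigned int i = 0; i < dev->nr_mirrors; i++) {
        int data;
        int used = toy_read(dev, 1UL << i, &data);

        if (used < 0)
            continue;
        if (toy_verify(data)) {
            /* Rewrite step: repair the mirrors recorded as bad. */
            for (unsigned int j = 0; j < dev->nr_mirrors; j++)
                if (bad & (1UL << j))
                    dev->copies[j] = data;
            *out = data;
            return used;
        }
        bad |= 1UL << used;
    }
    return -1;
}
```

The point of the sketch is that the caller, not the mirror layer, decides
whether a read was good: every read here "succeeds" as I/O, which is why a
hint passed down from the filesystem is needed to steer the retry.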