On Mon, 15 Jul 2013 10:35:07 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx> wrote: > > On Jun 25, 2013, at 1:32 AM, NeilBrown wrote: > > > On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx> > > wrote: > > > >> Neil, > >> > >> I've noticed that the "check" operation no longer works for RAID10. It > >> works just fine for the other RAIDs. The ("data-check") sync_thread > >> kicks off just fine, sync_request_write() is called, but it never gets > >> past: > >> if (i == conf->copies) > >> goto done; > >> The test I am performing creates a RAID array, waits for it to sync, > >> shuts it down, writes random data to one of the devices, assembles the > >> array, and then runs a "check" - there should be descrepancies. The > >> descrepancies are found and recorded in resync_mismatches for all RAIDs > >> <= 3.9 and only for non-RAID10 3.10-rc1+. > > > > I just tried on 3.10-rc5+ and it works as expected. > > If you can provide a test script that fails, I'll look into it. > > Just tried 3.10 - it fails for me there too. I'll send you the script I use shortly. > > thanks, > brassow > > (vacation ends soon.) :-) Thanks. This patch seems to fix it. NeilBrown From b0b0ac3ecf1e54dd6a429294082c47f1e52db41d Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@xxxxxxx> Date: Tue, 16 Jul 2013 16:50:47 +1000 Subject: [PATCH] md/raid10: fix two problems with RAID10 resync. 1/ When an different between blocks is found, data is copied from one bio to the other. However bv_len is used as the length to copy and this could be zero. So use r10_bio->sectors to calculate length instead. Using bv_len was probably always a bit dubious, but the introduction of bio_advance made it much more likely to be a problem. 2/ When preparing some blocks for sync, we don't set BIO_UPTODATE except on bios that we schedule for a read. This ensures that missing/failed devices don't confuse the loop at the top of sync_request write. Commit 8be185f2c9d54d6 "raid10: Use bio_reset()" removed a loop which set BIO_UPTDATE on all appropriate bios. So we need to re-add that flag. Reported-by: Brassow Jonathan <jbrassow@xxxxxxxxxx> Signed-off-by: NeilBrown <neilb@xxxxxxx> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index cd066b6..957a719 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2097,11 +2097,17 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio) * both 'first' and 'i', so we just compare them. * All vec entries are PAGE_SIZE; */ - for (j = 0; j < vcnt; j++) + int sectors = r10_bio->sectors; + for (j = 0; j < vcnt; j++) { + int len = PAGE_SIZE; + if (sectors < (len / 512)) + len = sectors * 512; if (memcmp(page_address(fbio->bi_io_vec[j].bv_page), page_address(tbio->bi_io_vec[j].bv_page), - fbio->bi_io_vec[j].bv_len)) + len)) break; + sectors -= len/512; + } if (j == vcnt) continue; atomic64_add(r10_bio->sectors, &mddev->resync_mismatches); @@ -3407,6 +3413,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, if (bio->bi_end_io == end_sync_read) { md_sync_acct(bio->bi_bdev, nr_sectors); + set_bit(BIO_UPTODATE, &bio->bi_flags); generic_make_request(bio); } }
Attachment:
signature.asc
Description: PGP signature