Re: Scrubbing "check" not working for RAID10 in 3.10-rc1+

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 15 Jul 2013 10:35:07 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx>
wrote:

> 
> On Jun 25, 2013, at 1:32 AM, NeilBrown wrote:
> 
> > On Tue, 25 Jun 2013 01:19:20 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
> > wrote:
> > 
> >> Neil,
> >> 
> >> I've noticed that the "check" operation no longer works for RAID10.  It
> >> works just fine for the other RAIDs.  The ("data-check") sync_thread
> >> kicks off just fine, sync_request_write() is called, but it never gets
> >> past:
> >>        if (i == conf->copies)
> >>                goto done;
> >> The test I am performing creates a RAID array, waits for it to sync,
> >> shuts it down, writes random data to one of the devices, assembles the
> >> array, and then runs a "check" - there should be descrepancies.  The
> >> descrepancies are found and recorded in resync_mismatches for all RAIDs
> >> <= 3.9 and only for non-RAID10 3.10-rc1+.
> > 
> > I just tried on 3.10-rc5+ and it works as expected.
> > If you can provide a test script that fails, I'll look into it.
> 
> Just tried 3.10 - it fails for me there too.  I'll send you the script I use shortly.
> 
> thanks,
>  brassow
> 
> (vacation ends soon.)
:-)

Thanks.  This patch seems to fix it.

NeilBrown

From b0b0ac3ecf1e54dd6a429294082c47f1e52db41d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Tue, 16 Jul 2013 16:50:47 +1000
Subject: [PATCH] md/raid10: fix two problems with RAID10 resync.

1/ When an different between blocks is found, data is copied from
   one bio to the other.  However bv_len is used as the length to
   copy and this could be zero.  So use r10_bio->sectors to calculate
   length instead.
   Using bv_len was probably always a bit dubious, but the introduction
   of bio_advance made it much more likely to be a problem.

2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
   except on bios that we schedule for a read.  This ensures that
   missing/failed devices don't confuse the loop at the top of
   sync_request write.
   Commit 8be185f2c9d54d6 "raid10: Use bio_reset()"
   removed a loop which set BIO_UPTDATE on all appropriate bios.
   So we need to re-add that flag.

Reported-by: Brassow Jonathan <jbrassow@xxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cd066b6..957a719 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2097,11 +2097,17 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 			 * both 'first' and 'i', so we just compare them.
 			 * All vec entries are PAGE_SIZE;
 			 */
-			for (j = 0; j < vcnt; j++)
+			int sectors = r10_bio->sectors;
+			for (j = 0; j < vcnt; j++) {
+				int len = PAGE_SIZE;
+				if (sectors < (len / 512))
+					len = sectors * 512;
 				if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
 					   page_address(tbio->bi_io_vec[j].bv_page),
-					   fbio->bi_io_vec[j].bv_len))
+					   len))
 					break;
+				sectors -= len/512;
+			}
 			if (j == vcnt)
 				continue;
 			atomic64_add(r10_bio->sectors, &mddev->resync_mismatches);
@@ -3407,6 +3413,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 
 		if (bio->bi_end_io == end_sync_read) {
 			md_sync_acct(bio->bi_bdev, nr_sectors);
+			set_bit(BIO_UPTODATE, &bio->bi_flags);
 			generic_make_request(bio);
 		}
 	}

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux