Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> writes: > Neil Brown <neilb@xxxxxxx> writes: >> Jes.Sorensen@xxxxxxxxxx writes: >> >>> From: Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> >>> >>> Hi, >>> >>> Bill Kuzeja reported a problem to me about data corruption when >>> repeatedly removing and re-adding devices in raid1 arrays. It showed >>> up to be caused by the return value of submit_bio_wait() being handled >>> incorrectly. Tracking this down is credit of Bill! >>> >>> Looks like commit 9e882242c6193ae6f416f2d8d8db0d9126bd996b changed the >>> return of submit_bio_wait() to return != 0 on error, whereas before it >>> returned 0 on error. >>> >>> This fix should be suitable for -stable as far back as 3.9 >> >> 3.10? >> >> Thanks to both of you! >> >> I took the liberty of changing the patches a little so they are now: >> >> - if (submit_bio_wait(WRITE, wbio) == 0) >> + if (submit_bio_wait(WRITE, wbio) < 0) >> >> because when there is no explicit test I tend to expect a Bool but these >> values are not Bool. >> >> Patches are in my for-linus branch and will be forwarded sometime this >> week. >> >> This bug only causes a problem when bad-block logs are active, so >> hopefully it won't have caused too much corruption yet -- you would need >> to be using a newish mdadm. > > Neil, > > An additional twist on this one - Nate ran more tests on this, but was > still able to hit data corruption. He suggests the it is a mistake to > set 'ok = rdev_set_badblocks()' and it should instead be set to 0 if > submit_bio_wait() fails. Like this: > > --- raid1.c > +++ raid1.c > @@ -2234,11 +2234,12 @@ > bio_trim(wbio, sector - r1_bio->sector, sectors); > wbio->bi_sector += rdev->data_offset; > wbio->bi_bdev = rdev->bdev; > if (submit_bio_wait(WRITE, wbio) < 0) { > /* failure! */ > - ok = rdev_set_badblocks(rdev, sector, > - sectors, 0) > - && ok; > + ok = 0; > + rdev_set_badblocks(rdev, sector, > + sectors, 0); > + } > > Question is whether this change has any negative impact in case of a > real write failure? > > I have actual patches, I'll send as a reply to this one. > If we unconditionally set ok to 0 on a write error, then narrow_write_error() will return 0 and handle_write finished() will call md_error() to kick the device out of the array. And given that we only call narrow_write_error() when we got a write error, we strongly expect at least one sector to give an error. So it seems to me that the net result of this patch is to make bad-block-lists completely ineffective. What sort of tests are you running, and what sort of corruption do you see? NeilBrown
Attachment:
signature.asc
Description: PGP signature