Nate Dailey <nate.dailey@xxxxxxxxxxx> writes:
> Thank you!
>
> I confirmed that this patch prevents the bug.
>
> Nate

Awesome, thanks Nate!

Neil once you commit the final version of this patch, please let me
know.

Cheers,
Jes

>
>
>
> On 10/22/2015 08:09 PM, Neil Brown wrote:
>> Nate Dailey <nate.dailey@xxxxxxxxxxx> writes:
>>
>>> The problem is that we aren't getting true write (medium) errors.
>>>
>>> In this case we're testing device removals. The write errors happen
>>> because the disk goes away. Narrow_write_error returns 1, the bitmap
>>> bit is cleared, and then when the device is re-added the resync might
>>> not include the sectors in that chunk (there's some luck involved; if
>>> other writes to that chunk happen while the disk is removed, we're
>>> okay--bug is easier to hit with smaller bitmap chunks because of
>>> this).
>>>
>>>
>> OK, that makes sense.
>>
>> The device removal will be noticed when the bad block log is written
>> out.
>> When a bad-block is recorded we make sure to write that out promptly
>> before bio_endio() gets called. But not before close_write() has called
>> bitmap_end_write().
>>
>> So I guess we need to delay the close_write() call until the
>> bad-block-log has been written.
>>
>> I think this patch should do it. Can you test?
>>
>> Thanks,
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index c1ad0b075807..1a1c5160c930 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
>>  			rdev_dec_pending(conf->mirrors[m].rdev,
>>  					 conf->mddev);
>>  		}
>> -	if (test_bit(R1BIO_WriteError, &r1_bio->state))
>> -		close_write(r1_bio);
>>  	if (fail) {
>>  		spin_lock_irq(&conf->device_lock);
>>  		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
>> @@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
>>  			r1_bio = list_first_entry(&tmp, struct r1bio,
>>  						  retry_list);
>>  			list_del(&r1_bio->retry_list);
>> +			if (mddev->degraded)
>> +				set_bit(R1BIO_Degraded, &r1_bio->state);
>> +			close_write(r1_bio);
>>  			raid_end_bio_io(r1_bio);
>>  		}
>>  	}
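
For readers following the thread, the ordering problem described above can be illustrated with a small stand-alone toy model. This is only a sketch and not kernel code: every name in it (toy_write, toy_close_write, toy_flush_badblock_log, toy_bit_survives) is hypothetical, and it assumes, as the quoted explanation suggests, that the missing device is only noticed when the bad-block log is written out and that the bitmap bit is kept only if the failure is already known when the write is closed.

```c
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not md/raid1 APIs. */
struct toy_write {
	bool bitmap_dirty;   /* write-intent bit for the affected chunk */
	bool device_failed;  /* set once the device is known to be gone */
};

enum toy_order { CLOSE_BEFORE_LOG, CLOSE_AFTER_LOG };

/* Stand-in for close_write(): clear the bit unless a failure is known. */
static void toy_close_write(struct toy_write *w)
{
	if (!w->device_failed)
		w->bitmap_dirty = false;
}

/* Stand-in for writing the bad-block log: this is where a removed
 * device is assumed to be noticed. */
static void toy_flush_badblock_log(struct toy_write *w, bool device_present)
{
	if (!device_present)
		w->device_failed = true;
}

/* Returns true if the bitmap bit is still set after the write finishes,
 * i.e. a later resync would cover the chunk. */
static bool toy_bit_survives(enum toy_order order, bool device_present)
{
	struct toy_write w = { .bitmap_dirty = true, .device_failed = false };

	if (order == CLOSE_BEFORE_LOG) {
		toy_close_write(&w);                        /* old ordering */
		toy_flush_badblock_log(&w, device_present);
	} else {
		toy_flush_badblock_log(&w, device_present); /* patched ordering */
		toy_close_write(&w);
	}
	return w.bitmap_dirty;
}
```

In this model, with the device removed, the old ordering clears the bit before the failure is noticed, while the patched ordering keeps it set. When the device is still present, both orderings clear the bit.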