Re: [PATCH 0/2] raid1/10: Handle write errors correctly in narrow_write_error()

Thank you!

I confirmed that this patch prevents the bug.

Nate



On 10/22/2015 08:09 PM, Neil Brown wrote:
Nate Dailey <nate.dailey@xxxxxxxxxxx> writes:

The problem is that we aren't getting true write (medium) errors.

In this case we're testing device removals. The write errors happen because the
disk goes away. narrow_write_error() returns 1, the bitmap bit is cleared, and
then when the device is re-added the resync might not include the sectors in
that chunk. There's some luck involved: if other writes to that chunk happen
while the disk is removed, we're okay. The bug is easier to hit with smaller
bitmap chunks because of this.


OK, that makes sense.

The device removal will be noticed when the bad block log is written out.
When a bad block is recorded we make sure to write that out promptly,
before bio_endio() gets called, but not before close_write() has called
bitmap_endwrite().

So I guess we need to delay the close_write() call until the
bad-block-log has been written.
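
To make the ordering concrete, here is a toy user-space model of the two
sequences. It is only a sketch, not kernel code: the struct and every toy_*
name are invented for illustration, and it assumes the missing device is
first noticed when the bad-block log is written out, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for one bitmap chunk on a mirror. Hypothetical, not kernel code. */
struct toy_chunk {
	bool bitmap_dirty;	/* write-intent bit for this chunk */
	bool degraded;		/* array has noticed that a member is gone */
	bool member_missing;	/* the member really is gone (not yet noticed) */
};

/*
 * Stand-in for close_write() ending the bitmap write: the bit is only
 * cleared when the array believes every member received the data.
 */
static void toy_close_write(struct toy_chunk *c)
{
	assert(c);
	if (!c->degraded)
		c->bitmap_dirty = false;
}

/*
 * Stand-in for writing out the bad-block log: in this model it is the
 * first point at which the missing member is detected.
 */
static void toy_write_badblock_log(struct toy_chunk *c)
{
	assert(c);
	if (c->member_missing)
		c->degraded = true;
}

/*
 * Returns true if the chunk is still marked dirty afterwards, i.e. a
 * later re-add would resync it.
 */
bool toy_resync_covers_chunk(bool close_write_after_log)
{
	struct toy_chunk c = {
		.bitmap_dirty = true,
		.degraded = false,
		.member_missing = true,
	};

	if (close_write_after_log) {
		/* Proposed order: log first, then end the bitmap write. */
		toy_write_badblock_log(&c);
		toy_close_write(&c);
	} else {
		/* Current order: bit is cleared before the loss is noticed. */
		toy_close_write(&c);
		toy_write_badblock_log(&c);
	}
	return c.bitmap_dirty;
}
```

In this model toy_resync_covers_chunk(false) leaves the bit cleared, so the
chunk would be skipped on re-add, while toy_resync_covers_chunk(true) keeps
it set.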

I think this patch should do it.  Can you test?

Thanks,
NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c1ad0b075807..1a1c5160c930 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2269,8 +2269,6 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[m].rdev,
 					 conf->mddev);
 		}
-	if (test_bit(R1BIO_WriteError, &r1_bio->state))
-		close_write(r1_bio);
 	if (fail) {
 		spin_lock_irq(&conf->device_lock);
 		list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
@@ -2396,6 +2394,9 @@ static void raid1d(struct md_thread *thread)
 			r1_bio = list_first_entry(&tmp, struct r1bio,
 						  retry_list);
 			list_del(&r1_bio->retry_list);
+			if (mddev->degraded)
+				set_bit(R1BIO_Degraded, &r1_bio->state);
+			close_write(r1_bio);
 			raid_end_bio_io(r1_bio);
 		}
 	}
