Shaohua Li <shli@xxxxxx> writes: > On Thu, Sep 24, 2015 at 02:03:53PM +1000, Neil Brown wrote: >> Shaohua Li <shli@xxxxxx> writes: >> >> > On Wed, Sep 23, 2015 at 04:05:33PM +1000, Neil Brown wrote: >> >> Shaohua Li <shli@xxxxxx> writes: >> >> >> >> > If faulty disks of an array are more than allowed degraded number, the >> >> > array enters error handling. It will be marked as read-only with >> >> > MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't >> >> > clear CHANGE_PENDING bit for read-only array. If MD_CHANGE_PENDING is >> >> > set for a raid5 array, all returned IO will be hold on a list till the >> >> > bit is clear. But recovery nevery clears this bit, the IO is always in >> >> > pending state and nevery finish. This has bad effects like upper layer >> >> > can't get an IO error and the array can't be stopped. >> >> > >> >> > Signed-off-by: Shaohua Li <shli@xxxxxx> >> >> > --- >> >> > drivers/md/md.c | 1 + >> >> > 1 file changed, 1 insertion(+) >> >> > >> >> > diff --git a/drivers/md/md.c b/drivers/md/md.c >> >> > index 95824fb..c596b73 100644 >> >> > --- a/drivers/md/md.c >> >> > +++ b/drivers/md/md.c >> >> > @@ -8209,6 +8209,7 @@ void md_check_recovery(struct mddev *mddev) >> >> > md_reap_sync_thread(mddev); >> >> > clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >> >> > clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >> >> > + clear_bit(MD_CHANGE_PENDING, &mddev->flags); >> >> > goto unlock; >> >> > } >> >> > >> >> > -- >> >> > 1.8.1 >> >> >> >> Hi, >> >> I can see that clearing MD_CHANGE_PENDING there is probably correct - >> >> bug introduced by >> >> Commit: c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns.") >> >> >> >> However I don't understand your reasoning. You say that the array is >> >> marked as read-only, but I don't see how that would happen. What >> >> causes the array to be marked "read-only"? >> > >> > It's set read-only by mdadm. I didn't look carefully, but looks there is >> > disk failure event, mdadm is invoked automatically by some background >> > daemon. It's a ubuntu distribution. >> >> Thanks. >> This raises a couple of questions. >> >> 1/ What should md_set_readonly do if it finds that MD_CHANGE_PENDING is >> set? >> Maybe it should wait for md_check_recovery to get run which should >> clear the bit, after probably writing out the metadata. >> >> 2/ Why didn't md_check_recovery already do that before mdadm had a >> chance to set the array read-only? >> I guess that is just a timing thing. md_check_recovery could be >> delayed, and mdadm could get called by udev rather quickly. >> >> I think I'll get md_set_readonly to >> wait_event(mddev->sb_wait, >> !test_bit(MD_CHANGE_PENDING, &mddev->flags)); >> >> because I think that is the right thing to do. But if the array is >> already read-only that won't help, so I'll still need you patch. >> >> Would you be able to test that the following patch (without your patch) >> also fixes the symptom? > > Yes, the wait_event patch fixes the issue (without my patch). > > Thanks, > Shaohua Thanks for testing. I change "Reported-by:" to "Reported-and-tested-by:" :-) NeilBrown
Attachment:
signature.asc
Description: PGP signature