On Thu, Sep 24, 2015 at 02:03:53PM +1000, Neil Brown wrote: > Shaohua Li <shli@xxxxxx> writes: > > > On Wed, Sep 23, 2015 at 04:05:33PM +1000, Neil Brown wrote: > >> Shaohua Li <shli@xxxxxx> writes: > >> > >> > If faulty disks of an array are more than allowed degraded number, the > >> > array enters error handling. It will be marked as read-only with > >> > MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't > >> > clear CHANGE_PENDING bit for read-only array. If MD_CHANGE_PENDING is > >> > set for a raid5 array, all returned IO will be hold on a list till the > >> > bit is clear. But recovery nevery clears this bit, the IO is always in > >> > pending state and nevery finish. This has bad effects like upper layer > >> > can't get an IO error and the array can't be stopped. > >> > > >> > Signed-off-by: Shaohua Li <shli@xxxxxx> > >> > --- > >> > drivers/md/md.c | 1 + > >> > 1 file changed, 1 insertion(+) > >> > > >> > diff --git a/drivers/md/md.c b/drivers/md/md.c > >> > index 95824fb..c596b73 100644 > >> > --- a/drivers/md/md.c > >> > +++ b/drivers/md/md.c > >> > @@ -8209,6 +8209,7 @@ void md_check_recovery(struct mddev *mddev) > >> > md_reap_sync_thread(mddev); > >> > clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > >> > clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > >> > + clear_bit(MD_CHANGE_PENDING, &mddev->flags); > >> > goto unlock; > >> > } > >> > > >> > -- > >> > 1.8.1 > >> > >> Hi, > >> I can see that clearing MD_CHANGE_PENDING there is probably correct - > >> bug introduced by > >> Commit: c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns.") > >> > >> However I don't understand your reasoning. You say that the array is > >> marked as read-only, but I don't see how that would happen. What > >> causes the array to be marked "read-only"? > > > > It's set read-only by mdadm. I didn't look carefully, but looks there is > > disk failure event, mdadm is invoked automatically by some background > > daemon. It's a ubuntu distribution. > > Thanks. > This raises a couple of questions. > > 1/ What should md_set_readonly do if it finds that MD_CHANGE_PENDING is > set? > Maybe it should wait for md_check_recovery to get run which should > clear the bit, after probably writing out the metadata. > > 2/ Why didn't md_check_recovery already do that before mdadm had a > chance to set the array read-only? > I guess that is just a timing thing. md_check_recovery could be > delayed, and mdadm could get called by udev rather quickly. > > I think I'll get md_set_readonly to > wait_event(mddev->sb_wait, > !test_bit(MD_CHANGE_PENDING, &mddev->flags)); > > because I think that is the right thing to do. But if the array is > already read-only that won't help, so I'll still need you patch. > > Would you be able to test that the following patch (without your patch) > also fixes the symptom? Yes, the wait_event patch fixes the issue (without my patch). Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html