Shaohua Li <shli@xxxxxx> writes: > On Wed, Sep 23, 2015 at 04:05:33PM +1000, Neil Brown wrote: >> Shaohua Li <shli@xxxxxx> writes: >> >> > If faulty disks of an array are more than allowed degraded number, the >> > array enters error handling. It will be marked as read-only with >> > MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't >> > clear CHANGE_PENDING bit for read-only array. If MD_CHANGE_PENDING is >> > set for a raid5 array, all returned IO will be hold on a list till the >> > bit is clear. But recovery nevery clears this bit, the IO is always in >> > pending state and nevery finish. This has bad effects like upper layer >> > can't get an IO error and the array can't be stopped. >> > >> > Signed-off-by: Shaohua Li <shli@xxxxxx> >> > --- >> > drivers/md/md.c | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/drivers/md/md.c b/drivers/md/md.c >> > index 95824fb..c596b73 100644 >> > --- a/drivers/md/md.c >> > +++ b/drivers/md/md.c >> > @@ -8209,6 +8209,7 @@ void md_check_recovery(struct mddev *mddev) >> > md_reap_sync_thread(mddev); >> > clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >> > clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >> > + clear_bit(MD_CHANGE_PENDING, &mddev->flags); >> > goto unlock; >> > } >> > >> > -- >> > 1.8.1 >> >> Hi, >> I can see that clearing MD_CHANGE_PENDING there is probably correct - >> bug introduced by >> Commit: c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns.") >> >> However I don't understand your reasoning. You say that the array is >> marked as read-only, but I don't see how that would happen. What >> causes the array to be marked "read-only"? > > It's set read-only by mdadm. I didn't look carefully, but looks there is > disk failure event, mdadm is invoked automatically by some background > daemon. It's a ubuntu distribution. Thanks. This raises a couple of questions. 1/ What should md_set_readonly do if it finds that MD_CHANGE_PENDING is set? Maybe it should wait for md_check_recovery to get run which should clear the bit, after probably writing out the metadata. 2/ Why didn't md_check_recovery already do that before mdadm had a chance to set the array read-only? I guess that is just a timing thing. md_check_recovery could be delayed, and mdadm could get called by udev rather quickly. I think I'll get md_set_readonly to wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING, &mddev->flags)); because I think that is the right thing to do. But if the array is already read-only that won't help, so I'll still need you patch. Would you be able to test that the following patch (without your patch) also fixes the symptom? Thanks. NeilBrown From a0a21b57523c41e87e0a50333b15f6c7ca02d714 Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@xxxxxxxx> Date: Thu, 24 Sep 2015 14:00:51 +1000 Subject: [PATCH] md: wait for pending superblock updates before switching to read-only If a superblock update is pending, wait for it to complete before letting md_set_readonly() switch to readonly. Otherwise we might lose important information about a device having failed. Reported-by: Shaohua Li <shli@xxxxxx> Signed-off-by: NeilBrown <neilb@xxxxxxxx> diff --git a/drivers/md/md.c b/drivers/md/md.c index 23651bf8dd80..9882f70f45e1 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5432,6 +5432,8 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev) mddev_unlock(mddev); wait_event(resync_wait, !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)); + wait_event(mddev->sb_wait, + !test_bit(MD_CHANGE_PENDING, &mddev->flags)); mddev_lock_nointr(mddev); mutex_lock(&mddev->open_mutex);
Attachment:
signature.asc
Description: PGP signature