On 2019/7/29 18:17, Guoqing Jiang wrote:
Now actually adding the list to CC ..., sorry about that.
On 7/29/19 12:15 PM, Guoqing Jiang wrote:
Forgot to cc list ...
Thanks a lot for the detailed explanation!
For this issue, maybe a better way is to set "p->rdev = NULL" after
synchronize_rcu().
Could you try the below change? Thanks.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 34e26834ad28..867808bed325 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1825,16 +1825,17 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			err = -EBUSY;
 			goto abort;
 		}
-		p->rdev = NULL;
 		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
 			synchronize_rcu();
+			p->rdev = NULL;
 			if (atomic_read(&rdev->nr_pending)) {
 				/* lost the race, try later */
 				err = -EBUSY;
 				p->rdev = rdev;
 				goto abort;
 			}
-		}
+		} else
+			p->rdev = NULL;
 		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
I don't think this can fix the race condition completely.
-		p->rdev = NULL;
 		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
 			synchronize_rcu();
+			p->rdev = NULL;
 			if (atomic_read(&rdev->nr_pending)) {
If a reader accesses conf->mirrors[i].rdev (e.g. in raid1_write_request())
after the RCU grace period, synchronize_rcu() will not wait for that
reader, so this can still cause a NULL pointer dereference.
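To make the remaining window concrete, a rough timeline of the
interleaving (a sketch based on the above, not a real trace; the crash
assumes a path that later re-reads the pointer without a NULL check):

/*
 * remove path                          I/O path (reader)
 * -----------                          -----------------
 * synchronize_rcu()
 *   (waits only for readers that
 *    entered rcu_read_lock()
 *    before this point)
 *                                      rcu_read_lock()
 *                                      rdev = conf->mirrors[i].rdev
 * p->rdev = NULL
 * sees nr_pending == 0, so the
 * removal goes ahead
 *                                      a later re-read of
 *                                      conf->mirrors[i].rdev returns
 *                                      NULL -> dereference crashes
 *                                      rcu_read_unlock()
 */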
That is the reason we added the new flag 'WantRemove': it prevents
readers from accessing the 'rdev' after the RCU grace period.
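Roughly, the 'WantRemove' idea looks like this (a minimal sketch, not the
exact patch; it assumes WantRemove is added to the rdev flag enum in md.h,
and the existing RemoveSynchronized handling is omitted):

/* Removal side: publish the intent before waiting for readers. */
set_bit(WantRemove, &rdev->flags);
synchronize_rcu();
if (atomic_read(&rdev->nr_pending)) {
	/* lost the race, try later */
	clear_bit(WantRemove, &rdev->flags);
	err = -EBUSY;
	goto abort;
}
p->rdev = NULL;

/* Reader side, e.g. in raid1_write_request(): back off when removal
 * is in progress, even if this read section starts after the grace
 * period above. */
rcu_read_lock();
rdev = rcu_dereference(conf->mirrors[i].rdev);
if (rdev && !test_bit(WantRemove, &rdev->flags))
	atomic_inc(&rdev->nr_pending);	/* safe to keep using rdev */
else
	rdev = NULL;			/* treat the slot as removed */
rcu_read_unlock();

Because the flag is set before synchronize_rcu(), a reader that starts
after the grace period still observes it and backs off, which closes the
window that moving "p->rdev = NULL" alone leaves open.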
And I think the change should apply to raid10 as well, but let's first
see whether the above works or not.
It seems raid10_error doesn't have the recovery_disabled handling,
so it is not necessary for raid10.
Thanks,
Guoqing
To be honest, I am not sure whether RAID10/RAID5 have this problem as
well. More time is needed to debug.
Thanks a lot.
Yufen