On 2019/7/29 18:17, Guoqing Jiang wrote:
Now actually adding the list to CC ..., sorry about that.
On 7/29/19 12:15 PM, Guoqing Jiang wrote:
Forgot to cc list ...
Thanks a lot for the detailed explanation!
For this issue, maybe a better way is to set "p->rdev = NULL" after
synchronize_rcu().
Could you try the below change? Thanks.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 34e26834ad28..867808bed325 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1825,16 +1825,17 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
 			err = -EBUSY;
 			goto abort;
 		}
-		p->rdev = NULL;
 		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
 			synchronize_rcu();
+			p->rdev = NULL;
 			if (atomic_read(&rdev->nr_pending)) {
 				/* lost the race, try later */
 				err = -EBUSY;
 				p->rdev = rdev;
 				goto abort;
 			}
-		}
+		} else
+			p->rdev = NULL;
 		if (conf->mirrors[conf->raid_disks + number].rdev) {
 			/* We just removed a device that is being replaced.
 			 * Move down the replacement.  We drain all IO before
I don't think this can fix the race condition completely.
-		p->rdev = NULL;
 		if (!test_bit(RemoveSynchronized, &rdev->flags)) {
 			synchronize_rcu();
+			p->rdev = NULL;
 			if (atomic_read(&rdev->nr_pending)) {
If a reader accesses conf->mirrors[i].rdev (e.g. in raid1_write_request())
after the RCU grace period, synchronize_rcu() will not wait for that
reader, so this can still cause a NULL pointer dereference.
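To make the remaining window concrete, a rough timeline of the
interleaving (a sketch based on the above, not a real trace; the crash
assumes a path that later re-reads the pointer without a NULL check):

/*
 * remove path                          I/O path (reader)
 * -----------                          -----------------
 * synchronize_rcu()
 *   (waits only for readers that
 *    entered rcu_read_lock()
 *    before this point)
 *                                      rcu_read_lock()
 *                                      rdev = conf->mirrors[i].rdev
 * p->rdev = NULL
 * sees nr_pending == 0, so the
 * removal goes ahead
 *                                      a later re-read of
 *                                      conf->mirrors[i].rdev returns
 *                                      NULL -> dereference crashes
 *                                      rcu_read_unlock()
 */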
That is the reason we added the new flag 'WantRemove': it prevents
readers from accessing the 'rdev' after the RCU grace period.
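Roughly, the 'WantRemove' idea looks like this (a minimal sketch, not the
exact patch; it assumes WantRemove is added to the rdev flag enum in md.h,
and the existing RemoveSynchronized handling is omitted):

/* Removal side: publish the intent before waiting for readers. */
set_bit(WantRemove, &rdev->flags);
synchronize_rcu();
if (atomic_read(&rdev->nr_pending)) {
	/* lost the race, try later */
	clear_bit(WantRemove, &rdev->flags);
	err = -EBUSY;
	goto abort;
}
p->rdev = NULL;

/* Reader side, e.g. in raid1_write_request(): back off when removal
 * is in progress, even if this read section starts after the grace
 * period above. */
rcu_read_lock();
rdev = rcu_dereference(conf->mirrors[i].rdev);
if (rdev && !test_bit(WantRemove, &rdev->flags))
	atomic_inc(&rdev->nr_pending);	/* safe to keep using rdev */
else
	rdev = NULL;			/* treat the slot as removed */
rcu_read_unlock();

Because the flag is set before synchronize_rcu(), a reader that starts
after the grace period still observes it and backs off, which closes the
window that moving "p->rdev = NULL" alone leaves open.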
And I think the change should apply to raid10 as well, but let's first
see whether the above works or not.
It seems raid10_error doesn't have the recovery_disabled handling,
so it is not necessary for raid10.
Thanks,
Guoqing
To be honest, I am not sure whether RAID10/RAID5 have this problem as
well. More time is needed to debug.
Thanks a lot.
Yufen