Re: [PATCH] md/raid10: Fix raid10 replace hang when new added disk faulty

alexwu <alexwu@xxxxxxxxxxxx> · Mon, 10 Sep 2018 15:10:37 +0800

Shaohua Li 於 2018-09-08 02:20 寫到:
On Fri, Sep 07, 2018 at 11:30:34AM +0800, alexwu wrote:
From: Alex Wu <alexwu@xxxxxxxxxxxx>

In raid10_sync_request(), we expect issue a bio with callback
end_sync_read(), and a bio with callback end_sync_write().

In normal case, we will add resync sectors into mddev->recovery_active
when raid10_sync_request() returned, and sub resync sectors from
mddev->recovery_active when end_sync_write() call end_sync_request().

If new added disk, which are replacing the old disk, is set faulty,
there is a race issue:
    1. In the first rcu protected section, resync thread did not 
detect
       that mreplace is set faulty and pass the condition.
    2. In the second rcu protected section, mreplace is set faulty.
    3. But, resync thread will prepare the read object first, and then
       check the write condition.
    4. It will find that mreplace is set faulty and do not have to
       prepare write object.
This cause we add resync sectors but never sub.

This issue can be easily reproduced by the following steps:
    mdadm -C /dev/md0 --assume-clean -l 10 -n 4 /dev/sd[abcd]
    mdadm /dev/md0 -a /dev/sde
    mdadm /dev/md0 --replace /dev/sdd
    sleep 1
    mdadm /dev/md0 -f /dev/sde

This issue can be fixed by checking the write condition before prepare
the read object in the second rcu protected section.

Reported-by: Alex Chen <alexchen@xxxxxxxxxxxx>
Reviewed-by: Allen Peng <allenpeng@xxxxxxxxxxxx>
Reviewed-by: BingJing Chang <bingjingc@xxxxxxxxxxxx>
Signed-off-by: Alex Wu <alexwu@xxxxxxxxxxxx>
---
 drivers/md/raid10.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 9818980..593d193 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3189,6 +3189,17 @@ static sector_t raid10_sync_request(struct 
mddev *mddev, sector_t sector_nr,
 						continue;
 					}
 				}
+
+				/* make sure we need to write
+				 * before we prepare read object
+				 */
+				if (test_bit(In_sync, &mrdev->flags) &&
+					(mreplace == NULL ||
+					 r10_bio->devs[1].repl_bio == NULL ||
+					 test_bit(Faulty, &mreplace->flags))) {
+					break;
+				}
+

Does this fix the problem completely? Faulty can be set at any time, 
for
example, after this check, then we will have a mismatch again. How 
about we
record the Faulty bit for mpreplace locally, then use it for both 
read/write
check.

Thanks,
Shaohua
 				bio = r10_bio->devs[0].bio;
 				bio->bi_next = biolist;
 				biolist = bio;
--
2.7.4


I think you are right, this patch might not fix the problem completely.
I will change another method and test it.
Thanks for your reply.

Best Regards,
Alex Wu