Patch "md: synchronize flush io with array reconfiguration" has been added to the 6.7-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 20 Jan 2024 19:26:59 -0500

This is a note to let you know that I've just added the patch titled

    md: synchronize flush io with array reconfiguration

to the 6.7-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     md-synchronize-flush-io-with-array-reconfiguration.patch
and it can be found in the queue-6.7 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 0f5de9729b68cec902c03327399b83d9bf9ef33a
Author: Yu Kuai <yukuai3@xxxxxxxxxx>
Date:   Wed Nov 29 10:02:34 2023 +0800

    md: synchronize flush io with array reconfiguration
    
    [ Upstream commit fa2bbff7b0b4e211fec5e5686ef96350690597b5 ]
    
    Currently RCU is used to protect iterating rdev in submit_flushes(), but
    that does not prevent the following race with remove_and_add_spares():
    
    submit_flushes                  remove_and_add_spares
                                    synchronize_rcu
                                    pers->hot_remove_disk()
     rcu_read_lock()
     rdev_for_each_rcu
      if (rdev->raid_disk >= 0)
                                    rdev->raid_disk = -1;
       atomic_inc(&rdev->nr_pending)
       rcu_read_unlock()
       bi = bio_alloc_bioset()
       bi->bi_end_io = md_end_flush
       bi->bi_private = rdev
       submit_bio
       // issue io for removed rdev
    
    Fix this problem by grabbing 'active_io' before iterating rdev, which
    ensures that remove_and_add_spares() cannot run concurrently with
    submit_flushes().
    
    Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
    Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
    Signed-off-by: Song Liu <song@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20231129020234.1586910-1-yukuai1@xxxxxxxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9bdd57324c37..f246bb0932b0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -543,6 +543,9 @@ static void md_end_flush(struct bio *bio)
 	rdev_dec_pending(rdev, mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
+		/* The pair is percpu_ref_get() from md_flush_request() */
+		percpu_ref_put(&mddev->active_io);
+
 		/* The pre-request flush has finished */
 		queue_work(md_wq, &mddev->flush_work);
 	}
@@ -562,12 +565,8 @@ static void submit_flushes(struct work_struct *ws)
 	rdev_for_each_rcu(rdev, mddev)
 		if (rdev->raid_disk >= 0 &&
 		    !test_bit(Faulty, &rdev->flags)) {
-			/* Take two references, one is dropped
-			 * when request finishes, one after
-			 * we reclaim rcu_read_lock
-			 */
 			struct bio *bi;
-			atomic_inc(&rdev->nr_pending);
+
 			atomic_inc(&rdev->nr_pending);
 			rcu_read_unlock();
 			bi = bio_alloc_bioset(rdev->bdev, 0,
@@ -578,7 +577,6 @@ static void submit_flushes(struct work_struct *ws)
 			atomic_inc(&mddev->flush_pending);
 			submit_bio(bi);
 			rcu_read_lock();
-			rdev_dec_pending(rdev, mddev);
 		}
 	rcu_read_unlock();
 	if (atomic_dec_and_test(&mddev->flush_pending))
@@ -631,6 +629,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
 	/* new request after previous flush is completed */
 	if (ktime_after(req_start, mddev->prev_flush_start)) {
 		WARN_ON(mddev->flush_bio);
+		/*
+		 * Grab a reference to make sure mddev_suspend() will wait for
+		 * this flush to be done.
+		 *
+		 * md_flush_request() is called under md_handle_request() and
+		 * 'active_io' is already grabbed, hence percpu_ref_is_zero()
+		 * won't pass, percpu_ref_tryget_live() can't be used because
+		 * percpu_ref_kill() can be called by mddev_suspend()
+		 * concurrently.
+		 */
+		WARN_ON(percpu_ref_is_zero(&mddev->active_io));
+		percpu_ref_get(&mddev->active_io);
 		mddev->flush_bio = bio;
 		bio = NULL;
 	}
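
For readers who want to see the reference-count pairing in isolation, here
is a minimal user-space sketch in C. It is not kernel code: struct toy_mddev
and the toy_* functions are hypothetical names invented for illustration,
plain ints stand in for the percpu_ref 'active_io' and the atomic
'flush_pending', and the initial bias that submit_flushes() keeps on
'flush_pending' is left out. It only models the idea in the patch above: the
flush takes its own 'active_io' reference, and completion of the last flush
bio drops it, so anything that waits for 'active_io' to drain also waits for
the flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; NOT the kernel structure. */
struct toy_mddev {
	int active_io;     /* models the percpu_ref 'active_io' */
	int flush_pending; /* models outstanding flush bios (no bias count) */
};

/*
 * Models the hunk in md_flush_request(): the caller already holds one
 * reference, and the flush takes an additional one of its own.
 */
static void toy_flush_request(struct toy_mddev *m)
{
	assert(m->active_io > 0); /* mirrors WARN_ON(percpu_ref_is_zero()) */
	m->active_io++;           /* mirrors percpu_ref_get() */
}

/* Models issuing one flush bio to one member device. */
static void toy_submit_one(struct toy_mddev *m)
{
	m->flush_pending++;
}

/*
 * Models the hunk in md_end_flush(): the completion that brings
 * 'flush_pending' to zero drops the reference owned by the flush.
 */
static void toy_end_flush(struct toy_mddev *m)
{
	if (--m->flush_pending == 0)
		m->active_io--;   /* mirrors percpu_ref_put() */
}

/*
 * Assumption for this sketch: reconfiguration may proceed only once
 * every 'active_io' reference has been dropped.
 */
static bool toy_can_reconfigure(const struct toy_mddev *m)
{
	return m->active_io == 0;
}
```

Under these assumptions, a caller that starts with one reference, requests a
flush, submits two flush bios and then drops its own reference would see
toy_can_reconfigure() return false until the second toy_end_flush() call.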