Re: [PATCH -next v2] md: synchronize flush io with array reconfiguration

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Tue, 28 Nov 2023 10:12:26 +0800

Hi,

在 2023/11/28 7:32, Song Liu 写道:
On Mon, Nov 27, 2023 at 2:16 PM Song Liu <song@xxxxxxxxxx> wrote:

On Fri, Nov 24, 2023 at 10:54 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently rcu is used to protect iterating rdev from submit_flushes():

submit_flushes                  remove_and_add_spares
                                 synchronize_rcu
                                 pers->hot_remove_disk()
  rcu_read_lock()
  rdev_for_each_rcu
   if (rdev->raid_disk >= 0)
                                 rdev->radi_disk = -1;
    atomic_inc(&rdev->nr_pending)
    rcu_read_unlock()
    bi = bio_alloc_bioset()
    bi->bi_end_io = md_end_flush
    bi->private = rdev
    submit_bio
    // issue io for removed rdev

Fix this problem by grabbing 'acive_io' before iterating rdev, make sure
that remove_and_add_spares() won't concurrent with submit_flushes().

Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
Changes v2:
  - Add WARN_ON in case md_flush_request() is not called from
  md_handle_request() in future.

  drivers/md/md.c | 22 ++++++++++++++++------
  1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 86efc9c2ae56..2ffedc39edd6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -538,6 +538,9 @@ static void md_end_flush(struct bio *bio)
         rdev_dec_pending(rdev, mddev);

         if (atomic_dec_and_test(&mddev->flush_pending)) {
+               /* The pair is percpu_ref_tryget() from md_flush_request() */
+               percpu_ref_put(&mddev->active_io);
+
                 /* The pre-request flush has finished */
                 queue_work(md_wq, &mddev->flush_work);
         }
@@ -557,12 +560,8 @@ static void submit_flushes(struct work_struct *ws)
         rdev_for_each_rcu(rdev, mddev)
                 if (rdev->raid_disk >= 0 &&
                     !test_bit(Faulty, &rdev->flags)) {
-                       /* Take two references, one is dropped
-                        * when request finishes, one after
-                        * we reclaim rcu_read_lock
-                        */
                         struct bio *bi;
-                       atomic_inc(&rdev->nr_pending);
+
                         atomic_inc(&rdev->nr_pending);
                         rcu_read_unlock();
                         bi = bio_alloc_bioset(rdev->bdev, 0,
@@ -573,7 +572,6 @@ static void submit_flushes(struct work_struct *ws)
                         atomic_inc(&mddev->flush_pending);
                         submit_bio(bi);
                         rcu_read_lock();
-                       rdev_dec_pending(rdev, mddev);
                 }
         rcu_read_unlock();
         if (atomic_dec_and_test(&mddev->flush_pending))
@@ -626,6 +624,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
         /* new request after previous flush is completed */
         if (ktime_after(req_start, mddev->prev_flush_start)) {
                 WARN_ON(mddev->flush_bio);
+               /*
+                * Grab a reference to make sure mddev_suspend() will wait for
+                * this flush to be done.
+                *
+                * md_flush_reqeust() is called under md_handle_request() and
+                * 'active_io' is already grabbed, hence percpu_ref_tryget()
+                * won't fail, percpu_ref_tryget_live() can't be used because
+                * percpu_ref_kill() can be called by mddev_suspend()
+                * concurrently.
+                */
+               if (WARN_ON(percpu_ref_tryget(&mddev->active_io)))

This should be "if (!WARN_ON(..))", right?

Sorry for the mistake, this actually should be:

if (WARN_ON(!percpu_ref_tryget(...))

Song

+                       percpu_ref_get(&mddev->active_io);

Actually, we can just use percpu_ref_get(), no?

Yes, we can, but if someone else doesn't call md_flush_request() under
md_handle_request() in the fulture, there will be problem and
percpu_ref_get() can't catch this, do you think it'll make sense to
prevent such case?

Thanks,
Kuai


Thanks,
Song
.