Patch "md: fix deadlock between mddev_suspend and flush bio" has been added to the 6.10-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    md: fix deadlock between mddev_suspend and flush bio

to the 6.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     md-fix-deadlock-between-mddev_suspend-and-flush-bio.patch
and it can be found in the queue-6.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 56dec3dc820d9de862b45cb687ac8aeef2aee5a8
Author: Li Nan <linan122@xxxxxxxxxx>
Date:   Sun May 26 02:52:57 2024 +0800

    md: fix deadlock between mddev_suspend and flush bio
    
    [ Upstream commit 611d5cbc0b35a752e657a83eebadf40d814d006b ]
    
    Deadlock occurs when mddev is being suspended while some flush bio is in
    progress. It is a complex issue.
    
    T1. the first flush is at the ending stage, it clears 'mddev->flush_bio'
        and tries to submit data, but is blocked because mddev is suspended
        by T4.
    T2. the second flush sets 'mddev->flush_bio', and attempts to queue
        md_submit_flush_data(), which is already running (T1) and won't
        execute again if on the same CPU as T1.
    T3. the third flush inc active_io and tries to flush, but is blocked because
        'mddev->flush_bio' is not NULL (set by T2).
    T4. mddev_suspend() is called and waits for active_io dec to 0 which is inc
        by T3.
    
      T1            T2              T3              T4
      (flush 1)     (flush 2)       (third 3)       (suspend)
      md_submit_flush_data
       mddev->flush_bio = NULL;
       .
       .            md_flush_request
       .             mddev->flush_bio = bio
       .             queue submit_flushes
       .             .
       .             .              md_handle_request
       .             .               active_io + 1
       .             .               md_flush_request
       .             .                wait !mddev->flush_bio
       .             .
       .             .                              mddev_suspend
       .             .                               wait !active_io
       .             .
       .             submit_flushes
       .             queue_work md_submit_flush_data
       .             //md_submit_flush_data is already running (T1)
       .
       md_handle_request
        wait resume
    
    The root issue is non-atomic inc/dec of active_io during flush process.
    active_io is dec before md_submit_flush_data is queued, and inc soon
    after md_submit_flush_data() run.
      md_flush_request
        active_io + 1
        submit_flushes
          active_io - 1
          md_submit_flush_data
            md_handle_request
            active_io + 1
              make_request
            active_io - 1
    
    If active_io is dec after md_handle_request() instead of within
    submit_flushes(), make_request() can be called directly intead of
    md_handle_request() in md_submit_flush_data(), and active_io will
    only inc and dec once in the whole flush process. Deadlock will be
    fixed.
    
    Additionally, the only difference between fixing the issue and before is
    that there is no return error handling of make_request(). But after
    previous patch cleaned md_write_start(), make_requst() only return error
    in raid5_make_request() by dm-raid, see commit 41425f96d7aa ("dm-raid456,
    md/raid456: fix a deadlock for dm-raid456 while io concurrent with
    reshape)". Since dm always splits data and flush operation into two
    separate io, io size of flush submitted by dm always is 0, make_request()
    will not be called in md_submit_flush_data(). To prevent future
    modifications from introducing issues, add WARN_ON to ensure
    make_request() no error is returned in this context.
    
    Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
    Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
    Signed-off-by: Song Liu <song@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20240525185257.3896201-3-linan666@xxxxxxxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index aff9118ff6975..3a02b8903d626 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -550,13 +550,9 @@ static void md_end_flush(struct bio *bio)
 
 	rdev_dec_pending(rdev, mddev);
 
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		/* The pre-request flush has finished */
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws);
@@ -587,12 +583,8 @@ static void submit_flushes(struct work_struct *ws)
 			rcu_read_lock();
 		}
 	rcu_read_unlock();
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws)
@@ -617,8 +609,20 @@ static void md_submit_flush_data(struct work_struct *ws)
 		bio_endio(bio);
 	} else {
 		bio->bi_opf &= ~REQ_PREFLUSH;
-		md_handle_request(mddev, bio);
+
+		/*
+		 * make_requst() will never return error here, it only
+		 * returns error in raid5_make_request() by dm-raid.
+		 * Since dm always splits data and flush operation into
+		 * two separate io, io size of flush submitted by dm
+		 * always is 0, make_request() will not be called here.
+		 */
+		if (WARN_ON_ONCE(!mddev->pers->make_request(mddev, bio)))
+			bio_io_error(bio);;
 	}
+
+	/* The pair is percpu_ref_get() from md_flush_request() */
+	percpu_ref_put(&mddev->active_io);
 }
 
 /*




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux