Patch "md/raid10: prevent soft lockup while flush writes" has been added to the 5.15-stable tree

This is a note to let you know that I've just added the patch titled

    md/raid10: prevent soft lockup while flush writes

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     md-raid10-prevent-soft-lockup-while-flush-writes.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 1a70cd99fe5a4635ab5f50c5b4adb699b60660bf
Author: Yu Kuai <yukuai3@xxxxxxxxxx>
Date:   Mon May 29 21:11:00 2023 +0800

    md/raid10: prevent soft lockup while flush writes
    
    [ Upstream commit 010444623e7f4da6b4a4dd603a7da7469981e293 ]
    
    Currently, there is no limit on the number of plugged bios for raid1/raid10.
    While flushing writes, raid1 calls cond_resched() but raid10 does not, so
    too many pending writes can cause a soft lockup.
    
    The following soft lockup can be triggered easily with a writeback test
    for raid10 on ramdisks:
    
    watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
    Call Trace:
     <TASK>
     call_rcu+0x16/0x20
     put_object+0x41/0x80
     __delete_object+0x50/0x90
     delete_object_full+0x2b/0x40
     kmemleak_free+0x46/0xa0
     slab_free_freelist_hook.constprop.0+0xed/0x1a0
     kmem_cache_free+0xfd/0x300
     mempool_free_slab+0x1f/0x30
     mempool_free+0x3a/0x100
     bio_free+0x59/0x80
     bio_put+0xcf/0x2c0
     free_r10bio+0xbf/0xf0
     raid_end_bio_io+0x78/0xb0
     one_write_done+0x8a/0xa0
     raid10_end_write_request+0x1b4/0x430
     bio_endio+0x175/0x320
     brd_submit_bio+0x3b9/0x9b7 [brd]
     __submit_bio+0x69/0xe0
     submit_bio_noacct_nocheck+0x1e6/0x5a0
     submit_bio_noacct+0x38c/0x7e0
     flush_pending_writes+0xf0/0x240
     raid10d+0xac/0x1ed0
    
    Fix the problem by adding cond_resched() to raid10, as raid1 already does.
    
    Note that the unlimited number of plugged bios still needs to be optimized:
    for example, when lots of dirty pages are written back, this consumes a lot
    of memory and IO spends a long time in the plug, so IO latency suffers.
    
    Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
    Signed-off-by: Song Liu <song@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20230529131106.2123367-2-yukuai1@xxxxxxxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index edd3b65c447db..b71480ec26c79 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -903,6 +903,7 @@ static void flush_pending_writes(struct r10conf *conf)
 			else
 				submit_bio_noacct(bio);
 			bio = next;
+			cond_resched();
 		}
 		blk_finish_plug(&plug);
 	} else
@@ -1116,6 +1117,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 		else
 			submit_bio_noacct(bio);
 		bio = next;
+		cond_resched();
 	}
 	kfree(plug);
 }
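
For readers skimming the diff, here is a minimal, hypothetical sketch of the
pattern the fix relies on (the helper name submit_plugged_bios is illustrative
and not taken from the kernel tree): a long loop that submits plugged bios one
by one. On fast backing devices such as brd ramdisks, submission can complete
the bio synchronously, so a long list keeps the raid10 daemon on the CPU; the
cond_resched() added by the hunks above yields to the scheduler each iteration
before the soft-lockup watchdog fires.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/sched.h>

/*
 * Illustrative only -- not code from the kernel tree.  Walk a singly
 * linked list of plugged bios and submit each one, yielding the CPU
 * between submissions the same way the patched flush_pending_writes()
 * and raid10_unplug() now do.
 */
static void submit_plugged_bios(struct bio *bio)
{
	while (bio) {
		struct bio *next = bio->bi_next;

		bio->bi_next = NULL;
		submit_bio_noacct(bio);	/* may complete synchronously on ramdisks */
		bio = next;
		cond_resched();		/* let the scheduler run before the watchdog trips */
	}
}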


