Patch "sbitmap: correct wake_batch recalculation to avoid potential IO hung" has been added to the 6.1-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sat, 4 Mar 2023 21:38:13 -0500

This is a note to let you know that I've just added the patch titled

    sbitmap: correct wake_batch recalculation to avoid potential IO hung

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sbitmap-correct-wake_batch-recalculation-to-avoid-po.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4a00cb15e9c25eb35323c49ad694ed7b63767bf4
Author: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
Date:   Tue Jan 17 04:50:59 2023 +0800

    sbitmap: correct wake_batch recalculation to avoid potential IO hung
    
    [ Upstream commit b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a ]
    
    Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
    mentioned that in case of shared tags, there could be just one real
    active hctx(queue) because of lazy detection of tag idle. Then driver tag
    allocation may wait forever on this real active hctx(queue) if wake_batch
    is > hctx_max_depth, where hctx_max_depth is the available tag depth for
    the active hctx(queue). However, the condition wake_batch > hctx_max_depth
    is not strong enough to avoid an IO hang, as sbitmap_queue_wake_up will
    only wake up one wait queue for each wake_batch, even if there is only
    one waiter in the woken wait queue. After this, there is only one tag to
    free and wake_batch may never be reached again. Commit 180dccb0dba4f
    ("blk-mq: fix tag_get wait task can't be awakened") mentioned that driver
    tag allocation may wait forever. Actually, the inactive hctx(queue) will
    be truly idle after at most 30 seconds and will call
    blk_mq_tag_wakeup_all to wake one waiter per wait queue and break the
    hang. But an IO hang of 30 seconds is not acceptable either. To fix this
    potential IO hang, set the batch size small enough that the depth of the
    shared hctx(queue) is enough to wake up all of the wait queues, as
    sbq_calc_wake_batch does.
    
    Although hctx_max_depth is clamped to at least 4 and the wake_batch
    recalculation does not apply that clamp, wake_batch will always be
    recalculated to 1 when hctx_max_depth <= 4.
    
    Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
    Reviewed-by: Jan Kara <jack@xxxxxxx>
    Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@xxxxxxxxxxxxxxx
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index a7c3dc3d2d174..e918cd8695f14 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -464,13 +464,10 @@ void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
 					    unsigned int users)
 {
 	unsigned int wake_batch;
-	unsigned int min_batch;
 	unsigned int depth = (sbq->sb.depth + users - 1) / users;
 
-	min_batch = sbq->sb.depth >= (4 * SBQ_WAIT_QUEUES) ? 4 : 1;
-
 	wake_batch = clamp_val(depth / SBQ_WAIT_QUEUES,
-			min_batch, SBQ_WAKE_BATCH);
+			1, SBQ_WAKE_BATCH);
 
 	WRITE_ONCE(sbq->wake_batch, wake_batch);
 }