Re: [PATCH v2 2/2] block/mq-deadline: Fix the tag reservation code

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Sat, 7 Dec 2024 10:17:19 +0800

Hi Bart,

On 2024/05/10 1:01, Bart Van Assche wrote:
The current tag reservation code is based on a misunderstanding of the
meaning of data->shallow_depth. Fix the tag reservation code as follows:
* By default, do not reserve any tags for synchronous requests because
   for certain use cases reserving tags reduces performance. See also
   Harshit Mogalapalli, [bug-report] Performance regression with fio
   sequential-write on a multipath setup, 2024-03-07
   (https://lore.kernel.org/linux-block/5ce2ae5d-61e2-4ede-ad55-551112602401@xxxxxxxxxx/)
* Reduce min_shallow_depth to one because min_shallow_depth must be less
   than or equal to any shallow_depth value.
* Scale dd->async_depth from the range [1, nr_requests] to [1,
   bits_per_sbitmap_word].

Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Damien Le Moal <dlemoal@xxxxxxxxxx>
Cc: Zhiguo Niu <zhiguo.niu@xxxxxxxxxx>
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
---
  block/mq-deadline.c | 20 +++++++++++++++++---
  1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 94eede4fb9eb..acdc28756d9d 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -487,6 +487,20 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
  	return rq;
  }
  
+/*
+ * 'depth' is a number in the range 1..INT_MAX representing a number of
+ * requests. Scale it with a factor (1 << bt->sb.shift) / q->nr_requests since
+ * 1..(1 << bt->sb.shift) is the range expected by sbitmap_get_shallow().
+ * Values larger than q->nr_requests have the same effect as q->nr_requests.
+ */
+static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
+{
+	struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
+	const unsigned int nrr = hctx->queue->nr_requests;
+
+	return ((qdepth << bt->sb.shift) + nrr - 1) / nrr;
+}
+
  /*
   * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
   * function is used by __blk_mq_get_tag().
@@ -503,7 +517,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
  	 * Throttle asynchronous requests and writes such that these requests
  	 * do not block the allocation of synchronous requests.
  	 */
-	data->shallow_depth = dd->async_depth;
+	data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
  }
  
  /* Called by blk_mq_update_nr_requests(). */
@@ -513,9 +527,9 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
  	struct deadline_data *dd = q->elevator->elevator_data;
  	struct blk_mq_tags *tags = hctx->sched_tags;
  
-	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
+	dd->async_depth = q->nr_requests;

We're comparing v6.6 and v5.10 performance in our downstream kernel. We
hit a regression and bisected it to this patch. While reviewing the
patch, I could not understand the change above.

For example, if dd->async_depth is nr_requests, then dd_to_word_depth()
just returns 1 << bt->sb.shift, so nothing will be throttled.
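
To make the arithmetic concrete, here is a minimal userspace sketch of the
dd_to_word_depth() calculation. The function name word_depth() and its plain
integer parameters are illustrative stand-ins for bt->sb.shift and
q->nr_requests, not kernel code, and the values used below (shift = 6,
nr_requests = 256) are assumed for the example only.

```c
#include <assert.h>

/*
 * Userspace model of the dd_to_word_depth() arithmetic from the patch.
 * 'shift' stands in for bt->sb.shift and 'nr_requests' for q->nr_requests;
 * both are plain parameters here (hypothetical), not kernel structures.
 */
static unsigned int word_depth(unsigned int qdepth, unsigned int shift,
			       unsigned int nr_requests)
{
	assert(nr_requests > 0);
	/* Round up (qdepth << shift) / nr_requests, as the patch does. */
	return ((qdepth << shift) + nr_requests - 1) / nr_requests;
}
```

With those assumed values, word_depth(256, 6, 256) evaluates to 64, which is
1 << shift, i.e. the whole sbitmap word. For comparison, the old default of
3 * 256 / 4 = 192 would scale to word_depth(192, 6, 256) = 48.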

The regression shows up in a corner test case that is unlikely to happen
in the real world. I can share more details if you're interested.

Thanks,
Kuai

  
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, 1);
  }
  
  /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
