Re: [PATCH] kyber: introduce kyber_depth_updated()

On 2021/2/5 3:47, Omar Sandoval wrote:
On Fri, Jan 22, 2021 at 01:06:36AM -0800, Yang Yang wrote:
A hang occurs when the user changes the scheduler queue depth by writing
to the 'nr_requests' sysfs file of that device.
This patch introduces kyber_depth_updated(), so that kyber can update its
internal state when the queue depth changes.

Do you have a reproducer for this? It'd be useful to turn that into a
blktest.

I _think_ this fix is correct other than the comment below, but it'd be
helpful to have an explanation in the commit message of how exactly it
gets stuck without the fix.


The details of the environment in which we found the problem are as follows:
 an eMMC block device
 total driver tags: 16
 default queue_depth: 32
 kqd->async_depth initialized in kyber_init_sched() with queue_depth=32

Then we changed queue_depth to 256 by writing to the 'nr_requests' sysfs file.
kqd->async_depth is not updated when queue_depth changes.
The value of async_depth is then too small for queue_depth=256, which can cause a hang.
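
For reference, here is roughly how the arithmetic works out. This is an
illustrative user-space sketch; the shift values are assumptions based on
how sbitmap_init_node() shrinks the per-word shift for small maps:

#include <stdio.h>

#define KYBER_ASYNC_PERCENT 75	/* the percentage kyber uses */

int main(void)
{
	/*
	 * Assumed shifts: at depth 32, sbitmap shrinks to about 8 bits
	 * per word (shift 3); at depth 256 it uses full 64-bit words
	 * (shift 6). Exact values come from sbitmap_init_node().
	 */
	unsigned int shift_depth32 = 3;
	unsigned int shift_depth256 = 6;

	/* The old formula: a fraction of one sbitmap word. */
	printf("async_depth at depth 32:  %u\n",
	       (1U << shift_depth32) * KYBER_ASYNC_PERCENT / 100U);	/* 6 */
	printf("async_depth at depth 256: %u\n",
	       (1U << shift_depth256) * KYBER_ASYNC_PERCENT / 100U);	/* 48 */

	/*
	 * Without kyber_depth_updated(), async_depth keeps the value
	 * computed for depth 32 even after nr_requests grows to 256.
	 */
	return 0;
}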

Signed-off-by: Yang Yang <yang.yang@xxxxxxxx>
---
  block/kyber-iosched.c | 28 ++++++++++++----------------
  1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index dc89199bc8c6..b64f80d3eaf3 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -353,19 +353,9 @@ static void kyber_timer_fn(struct timer_list *t)
  	}
  }
-static unsigned int kyber_sched_tags_shift(struct request_queue *q)
-{
-	/*
-	 * All of the hardware queues have the same depth, so we can just grab
-	 * the shift of the first one.
-	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
-}
-
  static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
  {
  	struct kyber_queue_data *kqd;
-	unsigned int shift;
  	int ret = -ENOMEM;
  	int i;
@@ -400,9 +390,6 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
  		kqd->latency_targets[i] = kyber_latency_targets[i];
  	}
-	shift = kyber_sched_tags_shift(q);
-	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
-
  	return kqd;
  err_buckets:
@@ -458,9 +445,18 @@ static void kyber_ctx_queue_init(struct kyber_ctx_queue *kcq)
  		INIT_LIST_HEAD(&kcq->rq_list[i]);
  }
-static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
+static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
  {
  	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
+	struct blk_mq_tags *tags = hctx->sched_tags;
+
+	kqd->async_depth = tags->bitmap_tags->sb.depth * KYBER_ASYNC_PERCENT / 100U;

This isn't equivalent to the old code. sbitmap::depth is the number of
bits in the whole sbitmap. 2^sbitmap::shift is the number of bits used
in a single word of the sbitmap. async_depth is the number of bits to
use from each word (via sbitmap_get_shallow()).

This is setting async_depth to a fraction of the entire size of the
sbitmap, which is probably greater than the size of a single word,
effectively disabling the async depth limiting.
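
For context, kyber hands async_depth to the tag allocator as a per-word
shallow depth for async requests; the consumer looks roughly like this
(a sketch of the limit_depth hook, paraphrased from memory from
block/kyber-iosched.c of this period):

static void kyber_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
{
	/*
	 * Scheduler tags double as queueing tokens; only async requests
	 * are throttled, to async_depth bits out of each sbitmap word.
	 */
	if (!op_is_sync(op)) {
		struct kyber_queue_data *kqd = data->q->elevator->elevator_data;

		data->shallow_depth = kqd->async_depth;
	}
}

With 64-bit words at depth 256, the patched formula gives 256 * 75 / 100 =
192, which is larger than the 64 bits in any single word, so the shallow
limit never binds; the per-word figure would be 64 * 75 / 100 = 48.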


I'll send a v2 with this fixed.
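
Roughly, I expect v2 to keep the per-word shift and re-apply the minimum
shallow depth, along these lines (a sketch, not the final patch):

static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
{
	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
	struct blk_mq_tags *tags = hctx->sched_tags;
	unsigned int shift = tags->bitmap_tags->sb.shift;

	/* The same per-word fraction the old code computed at init time. */
	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;

	/* Keep sbitmap's wake batching consistent with the new limit. */
	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
}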

Thank you!