On Fri, Dec 11, 2020 at 12:38:43PM +0000, Pavel Begunkov wrote: > On 11/12/2020 03:37, Keith Busch wrote: > > It sounds like the statistic is using the wrong criteria. It ought to > > use the average time for the next available completion for any request > > rather than the average latency of a specific IO. It might work at high > > depth if the hybrid poll knew the hctx's depth when calculating the > > sleep time, but that information doesn't appear to be readily available. > > It polls (and so sleeps) from submission of a request to its completion, > not from request to request. Right, but the polling thread is responsible for completing all requests, not just the most recent cookie. If the sleep timer uses the round trip of a single request when you have a high queue depth, there are likely to be many completions in the pipeline that aren't getting polled on time. This feeds back to the mean latency, pushing the sleep timer further out. > Looks like the other scheme doesn't suit well > when you don't have a constant-ish flow of requests, e.g. QD=1 and with > different latency in the userspace. The idea I'm trying to convey shouldn't affect QD1. The following patch seems to test "ok", but I know of at least a few scenarios where it falls apart... --- diff --git a/block/blk-mq.c b/block/blk-mq.c index e9799fed98c7..cab2dafcd3a9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3727,6 +3727,7 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb) static unsigned long blk_mq_poll_nsecs(struct request_queue *q, struct request *rq) { + struct blk_mq_hw_ctx *hctx = rq->mq_hctx; unsigned long ret = 0; int bucket; @@ -3753,6 +3754,15 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q, if (q->poll_stat[bucket].nr_samples) ret = (q->poll_stat[bucket].mean + 1) / 2; + /* + * Finding completions on the first poll indicates we're sleeping too + * long and pushing the latency statistic in the wrong direction for + * future sleep consideration. Poll immediately until the average time + * becomes more useful. + */ + if (hctx->poll_invoked < 3 * hctx->poll_considered) + return 0; + return ret; } ---