On 13/12/2020 18:19, Keith Busch wrote:
> On Fri, Dec 11, 2020 at 12:38:43PM +0000, Pavel Begunkov wrote:
>> On 11/12/2020 03:37, Keith Busch wrote:
>>> It sounds like the statistic is using the wrong criteria. It ought to
>>> use the average time for the next available completion for any request
>>> rather than the average latency of a specific IO. It might work at high
>>> depth if the hybrid poll knew the hctx's depth when calculating the
>>> sleep time, but that information doesn't appear to be readily available.
>>
>> It polls (and so sleeps) from submission of a request to its completion,
>> not from request to request.
>
> Right, but the polling thread is responsible for completing all
> requests, not just the most recent cookie. If the sleep timer uses the
> round trip of a single request when you have a high queue depth, there
> are likely to be many completions in the pipeline that aren't getting
> polled on time. This feeds back to the mean latency, pushing the sleep
> timer further out.

It rather polls for a particular request and completes others along the
way, and that's the problem. Completion-to-completion would make much
more sense if we had a poll task separate from the waiters, or if the
semantics were not "poll for a request" but "poll a file". And with
io_uring, IMHO, the latter actually makes more sense even for non-hybrid
polling.

>
>> Looks like the other scheme doesn't suit well
>> when you don't have a constant-ish flow of requests, e.g. QD=1 and with
>> different latency in the userspace.
>
> The idea I'm trying to convey shouldn't affect QD1. The following patch
> seems to test "ok", but I know of at least a few scenarios where it
> falls apart...
>
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index e9799fed98c7..cab2dafcd3a9 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3727,6 +3727,7 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
>  static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>  				       struct request *rq)
>  {
> +	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
>  	unsigned long ret = 0;
>  	int bucket;
>
> @@ -3753,6 +3754,15 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
>  	if (q->poll_stat[bucket].nr_samples)
>  		ret = (q->poll_stat[bucket].mean + 1) / 2;
>
> +	/*
> +	 * Finding completions on the first poll indicates we're sleeping too
> +	 * long and pushing the latency statistic in the wrong direction for
> +	 * future sleep consideration. Poll immediately until the average time
> +	 * becomes more useful.
> +	 */
> +	if (hctx->poll_invoked < 3 * hctx->poll_considered)
> +		return 0;
> +
>  	return ret;
>  }
>
> ---

-- 
Pavel Begunkov
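
To make the feedback loop above concrete, here is a toy user-space
model. This is not kernel code: the idealized device (one completion
retired every fixed service interval while QD requests stay in flight)
and all numbers in it are assumptions for illustration. It shows why a
sleep timer set to half the per-request round trip, as
blk_mq_poll_nsecs() computes it, overshoots the gap between successive
completions once the queue depth grows, while QD=1 stays unaffected.

/*
 * Toy model, not kernel code. Assume an idealized device that retires
 * one request every service_ns while qd requests are kept in flight,
 * so each request's round trip is roughly qd * service_ns.
 */
#include <stdio.h>

int main(void)
{
	const long service_ns = 10000;	/* assumed: one completion per 10us */

	for (int qd = 1; qd <= 32; qd *= 2) {
		/* round trip of a single request at this queue depth */
		long round_trip_ns = qd * service_ns;
		/* hybrid poll sleeps half the mean round trip */
		long sleep_ns = (round_trip_ns + 1) / 2;
		/* completions that arrive, unpolled, during that sleep */
		long missed = sleep_ns / service_ns;

		printf("QD=%2d round-trip=%6ldns sleep=%6ldns missed=%ld\n",
		       qd, round_trip_ns, sleep_ns, missed);
	}
	return 0;
}

In this model, at QD=1 the 5us sleep ends before the next completion is
due, but at QD=32 the timer sleeps past roughly 16 completions. Those
late completions then inflate the measured mean, pushing the next sleep
out further, which is the runaway feedback Keith describes and what the
poll_invoked < 3 * poll_considered check in the patch tries to catch.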