On Mon, Dec 14, 2020 at 05:58:56PM +0000, Pavel Begunkov wrote:
> On 13/12/2020 18:19, Keith Busch wrote:
> > On Fri, Dec 11, 2020 at 12:38:43PM +0000, Pavel Begunkov wrote:
> >> On 11/12/2020 03:37, Keith Busch wrote:
> >>> It sounds like the statistic is using the wrong criteria. It ought to
> >>> use the average time for the next available completion for any request
> >>> rather than the average latency of a specific IO. It might work at high
> >>> depth if the hybrid poll knew the hctx's depth when calculating the
> >>> sleep time, but that information doesn't appear to be readily available.
> >>
> >> It polls (and so sleeps) from submission of a request to its completion,
> >> not from request to request.
> >
> > Right, but the polling thread is responsible for completing all
> > requests, not just the most recent cookie. If the sleep timer uses the
> > round trip of a single request when you have a high queue depth, there
> > are likely to be many completions in the pipeline that aren't getting
> > polled on time. This feeds back to the mean latency, pushing the sleep
> > timer further out.
>
> It rather polls for a particular request and completes others by the way,
> and that's the problem. Completion-to-completion would make much more
> sense if we'd have a separate from waiters poll task.
>
> Or if the semantics would be not "poll for a request", but poll a file.
> And since io_uring IMHO that actually makes more sense even for
> non-hybrid polling.

The existing block layer polling semantics doesn't poll for a specific
request. Please see the blk_mq_ops driver API for the 'poll' function. It
takes a hardware context, which does not indicate a specific request.
See also the blk_poll() function, which doesn't consider any specific
request in order to break out of the polling loop.
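
To illustrate the point, here is a rough userspace model of those
semantics. It is only a sketch, not kernel code: struct model_hctx,
model_device_complete(), model_poll() and model_poll_loop() are
hypothetical names made up for this example, and the real polling loop
has additional exit conditions that are not modelled here. What it shows
is that the poll callback sees only the hardware context, reaps whatever
is pending, and that the loop stops on any completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CQ_SIZE 16

/* Hypothetical stand-in for a hardware context: one completion queue. */
struct model_hctx {
	int cq[MODEL_CQ_SIZE];		/* tags the device has finished */
	size_t head;			/* next entry to reap */
	size_t tail;			/* next free slot */
	bool done[MODEL_CQ_SIZE];	/* done[tag] is set once tag is reaped */
};

/* Device side of the model: post a completion for 'tag'. */
void model_device_complete(struct model_hctx *hctx, int tag)
{
	assert(tag >= 0 && tag < MODEL_CQ_SIZE);
	assert(hctx->tail < MODEL_CQ_SIZE);
	hctx->cq[hctx->tail++] = tag;
}

/*
 * Analogue of a driver 'poll' callback. It receives only the hardware
 * context, never a request, and reaps every pending completion.
 * Returns the number of completions found.
 */
int model_poll(struct model_hctx *hctx)
{
	int found = 0;

	while (hctx->head < hctx->tail) {
		int tag = hctx->cq[hctx->head++];

		hctx->done[tag] = true;
		found++;
	}
	return found;
}

/*
 * Analogue of the polling loop. It breaks out as soon as the callback
 * reports any completion, whichever request that was. 'max_spins' is an
 * assumption of this model so that the loop always terminates.
 */
int model_poll_loop(struct model_hctx *hctx, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int ret = model_poll(hctx);

		if (ret > 0)
			return ret;
	}
	return 0;
}
```

In this model a caller that is waiting on tag 7 gets control back from
model_poll_loop() as soon as tag 3 completes, so it has to check the
state of its own request afterwards and poll again if it is not done.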