Re: hybrid polling on an nvme doesn't seem to work with iodepth > 1 on 5.10.0-rc5

Keith Busch <kbusch@xxxxxxxxxx> · Thu, 17 Dec 2020 07:22:29 +0900

On Mon, Dec 14, 2020 at 07:01:31PM +0000, Pavel Begunkov wrote:
> On 14/12/2020 18:23, Keith Busch wrote:
> > The existing block layer polling semantics doesn't poll for a specific
> > request. Please see the blk_mq_ops driver API for the 'poll' function.
> > It takes a hardware context, which does not indicate a specific request.
> > See also the blk_poll() function, which doesn't consider any specific
> > request in order to break out of the polling loop.
> 
> Yeah, thanks for pointing out, it's just the users do it that way --
> block layer dio and somewhat true for io_uring, and also hybrid part is
> per request based (and sleeps once per request), that stands out.
> If we would go with compl-to-compl it should be changed. And not to
> forget that subm-to-compl sometimes is more desirable.

Right, so coming full circle to my initial reply: the block polling
thread may be responsible for multiple requests when it wakes up, yet
the hybrid sleep timer considers only one. The sleep criterion is
therefore not always accurate, and at high queue depth hybrid polling
ends up worse than interrupt-driven completion.

The current sleep calculation works fine for QD1, but I don't see a
clear way to calculate an accurate sleep time for higher queue depths
at a reasonable CPU cost. My only suggestion is to not sleep at all as
long as the polling thread continues to reap completions on its first
poll.
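
To make the argument above concrete, here is a rough toy model in
Python. It is not kernel code and the numbers are illustrative, not
measurements. It assumes a device that completes every request a fixed
latency after submission, and a single poller that handles requests
oldest first, sleeps once per request for a fixed time (half the device
latency here, an assumption), then busy-polls until that request is
done. The names simulate and skip_sleep_on_hit are hypothetical; the
flag models the "don't sleep while the first poll keeps reaping"
suggestion.

```python
from collections import deque


def simulate(qd, n_requests, latency_ns, sleep_ns, skip_sleep_on_hit=False):
    """Toy model of a hybrid poller; returns observed latency per request.

    Assumptions (not taken from the kernel): the device completes each
    request exactly latency_ns after submission, and one poller serves
    requests oldest first, keeping qd requests in flight.
    """
    now = 0
    outstanding = deque([0] * qd)  # submission times, oldest first
    submitted = qd
    latencies = []
    last_first_poll_hit = False

    while outstanding:
        submit_at = outstanding.popleft()
        done_at = submit_at + latency_ns

        # Hybrid sleep, charged once per request, unless the previous
        # request was already complete on its first poll.
        if not (skip_sleep_on_hit and last_first_poll_hit):
            now += sleep_ns

        # Did the first poll after (not) sleeping find a completion?
        last_first_poll_hit = done_at <= now

        # Busy-poll until the request is done, then reap it.
        now = max(now, done_at)
        latencies.append(now - submit_at)

        # Keep the queue depth constant until n_requests were issued.
        if submitted < n_requests:
            outstanding.append(now)
            submitted += 1

    return latencies


if __name__ == "__main__":
    # QD1: the per-request sleep fits inside the device latency.
    print(simulate(1, 4, 100, 50))
    # [100, 100, 100, 100]

    # QD4: the serialized sleeps add up and latency settles at 200.
    print(simulate(4, 8, 100, 50))
    # [100, 150, 200, 250, 200, 200, 200, 200]

    # QD4, skipping the sleep while the first poll finds a completion.
    print(simulate(4, 8, 100, 50, skip_sleep_on_hit=True))
    # [100, 150, 150, 150, 100, 100, 100, 100]
```

In this model the QD1 case is unaffected by the flag, while at QD4 the
observed latency falls back to the device latency once the poller stops
sleeping between back-to-back completions. The model does not account
for the extra CPU time spent busy-polling, which is the cost side of
the trade-off.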