Re: [PATCH] blk-mq: don't queue passthrough request into scheduler

Jens Axboe <axboe@xxxxxxxxx> · Fri, 12 May 2023 09:43:32 -0600

On 5/12/23 9:34?AM, Ming Lei wrote:
> On Fri, May 12, 2023 at 09:25:18AM -0600, Jens Axboe wrote:
>> On 5/12/23 9:19?AM, Ming Lei wrote:
>>> On Fri, May 12, 2023 at 09:08:54AM -0600, Jens Axboe wrote:
>>>> On 5/12/23 9:03?AM, Ming Lei wrote:
>>>>> Passthrough(pt) request shouldn't be queued to scheduler, especially some
>>>>> schedulers(such as bfq) supposes that req->bio is always available and
>>>>> blk-cgroup can be retrieved via bio.
>>>>>
>>>>> Sometimes pt request could be part of error handling, so it is better to always
>>>>> queue it into hctx->dispatch directly.
>>>>>
>>>>> Fix this issue by queuing pt request from plug list to hctx->dispatch
>>>>> directly.
>>>>
>>>> Why not just add the check to the BFQ insertion? That would be a lot
>>>> more trivial and would not be poluting the core with this stuff.
>>>
>>> pt request is supposed to be issued to device directly, and we never
>>> queue it to scheduler before 1c2d2fff6dc0 ("block: wire-up support for
>>> passthrough plugging").
>>>
>>> some pt request might be part of error handling, and adding it to
>>> scheduler could cause io hang.
>>
>> I'm not suggesting adding it to the scheduler, just having the bypass
>> "add to dispatch" in a different spot.
> 
> Originally it is added to dispatch in blk_execute_rq_nowait() for each
> request, but now we support plug for pt request, that is why I add the
> bypass in blk_mq_dispatch_plug_list(), and just grab lock for each batch
> given now blk_execute_rq_nowait() is fast path for nvme uring pt io feature.

We really have two types of passthrough - normal kind of IO, and
potential error recovery etc IO. The former can plug just fine, and I
don't think we should treat it differently. Might make more sense to
just bypass plugging for error handling type of IO, or pt that doesn't
transfer any data to avoid having a NULL bio inserted into the
scheduler.

>> Let me take a look at it... Do we have a reproducer for this issue?
> 
> Guang Wu and Yu Kuai should have, and I didn't succeed in reproducing
> it by setting bfq & io.bfq.weight cgroup in my test VM.

I didn't either, but most likely because all the pt testing I did was
mapped IO. So there would be a bio there.

-- 
Jens Axboe