On 5/12/23 9:34 AM, Ming Lei wrote:
> On Fri, May 12, 2023 at 09:25:18AM -0600, Jens Axboe wrote:
>> On 5/12/23 9:19 AM, Ming Lei wrote:
>>> On Fri, May 12, 2023 at 09:08:54AM -0600, Jens Axboe wrote:
>>>> On 5/12/23 9:03 AM, Ming Lei wrote:
>>>>> Passthrough(pt) request shouldn't be queued to scheduler, especially some
>>>>> schedulers(such as bfq) supposes that req->bio is always available and
>>>>> blk-cgroup can be retrieved via bio.
>>>>>
>>>>> Sometimes pt request could be part of error handling, so it is better to always
>>>>> queue it into hctx->dispatch directly.
>>>>>
>>>>> Fix this issue by queuing pt request from plug list to hctx->dispatch
>>>>> directly.
>>>>
>>>> Why not just add the check to the BFQ insertion? That would be a lot
>>>> more trivial and would not be poluting the core with this stuff.
>>>
>>> pt request is supposed to be issued to device directly, and we never
>>> queue it to scheduler before 1c2d2fff6dc0 ("block: wire-up support for
>>> passthrough plugging").
>>>
>>> some pt request might be part of error handling, and adding it to
>>> scheduler could cause io hang.
>>
>> I'm not suggesting adding it to the scheduler, just having the bypass
>> "add to dispatch" in a different spot.
>
> Originally it is added to dispatch in blk_execute_rq_nowait() for each
> request, but now we support plug for pt request, that is why I add the
> bypass in blk_mq_dispatch_plug_list(), and just grab lock for each batch
> given now blk_execute_rq_nowait() is fast path for nvme uring pt io feature.

We really have two types of passthrough - normal kind of IO, and
potential error recovery etc IO. The former can plug just fine, and I
don't think we should treat it differently. Might make more sense to
just bypass plugging for error handling type of IO, or pt that doesn't
transfer any data to avoid having a NULL bio inserted into the
scheduler.

>> Let me take a look at it... Do we have a reproducer for this issue?
>
> Guang Wu and Yu Kuai should have, and I didn't succeed in reproducing
> it by setting bfq & io.bfq.weight cgroup in my test VM.

I didn't either, but most likely because all the pt testing I did was
mapped IO. So there would be a bio there.

-- 
Jens Axboe
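
As a rough illustration of the alternative raised in the reply above (bypass
plugging only for error-handling passthrough, or for passthrough with no data
to transfer), the decision could be modelled in userspace C as below. This is
a sketch, not kernel code and not either author's patch. Every model_* name is
hypothetical, and the error_recovery flag in particular is an assumption: the
thread does not say how such requests would be identified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct model_bio {
	int placeholder;
};

struct model_request {
	bool passthrough;      /* loosely models a passthrough (pt) request */
	bool error_recovery;   /* assumed marker for error-handling IO */
	struct model_bio *bio; /* NULL when the request carries no data */
};

/*
 * Decide whether a request may be added to the plug list.
 *
 * Regular IO and "normal" passthrough IO that has a bio may plug.
 * Passthrough used for error recovery, or passthrough with no bio,
 * skips plugging so that a NULL bio never reaches the scheduler.
 */
static bool model_may_plug(const struct model_request *rq)
{
	assert(rq != NULL);

	if (!rq->passthrough)
		return true;
	if (rq->error_recovery)
		return false;
	return rq->bio != NULL;
}
```

In this model, a request for which model_may_plug() returns false would take
the pre-plugging path and go straight to dispatch, while everything else is
batched as usual.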