Re: [PATCH] blk-mq: don't queue passthrough request into scheduler

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 12 May 2023 23:55:26 +0800

On Fri, May 12, 2023 at 09:43:32AM -0600, Jens Axboe wrote:
> On 5/12/23 9:34?AM, Ming Lei wrote:
> > On Fri, May 12, 2023 at 09:25:18AM -0600, Jens Axboe wrote:
> >> On 5/12/23 9:19?AM, Ming Lei wrote:
> >>> On Fri, May 12, 2023 at 09:08:54AM -0600, Jens Axboe wrote:
> >>>> On 5/12/23 9:03?AM, Ming Lei wrote:
> >>>>> Passthrough(pt) request shouldn't be queued to scheduler, especially some
> >>>>> schedulers(such as bfq) supposes that req->bio is always available and
> >>>>> blk-cgroup can be retrieved via bio.
> >>>>>
> >>>>> Sometimes pt request could be part of error handling, so it is better to always
> >>>>> queue it into hctx->dispatch directly.
> >>>>>
> >>>>> Fix this issue by queuing pt request from plug list to hctx->dispatch
> >>>>> directly.
> >>>>
> >>>> Why not just add the check to the BFQ insertion? That would be a lot
> >>>> more trivial and would not be poluting the core with this stuff.
> >>>
> >>> pt request is supposed to be issued to device directly, and we never
> >>> queue it to scheduler before 1c2d2fff6dc0 ("block: wire-up support for
> >>> passthrough plugging").
> >>>
> >>> some pt request might be part of error handling, and adding it to
> >>> scheduler could cause io hang.
> >>
> >> I'm not suggesting adding it to the scheduler, just having the bypass
> >> "add to dispatch" in a different spot.
> > 
> > Originally it is added to dispatch in blk_execute_rq_nowait() for each
> > request, but now we support plug for pt request, that is why I add the
> > bypass in blk_mq_dispatch_plug_list(), and just grab lock for each batch
> > given now blk_execute_rq_nowait() is fast path for nvme uring pt io feature.
> 
> We really have two types of passthrough - normal kind of IO, and
> potential error recovery etc IO. The former can plug just fine, and I
> don't think we should treat it differently. Might make more sense to
> just bypass plugging for error handling type of IO, or pt that doesn't
> transfer any data to avoid having a NULL bio inserted into the
> scheduler.

So far, I guess we can't distinguish the two types.

The simplest change could be the previous one[1] by not plugging request
of !rq->bio, but pt request with user IO still can be added to
scheduler, so the question is if pt request with user IO should be
queued to scheduler?

[1] https://lore.kernel.org/linux-block/CAGS2=Yob_Ud9A-aTu5hQt8+kW4cyrLX12hNJTrRkJYigFT-AmA@xxxxxxxxxxxxxx/T/#m8ac66a4294707cf1ba301ef2bd2a70f9a0e2c8cb

Thanks, 
Ming