Re: [PATCH V2 2/2] blk-mq: make sure elevator callbacks aren't called for passthrough request

Keith Busch <kbusch@xxxxxxxxxx> · Tue, 16 May 2023 08:47:46 -0600

On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> > On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > > On 5/15/23 07:46, Ming Lei wrote:
> > > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > > >   static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > >   {
> > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > >   		struct elevator_queue *e = rq->q->elevator;
> > > >   		if (e->type->ops.completed_request)
> > > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > >   static inline void blk_mq_sched_requeue_request(struct request *rq)
> > > >   {
> > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > >   		struct request_queue *q = rq->q;
> > > >   		struct elevator_queue *e = q->elevator;
> > > 
> > > Has it been considered not to set RQF_ELV for passthrough requests instead
> > > of making the above changes?
> > 
> > That sounds like a good idea. It changes more behavior than what Ming is
> > targeting here, but after looking through each use for RQF_ELV, I think
> > not having that set really is the right thing to do in all cases for
> > passthrough requests.
> 
> I did consider that approach. But:
> 
> - RQF_ELV actually means that the request & its tag is allocated from sched tags.
> 
> - if RQF_ELV is cleared for passthrough request, request may be
>   allocated from sched tags(normal IO) and driver tags(passthrough) at the same time.
>   This way may cause other problem, such as, breaking blk_mq_hctx_has_requests().
>   Meantime it becomes not likely to optimize tags resource utilization in future,
>   at least for single LUN/NS, no need to keep sched tags & driver tags
>   in memory at the same time.

Isn't that similar to multiple namespaces where some use elevator and
others use 'none'? They're all contenting for the same shared driver
tags with racing 'has_requests()'.

And the passthrough case is special with users of that interface taking
on a greater responsibility and generally want the kernel out of the
way. I don't think anyone would purposefully run a tag intense workload
through that engine at the same time as using a generic one with the
scheduler. It definitely should still work, but it doesn't need to be
fair, right?