On Tue, May 16, 2023 at 08:47:46AM -0600, Keith Busch wrote:
> On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> > On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> > > On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > > > On 5/15/23 07:46, Ming Lei wrote:
> > > > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > > > >  static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >  {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >  		struct elevator_queue *e = rq->q->elevator;
> > > > > 
> > > > >  		if (e->type->ops.completed_request)
> > > > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >  static inline void blk_mq_sched_requeue_request(struct request *rq)
> > > > >  {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >  		struct request_queue *q = rq->q;
> > > > >  		struct elevator_queue *e = q->elevator;
> > > > 
> > > > Has it been considered not to set RQF_ELV for passthrough requests
> > > > instead of making the above changes?
> > > 
> > > That sounds like a good idea. It changes more behavior than what Ming is
> > > targeting here, but after looking through each use for RQF_ELV, I think
> > > not having that set really is the right thing to do in all cases for
> > > passthrough requests.
> > 
> > I did consider that approach. But:
> > 
> > - RQF_ELV actually means that the request & its tag are allocated from
> >   sched tags.
> > 
> > - If RQF_ELV is cleared for passthrough requests, requests may be
> >   allocated from sched tags (normal I/O) and driver tags (passthrough)
> >   at the same time. This may cause other problems, such as breaking
> >   blk_mq_hctx_has_requests().
> > 
> > Meanwhile it becomes unlikely that we can optimize tags resource
> > utilization in the future; at least for a single LUN/NS, there is no
> > need to keep sched tags & driver tags in memory at the same time.
> 
> Isn't that similar to multiple namespaces where some use the elevator and
> others use 'none'? They're all contending for the same shared driver
> tags with racing 'has_requests()'.

It is similar, but not the same. So far, for any single request queue, we
have never supported allocating requests/tags from both sched tags and
driver tags at the same time. If we simply clear RQF_ELV for passthrough
requests, at least:

1) blk_mq_hctx_has_requests() is broken, since this function only checks
   sched tags when q->elevator is set

2) q->nr_requests may no longer be respected when q->elevator is set

> And the passthrough case is special with users of that interface taking
> on a greater responsibility and generally wanting the kernel out of the
> way. I don't think anyone would purposefully run a tag intense workload
> through that engine at the same time as using a generic one with the
> scheduler. It definitely should still work, but it doesn't need to be
> fair, right?

I guess it may work, but the question is what we gain from this kind of
big change. This approach could be a follow-up if it is proved useful.

Thanks,
Ming