On Tue, May 16, 2023 at 08:47:46AM -0600, Keith Busch wrote:
> On Tue, May 16, 2023 at 09:20:55AM +0800, Ming Lei wrote:
> > On Mon, May 15, 2023 at 02:22:18PM -0600, Keith Busch wrote:
> > > On Mon, May 15, 2023 at 08:52:38AM -0700, Bart Van Assche wrote:
> > > > On 5/15/23 07:46, Ming Lei wrote:
> > > > > @@ -48,7 +53,7 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
> > > > >  static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >  {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >  		struct elevator_queue *e = rq->q->elevator;
> > > > > 
> > > > >  		if (e->type->ops.completed_request)
> > > > > @@ -58,7 +63,7 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
> > > > >  static inline void blk_mq_sched_requeue_request(struct request *rq)
> > > > >  {
> > > > > -	if (rq->rq_flags & RQF_ELV) {
> > > > > +	if ((rq->rq_flags & RQF_ELV) && !blk_mq_bypass_sched(rq->cmd_flags)) {
> > > > >  		struct request_queue *q = rq->q;
> > > > >  		struct elevator_queue *e = q->elevator;
> > > > 
> > > > Has it been considered not to set RQF_ELV for passthrough requests
> > > > instead of making the above changes?
> > > 
> > > That sounds like a good idea. It changes more behavior than what Ming is
> > > targeting here, but after looking through each use for RQF_ELV, I think
> > > not having that set really is the right thing to do in all cases for
> > > passthrough requests.
> > 
> > I did consider that approach. But:
> > 
> > - RQF_ELV actually means that the request & its tag are allocated from
> >   sched tags.
> > 
> > - If RQF_ELV is cleared for passthrough requests, requests may be
> >   allocated from sched tags (normal I/O) and driver tags (passthrough)
> >   at the same time. This may cause other problems, such as breaking
> >   blk_mq_hctx_has_requests().
> > 
> > Meanwhile it becomes unlikely that we can optimize tags resource
> > utilization in the future; at least for a single LUN/NS, there is no
> > need to keep sched tags & driver tags in memory at the same time.
> 
> Isn't that similar to multiple namespaces where some use the elevator and
> others use 'none'? They're all contending for the same shared driver
> tags with racing 'has_requests()'.

It is similar, but not the same. So far, for any single request queue, we
have never supported allocating requests/tags from both sched tags and
driver tags at the same time. If we simply clear RQF_ELV for passthrough
requests, at least:

1) blk_mq_hctx_has_requests() is broken, since this function only checks
   sched tags when q->elevator is set

2) q->nr_requests may no longer be respected when q->elevator is set

> And the passthrough case is special with users of that interface taking
> on a greater responsibility and generally wanting the kernel out of the
> way. I don't think anyone would purposefully run a tag intense workload
> through that engine at the same time as using a generic one with the
> scheduler. It definitely should still work, but it doesn't need to be
> fair, right?

I guess it may work, but the question is what we gain from this kind of
big change. This approach could be a follow-up if it is proved useful.

Thanks,
Ming