On Fri, Aug 20, 2021 at 04:38:24PM -0700, Bart Van Assche wrote:
> On 8/20/21 4:05 PM, Niklas Cassel wrote:
> > Thank you for your patch!
> > I tested it, and it does solve my problem.
>
> That's quick. Thanks!

Thank you for the patch!

> > I've been thinking more about this problem.
> > The problem is seen on a SATA zoned drive.
> >
> > These drives have mq-deadline set as default by the
> > blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE) call in
> > drivers/scsi/sd_zbc.c:sd_zbc_read_zones().
> >
> > This triggers block/elevator.c:elevator_init_mq() to initialize
> > "mq-deadline" as the default scheduler for these devices.
> >
> > I think that the problem might be that drivers/scsi/sd_zbc.c
> > has created the request_queue and submitted requests before the call
> > to elevator_init_mq() is done.
> >
> > elevator_init_mq() will set q->elevator->type->ops, so once that is set,
> > blk_mq_free_request() will call e->type->ops.finish_request(rq),
> > regardless of whether the request was inserted through the recently
> > initialized scheduler or not.
> >
> > While I'm perfectly happy with your fix, would it perhaps be possible
> > to do the fix in block/elevator.c instead, so that we don't need to
> > do the same type of check that you did in each and every single
> > I/O scheduler?
> >
> > Looking at block/elevator.c:elevator_init_mq(), it seems to do:
> >
> > blk_mq_freeze_queue()
> > blk_mq_quiesce_queue()
> >
> > blk_mq_init_sched()
> >
> > blk_mq_unquiesce_queue()
> > blk_mq_unfreeze_queue()
> >
> > This obviously isn't enough to avoid the bug that we are seeing,
> > but could perhaps a more general fix be to flush/wait until all
> > in-flight requests have completed, and then free them, and only then
> > set q->elevator->type->ops? That way, all requests inserted after
> > the I/O scheduler has been initialized will have gone through the
> > I/O scheduler, so all finish_request() calls should have a
> > matching insert_request() call. What do you think?
>
> q->elevator is set from inside the I/O scheduler's init_sched callback and
> that callback is called with the request queue frozen. Freezing happens by
> calling blk_mq_freeze_queue() and that function waits until all previously
> submitted requests have finished. So I don't think that the race described
> above can happen.

I see.

I was mainly thinking that it should be possible to do a generic fix,
so that we eventually won't need a fix similar to yours in all the
different I/O schedulers.

However, looking at e.g. BFQ, it does appear to have something similar
to your fix already:

#define RQ_BFQQ(rq)		((rq)->elv.priv[1])

bfq_finish_requeue_request()
	struct bfq_queue *bfqq = RQ_BFQQ(rq);
	...
	if (!rq->elv.icq || !bfqq)
		return;

So your proposed fix should also be fine.

However, it does not apply on top of Torvalds' master or Jens's for-next
branch, because they both have reverted your cgroup support patch.

If you rebase your fix and send it out, I will be happy to send out a
Reviewed-by/Tested-by.


Kind regards,
Niklas
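
P.S. For illustration only, here is a minimal sketch (not taken from your
patch) of the kind of per-scheduler guard I was referring to, modeled on
the BFQ check quoted above. The function name is hypothetical, and it
assumes the scheduler stores something in rq->elv.priv[0] from its
insert_request callback:

	/* Hypothetical scheduler finish_request callback (sketch only). */
	static void example_finish_request(struct request *rq)
	{
		/*
		 * Assumed to be set by this scheduler's insert_request
		 * callback, so it is still NULL for requests that never
		 * went through the scheduler, e.g. requests submitted
		 * before elevator_init_mq() installed the scheduler ops.
		 */
		if (!rq->elv.priv[0])
			return;	/* not inserted via this scheduler */

		/* Undo the accounting that was done at insert time. */
	}

With a check like this, finish_request() becomes a no-op for requests that
bypassed the scheduler, which is the same effect as the
"!rq->elv.icq || !bfqq" test in bfq_finish_requeue_request().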