Re: [PATCH V11 11/12] blk-mq: re-submit IO in case that hctx is inactive

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 14 May 2020 08:45:03 +0800

On Wed, May 13, 2020 at 08:03:13AM -0700, Bart Van Assche wrote:
> On 2020-05-13 05:21, Christoph Hellwig wrote:
> > Use of the BLK_MQ_REQ_FORCE is pretty bogus here..
> > 
> >> +	if (rq->rq_flags & RQF_PREEMPT)
> >> +		flags |= BLK_MQ_REQ_PREEMPT;
> >> +	if (reserved)
> >> +		flags |= BLK_MQ_REQ_RESERVED;
> >> +	/*
> >> +	 * Queue freezing might be in-progress, and wait freeze can't be
> >> +	 * done now because we have request not completed yet, so mark this
> >> +	 * allocation as BLK_MQ_REQ_FORCE for avoiding this allocation &
> >> +	 * freeze hung forever.
> >> +	 */
> >> +	flags |= BLK_MQ_REQ_FORCE;
> >> +
> >> +	/* avoid allocation failure by clearing NOWAIT */
> >> +	nrq = blk_get_request(rq->q, rq->cmd_flags & ~REQ_NOWAIT, flags);
> >> +	if (!nrq)
> >> +		return;
> > 
> > blk_get_request returns an ERR_PTR.
> > 
> > But I'd rather avoid the magic new BLK_MQ_REQ_FORCE hack when we can
> > just open code it and document what is going on:
> > 
> > static struct blk_mq_tags *blk_mq_rq_tags(struct request *rq)
> > {
> > 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
> > 
> > 	if (rq->q->elevator)
> > 		return hctx->sched_tags;
> > 	return hctx->tags;
> > }
> > 
> > static void blk_mq_resubmit_rq(struct request *rq)
> > {
> > 	struct blk_mq_alloc_data alloc_data = {
> > 		.cmd_flags	= rq->cmd_flags & ~REQ_NOWAIT,
> > 	};
> > 	struct request *nrq;
> > 
> > 	if (rq->rq_flags & RQF_PREEMPT)
> > 		alloc_data.flags |= BLK_MQ_REQ_PREEMPT;
> > 	if (blk_mq_tag_is_reserved(blk_mq_rq_tags(rq), rq->internal_tag))
> > 		alloc_data.flags |= BLK_MQ_REQ_RESERVED;
> > 
> > 	/*
> > 	 * We must still be able to finish a resubmission due to a hotplug
> > 	 * even if a queue freeze is in progress.
> > 	 */
> > 	percpu_ref_get(&rq->q->q_usage_counter);
> > 	nrq = blk_mq_get_request(rq->q, NULL, &alloc_data);
> > 	blk_queue_exit(rq->q);
> > 
> > 	if (!nrq)
> > 		return; // XXX: warn?
> > 	if (nrq->q->mq_ops->initialize_rq_fn)
> > 		nrq->q->mq_ops->initialize_rq_fn(nrq);
> > 
> > 	blk_rq_copy_request(nrq, rq);
> > 	...
> 
> I don't like this because the above code allows allocation of requests
> and tags while a request queue is frozen. I'm concerned that this will
> break code that assumes that no tags are allocated while a request queue
> is frozen. If a request queue has a single hardware queue with 64 tags,

The above code path will never be called for a single hw queue.

> if the above code allocates tag 40 and if blk_mq_tag_update_depth()
> reduces the queue depth to 32, will nrq become a dangling pointer?

Allocation of nrq is just like any other normal allocation, and if
it doesn't work with blk_mq_tag_update_depth(), that must be a more
generic issue rather than one specific to this use case.

The only difference is that 'nrq' is allocated from a new, active
hctx, so the two requests can co-exist and we needn't worry about deadlock.
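
To illustrate the co-existence point, here is a toy userspace sketch. It is
not kernel code: struct toy_hctx, struct toy_rq and the toy_*() helpers are
made-up names, and each tag space is modelled as a plain bool array rather
than an sbitmap. It only assumes that the old request sits on an inactive
hctx and that the replacement tag comes from a different, active one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DEPTH 4	/* hypothetical per-hctx tag depth */

/* Toy stand-in for one hardware queue's tag space (not the kernel struct). */
struct toy_hctx {
	bool active;			/* false once all of its CPUs are offline */
	bool tag_busy[TOY_DEPTH];	/* true while a request owns the tag */
};

struct toy_rq {
	struct toy_hctx *hctx;
	int tag;
};

/* Take a free tag from @h; returns -1 if @h is inactive or has no free tag. */
static int toy_get_tag(struct toy_hctx *h)
{
	if (!h->active)
		return -1;
	for (int i = 0; i < TOY_DEPTH; i++) {
		if (!h->tag_busy[i]) {
			h->tag_busy[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release the tag held by @rq and detach it from its hctx. */
static void toy_put_tag(struct toy_rq *rq)
{
	assert(rq->hctx && rq->tag >= 0 && rq->tag < TOY_DEPTH);
	rq->hctx->tag_busy[rq->tag] = false;
	rq->hctx = NULL;
	rq->tag = -1;
}

/*
 * Resubmit @old through the first active hctx in @hctxs that has a free tag.
 * The new tag is taken while @old still holds its own tag; @old is released
 * only afterwards. Returns false if no active hctx had a free tag.
 */
static bool toy_resubmit(struct toy_rq *old, struct toy_rq *nrq,
			 struct toy_hctx *hctxs, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		int tag = toy_get_tag(&hctxs[i]);

		if (tag < 0)
			continue;
		nrq->hctx = &hctxs[i];
		nrq->tag = tag;
		toy_put_tag(old);
		return true;
	}
	return false;
}
```

In this model, even if every tag of the inactive hctx is busy, toy_resubmit()
still succeeds as long as some active hctx has a free tag, because the new
allocation never waits on the old request's own tag space.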

thanks,
Ming