Re: [Bug] double ->queue_rq() because of timeout in ->queue_rq()

Keith Busch <kbusch@xxxxxxxxxx> · Fri, 21 Oct 2022 08:32:31 -0600

On Thu, Oct 20, 2022 at 05:10:13PM +0800, Ming Lei wrote:
> @@ -1593,10 +1598,17 @@ static void blk_mq_timeout_work(struct work_struct *work)
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
>  
> -	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);
> +	/* Before walking tags, we must ensure any submit started before the
> +	 * current time has finished. Since the submit uses srcu or rcu, wait
> +	 * for a synchronization point to ensure all running submits have
> +	 * finished
> +	 */
> +	blk_mq_wait_quiesce_done(q);
> +
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &expired);

blk_mq_wait_quiesce_done() only waits for tasks that entered the
submit path before it was called. It does not wait for tasks that
enter immediately afterwards.
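
To illustrate the ordering, here is a small userspace model in C, not
kernel code. The names (gp_model, reader_enter, reader_exit, gp_start,
gp_done) are made up for this sketch. It only assumes that a grace
period waits for the readers that were already inside the read-side
section when the wait began.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_READERS 8

/* Hypothetical model: each slot is one submitter in its read-side section. */
struct gp_model {
	bool active[MAX_READERS];     /* reader is currently inside the section */
	bool waited_for[MAX_READERS]; /* reader the current grace period waits for */
};

static void reader_enter(struct gp_model *m, size_t id)
{
	assert(id < MAX_READERS);
	m->active[id] = true;
}

static void reader_exit(struct gp_model *m, size_t id)
{
	assert(id < MAX_READERS);
	m->active[id] = false;
	m->waited_for[id] = false;
}

/* Start of the wait: snapshot only the readers that are already inside. */
static void gp_start(struct gp_model *m)
{
	for (size_t i = 0; i < MAX_READERS; i++)
		m->waited_for[i] = m->active[i];
}

/* The wait is over once every snapshotted reader has left. */
static bool gp_done(const struct gp_model *m)
{
	for (size_t i = 0; i < MAX_READERS; i++)
		if (m->waited_for[i])
			return false;
	return true;
}
```

If reader 0 enters before gp_start() and reader 1 enters after it,
gp_done() turns true as soon as reader 0 exits, while reader 1 is still
inside its section.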

If I correctly understand the problem you're describing, the hypervisor
may prevent any guest process from running. If so, the timeout work may
be stalled after the quiesce wait completes, and if a task in queue_rq()
that entered after blk_mq_wait_quiesce_done() started is stalled as
well, then we're in the same situation you're trying to prevent, right?

I agree with your idea that this is a lower level driver responsibility:
it should reclaim all started requests before allowing new queuing.
Perhaps the block layer should also raise a clear warning if it's
queueing a request that's already started.
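
As a rough idea of what such a check could look like, here is a
userspace model in C. The types and functions (request_model,
dispatch_check, model_queue_rq, model_reclaim) are hypothetical and only
loosely mirror the request states in blk-mq. An in-kernel version would
presumably be something along the lines of
WARN_ON_ONCE(blk_mq_request_started(rq)) at dispatch time; that
placement is an assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical request states for this model. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct request_model {
	int tag;
	enum rq_state state;
};

/* Warn and refuse if the request was already started. */
static bool dispatch_check(const struct request_model *rq)
{
	assert(rq != NULL);
	if (rq->state != RQ_IDLE) {
		fprintf(stderr,
			"warning: queueing tag %d that is already started (state %d)\n",
			rq->tag, (int)rq->state);
		return false;
	}
	return true;
}

/* Model of handing the request to the driver. */
static bool model_queue_rq(struct request_model *rq)
{
	if (!dispatch_check(rq))
		return false;
	rq->state = RQ_IN_FLIGHT;
	return true;
}

/* Driver reclaims the request before it may be queued again. */
static void model_reclaim(struct request_model *rq)
{
	assert(rq != NULL);
	rq->state = RQ_IDLE;
}
```

In this model a second model_queue_rq() on the same request prints the
warning and returns false until model_reclaim() has been called on it.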