Re: [Bug] double ->queue_rq() because of timeout in ->queue_rq()

On Thu, Oct 20, 2022 at 05:10:13PM +0800, Ming Lei wrote:
> @@ -1593,10 +1598,17 @@ static void blk_mq_timeout_work(struct work_struct *work)
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
>  
> -	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);
> +	/* Before walking tags, we must ensure any submit started before the
> +	 * current time has finished. Since the submit uses srcu or rcu, wait
> +	 * for a synchronization point to ensure all running submits have
> +	 * finished
> +	 */
> +	blk_mq_wait_quiesce_done(q);
> +
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &expired);

blk_mq_wait_quiesce_done() only waits for tasks that entered the submit
path before it was called. It does not wait for tasks that enter
immediately afterward.
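
To make that concrete, here is a toy single-threaded userspace model of
an SRCU-style wait. It is only a sketch: every toy_* name is made up for
illustration, and it is not how the kernel implements srcu or rcu. It
just shows a wait finishing while a reader that entered later is still
inside its section.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of an SRCU-style grace period. Hypothetical names
 * throughout; this is not the kernel implementation.
 */
struct toy_gp {
	unsigned long epoch;	/* bumped each time a wait begins */
	int active[2];		/* in-flight readers, indexed by epoch parity */
};

/* A submitter enters its read-side section; returns the slot it used. */
static int toy_read_lock(struct toy_gp *gp)
{
	int idx = gp->epoch & 1;

	gp->active[idx]++;
	return idx;
}

/* A submitter leaves the read-side section it entered in slot idx. */
static void toy_read_unlock(struct toy_gp *gp, int idx)
{
	assert(gp->active[idx] > 0);
	gp->active[idx]--;
}

/*
 * Begin a wait: readers arriving from now on land in the other slot.
 * Returns the slot holding the readers this wait is responsible for.
 */
static int toy_wait_begin(struct toy_gp *gp)
{
	int old = gp->epoch & 1;

	gp->epoch++;
	return old;
}

/* The wait is complete once the pre-existing readers have left. */
static bool toy_wait_done(const struct toy_gp *gp, int old)
{
	return gp->active[old] == 0;
}

/*
 * Returns true if the wait finished while a reader that entered after
 * the wait began was still inside its section.
 */
static bool toy_late_reader_survives_wait(void)
{
	struct toy_gp gp = { 0 };
	int a = toy_read_lock(&gp);	/* enters before the wait */
	int old = toy_wait_begin(&gp);
	int b = toy_read_lock(&gp);	/* enters after the wait began */

	toy_read_unlock(&gp, a);
	return toy_wait_done(&gp, old) && gp.active[b] == 1;
}
```

In this model toy_late_reader_survives_wait() returns true: the wait is
satisfied as soon as reader "a" leaves, while reader "b" is still
running.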

If I understand the problem you're describing correctly, the hypervisor
may prevent any guest process from running. If so, the timeout work may
be stalled after blk_mq_wait_quiesce_done() returns, and if a queue_rq()
caller that entered after the wait began is also stalled, then we're in
the same situation you're trying to prevent, right?

I agree with your idea that this is a lower-level driver responsibility:
the driver should reclaim all started requests before allowing new
queueing. Perhaps the block layer should also raise a clear warning if
it's queueing a request that's already started.
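
A rough userspace sketch of the kind of check I mean is below. The
toy_* types and functions are hypothetical stand-ins, not block layer
code. In the kernel I'd expect this to look more like a WARN_ON_ONCE()
on blk_mq_request_started(rq) before calling ->queue_rq(), but I
haven't tried that. Skipping the dispatch after warning is also just an
assumption made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; these names mirror the block layer loosely. */
enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_COMPLETE };

struct toy_request {
	int tag;
	enum toy_rq_state state;
};

/* Number of times a started request was handed to toy_dispatch(). */
static int toy_double_queue_warnings;

/* Stand-in for a driver's ->queue_rq(): marks the request as started. */
static void toy_queue_rq(struct toy_request *rq)
{
	rq->state = TOY_RQ_IN_FLIGHT;
}

/*
 * Dispatch a request to the driver. If the request has already been
 * started, warn and refuse instead of queueing it a second time.
 * Returns true if the request was handed to the driver.
 */
static bool toy_dispatch(struct toy_request *rq)
{
	assert(rq != NULL);

	if (rq->state != TOY_RQ_IDLE) {
		toy_double_queue_warnings++;
		fprintf(stderr,
			"WARNING: tag %d queued while already started\n",
			rq->tag);
		return false;
	}

	toy_queue_rq(rq);
	return true;
}
```

Calling toy_dispatch() twice on the same idle request queues it once,
then warns and returns false on the second call.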


