Re: [Bug] double ->queue_rq() because of timeout in ->queue_rq()

David Jeffery <djeffery@xxxxxxxxxx> · Fri, 21 Oct 2022 14:21:17 -0400

blk_On Fri, Oct 21, 2022 at 10:40 AM Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> The blk_mq_wait_quiesce_done() will only wait for tasks that entered
> just before calling that function. It will not wait for tasks that
> entered immediately after.

Right. The design was depending on saving a jiffies value before the
blk_mq_wait_quiesce_done() call and deadline behavior to avoid having
to care about queue_rq calls starting after
blk_mq_wait_quiesce_done().

> If I correctly understand the problem you're describing, the hypervisor
> may prevent any guest process from running. If so, the timeout work may
> be stalled after the quiesce, and if a queue_rq() process also stalled
> after starting quiesce_done(), then we're in the same situation you're
> trying to prevent, right?

For handling this condition and avoiding the issue, the code was
depending on the deadline value and rq->state value.
blk_mq_start_request sets a deadline then sets MQ_RQ_IN_FLIGHT, while
blk_mq_req_expired checks for MQ_RQ_IN_FLIGHT then checks the
deadline. A new queue_rq call would then not count as expired from
these checks from either its deadline being updated to be in the
future or from missing MQ_RQ_IN_FLIGHT .

However, there are no memory barriers ensuring this ordering. So it
looks like it should prevent an issue with a new queue_rq on x86 cpus
but could race on cpus with weaker memory ordering.

> I agree with your idea that this is a lower level driver responsibility:
> it should reclaim all started requests before allowing new queuing.
> Perhaps the block layer should also raise a clear warning if it's
> queueing a request that's already started.
>

If I understand what you are saying, that is roughly how the old scsi
timeout handling used to work before the async-abort changes. It
prevented new requests and drained everything (though HBA-wide not
just request_queue-wide) before doing anything with timed out
commands. The current implementation makes avoiding race conditions
more difficult but reduces the performance impact of a lone
misbehaving request.

So a driver under the block layer could potentially avoid the current
races by properly quiescing the request_queue before handling any
timed out requests. But I was trying to come up with a solution which
avoided that requirement.

David Jeffery