Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Tue, 10 Apr 2018 12:58:04 +0000

On Tue, 2018-04-10 at 16:41 +0800, Ming Lei wrote:
> On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote:
> > If a completion occurs after blk_mq_rq_timed_out() has reset
> > rq->aborted_gstate and the request is again in flight when the timeout
> 
> Given rq->aborted_gstate is reset only for BLK_EH_RESET_TIMER, I
> think you are addressing the race between normal completion and
> the timeout of BLK_EH_RESET_TIMER.

Yes, that's correct. I will make this more explicit.

> > expires then a request will be completed twice: a first time by the
> > timeout handler and a second time when the regular completion occurs.
> 
> This patch looks simpler, and more like the previous model of
> using blk_mark_rq_complete().

That's also correct.

> I have one question:
> 
> - with this patch, rq's state is updated atomically as cmpxchg. Suppose
> one rq is timed out, the rq's state is updated as MQ_RQ_COMPLETE by
> blk_mq_check_expired(), then ops->timeout() is run and
> BLK_EH_RESET_TIMER is returned, so blk_mq_add_timer(req, MQ_RQ_COMPLETE,
> MQ_RQ_IN_FLIGHT) is called to update rq's state into MQ_RQ_IN_FLIGHT.
> 
> Now the original normal completion still may occur after rq's state
> becomes MQ_RQ_IN_FLIGHT, and seems there is still the double completion
> with this patch? Maybe I am wrong, please explain a bit.

That scenario won't result in a double completion. After the timer has
been reset the block driver blk_mq_complete_request() call will change
the request state from MQ_RQ_IN_FLIGHT into MQ_RQ_COMPLETE. The next
time blk_mq_check_expired() is called it will execute the following code:

	blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE);

That function call only changes the request state if the current state is
IN_FLIGHT. However, the blk_mq_complete_request() call changed the request
state into COMPLETE. Hence, the above blk_mq_change_rq_state() call will
return false and the blk-mq timeout code will skip this request. If the
request would already have been reused and would have been marked again as
IN_FLIGHT then its deadline will also have been updated and the request
will be skipped by the timeout code because its deadline has not yet
expired.

> And synchronize_rcu() is needed by Tejun's approach between marking
> COMPLETE and handling this rq's timeout, and the time can be quite long,
> I guess it might be easier to trigger?

I have done what I could to trigger races between the regular completion
path and the timeout code in my tests. Without this patch if I run the
srp-test software I see crashes being reported in the rdma_rxe driver but
with this patch applied I don't see any crashes with my tests.

Bart.