Hello, On Mon, Apr 02, 2018 at 09:31:34PM +0000, Bart Van Assche wrote: > > > > + * As nothing prevents from completion happening while > > > > + * ->aborted_gstate is set, this may lead to ignored completions > > > > + * and further spurious timeouts. > > > > + */ > > > > + if (rq->rq_flags & RQF_MQ_TIMEOUT_RESET) > > > > + blk_mq_rq_update_aborted_gstate(rq, 0); ... > I think it can happen that the above code sees that (rq->rq_flags & > RQF_MQ_TIMEOUT_RESET) != 0, that blk_mq_start_request() executes the > following code: > > blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT); > blk_add_timer(rq); > > and that subsequently blk_mq_rq_update_aborted_gstate(rq, 0) is called, > which will cause the next completion to be lost. Is fixing one occurrence > of a race and reintroducing it in another code path really an improvement? I'm not following at all. How would blk_mq_start_request() get called on the request while it's still owned by the timeout handler? That gstate clearing is what transfers the ownership back to the non-timeout path. Thanks. -- tejun