Hello, Bart.

On Mon, Feb 05, 2018 at 09:33:03PM +0000, Bart Van Assche wrote:
> My goal with this patch is to fix the race between resetting the timer and
> the completion path. Hence change (3). Changes (1) and (2) are needed to
> make the changes in blk_mq_rq_timed_out() work.

Ah, I see.  That makes sense.  Can I ask you to elaborate the scenario
you were fixing?

> > > @@ -831,13 +834,12 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
> > >  		__blk_mq_complete_request(req);
> > >  		break;
> > >  	case BLK_EH_RESET_TIMER:
> > > -		/*
> > > -		 * As nothing prevents from completion happening while
> > > -		 * ->aborted_gstate is set, this may lead to ignored
> > > -		 * completions and further spurious timeouts.
> > > -		 */
> > > -		blk_mq_rq_update_aborted_gstate(req, 0);
> > > +		local_irq_disable();
> > > +		write_seqcount_begin(&req->gstate_seq);
> > >  		blk_add_timer(req);
> > > +		req->aborted_gstate = 0;
> > > +		write_seqcount_end(&req->gstate_seq);
> > > +		local_irq_enable();
> > >  		break;
> >
> > So, this is #3 and I'm not sure how adding gstate_seq protection gets
> > rid of the race condition mentioned in the comment.  It's still the
> > same that nothing is protecting against racing w/ completion.
>
> I think you are right. I will see whether I can rework this patch to address
> that race.

That race is harmless and has always been there tho.  It only happens
when the actual completion coincides with timeout expiring, which is
very unlikely, and the only downside is that the completion gets lost
and the request will get timed out down the line.  It'd of course be
better to close the race window but code simplicity likely is an
important trade-off factor here.

Thanks.

-- 
tejun
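
For reference, the lost-completion window discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the blk-mq code: struct model_request and all model_* functions are hypothetical names, the two paths are run in a fixed order instead of concurrently, and the only rule taken from the thread is that a completion is ignored while ->aborted_gstate matches the request's current gstate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct request. */
struct model_request {
	uint64_t gstate;		/* current generation/state, non-zero while in flight */
	uint64_t aborted_gstate;	/* generation claimed by the timeout path, 0 = none */
	bool completed;
};

/* Timeout path: claim the current generation before calling the handler. */
static void model_timeout_claim(struct model_request *rq)
{
	rq->aborted_gstate = rq->gstate;
}

/* Completion path: dropped if the timeout path has claimed this generation. */
static bool model_complete(struct model_request *rq)
{
	if (rq->aborted_gstate == rq->gstate)
		return false;		/* completion is ignored */
	rq->completed = true;
	return true;
}

/* BLK_EH_RESET_TIMER handling: release the claim so the request stays live. */
static void model_reset_timer(struct model_request *rq)
{
	rq->aborted_gstate = 0;
}

/*
 * Replay one fixed interleaving and report whether the completion was lost.
 * completion_in_window == true puts the completion between the claim and
 * the timer reset, which is the race described in the removed comment.
 */
bool model_completion_lost(bool completion_in_window)
{
	struct model_request rq = { .gstate = 1, .aborted_gstate = 0,
				    .completed = false };
	bool accepted;

	if (completion_in_window) {
		model_timeout_claim(&rq);
		accepted = model_complete(&rq);	/* lands inside the window */
		model_reset_timer(&rq);
	} else {
		accepted = model_complete(&rq);	/* completes before the timeout */
	}

	return !accepted && !rq.completed;
}
```

In this model, model_completion_lost(true) leaves the request uncompleted with the claim cleared, so nothing finishes it until the re-armed timer expires again, which matches "the completion gets lost and the request will get timed out down the line".  Wrapping the reset in gstate_seq does not change that outcome, since the completion check has already run by then.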