On Mon, 2018-02-05 at 13:06 -0800, Tejun Heo wrote: > Thanks a lot for testing and fixing the issues but I'm a bit confused > by the patch. Maybe we can split patch a bit more? There seem to be > three things going on, > > 1. Changing preemption protection to irq protection in issue path. > > 2. Merge of aborted_gstate_sync and gstate_seq. > > 3. Updates to blk_mq_rq_timed_out(). > > Are all three changes necessary for stability? Hello Tejun, My goal with this patch is to fix the race between resetting the timer and the completion path. Hence change (3). Changes (1) and (2) are needed to make the changes in blk_mq_rq_timed_out() work. > > @@ -831,13 +834,12 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved) > > __blk_mq_complete_request(req); > > break; > > case BLK_EH_RESET_TIMER: > > - /* > > - * As nothing prevents from completion happening while > > - * ->aborted_gstate is set, this may lead to ignored > > - * completions and further spurious timeouts. > > - */ > > - blk_mq_rq_update_aborted_gstate(req, 0); > > + local_irq_disable(); > > + write_seqcount_begin(&req->gstate_seq); > > blk_add_timer(req); > > + req->aborted_gstate = 0; > > + write_seqcount_end(&req->gstate_seq); > > + local_irq_enable(); > > break; > > So, this is #3 and I'm not sure how adding gstate_seq protection gets > rid of the race condition mentioned in the comment. It's still the > same that nothing is protecting against racing w/ completion. I think you are right. I will see whether I can rework this patch to address that race. Thanks, Bart.