On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> rules. Unfortunatley, it contains quite a few holes.
         ^^^^^^^^^^^^^
         Unfortunately?

> While this change makes REQ_ATOM_COMPLETE synchornization unnecessary
                                            ^^^^^^^^^^^^^^^
                                            synchronization?

> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -126,6 +126,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
>  	rq->start_time = jiffies;
>  	set_start_time_ns(rq);
>  	rq->part = NULL;
> +	seqcount_init(&rq->gstate_seq);
> +	u64_stats_init(&rq->aborted_gstate_sync);
>  }
>  EXPORT_SYMBOL(blk_rq_init);

Sorry but the above change looks ugly to me. My understanding is that
blk_rq_init() is only used inside the block layer to initialize legacy
block layer requests, while gstate_seq and aborted_gstate_sync are only
relevant for blk-mq requests. Wouldn't it be better to avoid calling
blk_rq_init() for blk-mq requests so that the above change can be left
out? The only callers of blk_rq_init() outside the block layer core I
know of are ide_prep_sense() and scsi_ioctl_reset(). I can help with
converting the SCSI code if you want.

> +	write_seqcount_begin(&rq->gstate_seq);
> +	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> +	blk_add_timer(rq);
> +	write_seqcount_end(&rq->gstate_seq);

My understanding is that both write_seqcount_begin() and
write_seqcount_end() trigger a write memory barrier. Is a seqcount
really faster than a spinlock?

> @@ -792,6 +811,14 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> +		/*
> +		 * As nothing prevents from completion happening while
> +		 * ->aborted_gstate is set, this may lead to ignored
> +		 * completions and further spurious timeouts.
> +		 */
> +		u64_stats_update_begin(&req->aborted_gstate_sync);
> +		req->aborted_gstate = 0;
> +		u64_stats_update_end(&req->aborted_gstate_sync);

If a blk-mq request is resubmitted 2**62 times, can that result in the
above code setting aborted_gstate to the same value as gstate? Isn't
that a bug? If so, how about setting aborted_gstate in the above code
to e.g. gstate ^ (2**63)?

> @@ -228,6 +230,27 @@ struct request {
>  
>  	unsigned short write_hint;
>  
> +	/*
> +	 * On blk-mq, the lower bits of ->gstate carry the MQ_RQ_* state
> +	 * value and the upper bits the generation number which is
> +	 * monotonically incremented and used to distinguish the reuse
> +	 * instances.
> +	 *
> +	 * ->gstate_seq allows updates to ->gstate and other fields
> +	 * (currently ->deadline) during request start to be read
> +	 * atomically from the timeout path, so that it can operate on a
> +	 * coherent set of information.
> +	 */
> +	seqcount_t gstate_seq;
> +	u64 gstate;
> +
> +	/*
> +	 * ->aborted_gstate is used by the timeout to claim a specific
> +	 * recycle instance of this request. See blk_mq_timeout_work().
> +	 */
> +	struct u64_stats_sync aborted_gstate_sync;
> +	u64 aborted_gstate;
> +
>  	unsigned long deadline;
>  	struct list_head timeout_list;

Why are gstate and aborted_gstate 64-bit variables? What makes you
think that 32 bits would not be enough?

Thanks,

Bart.