On 12/15/2017 05:54 AM, Peter Zijlstra wrote: > On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote: >> On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote: >>> On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote: >>>> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote: >>>>> + write_seqcount_begin(&rq->gstate_seq); >>>>> + blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT); >>>>> + blk_add_timer(rq); >>>>> + write_seqcount_end(&rq->gstate_seq); >>>> >>>> My understanding is that both write_seqcount_begin() and write_seqcount_end() >>>> trigger a write memory barrier. Is a seqcount really faster than a spinlock? >>> >>> Yes lots, no atomic operations and no waiting. >>> >>> The only constraint for write_seqlock is that there must not be any >>> concurrency. >>> >>> But now that I look at this again, TJ, why can't the below happen? >>> >>> write_seqlock_begin(); >>> blk_mq_rq_update_state(rq, IN_FLIGHT); >>> blk_add_timer(rq); >>> <timer-irq> >>> read_seqcount_begin() >>> while (seq & 1) >>> cpurelax(); >>> // life-lock >>> </timer-irq> >>> write_seqlock_end(); >> >> Hello Peter, >> >> Some time ago the block layer was changed to handle timeouts in thread context >> instead of interrupt context. See also commit 287922eb0b18 ("block: defer >> timeouts to a workqueue"). > > That only makes it a little better: > > Task-A Worker > > write_seqcount_begin() > blk_mq_rw_update_state(rq, IN_FLIGHT) > blk_add_timer(rq) > <timer> > schedule_work() > </timer> > <context-switch to worker> > read_seqcount_begin() > while(seq & 1) > cpu_relax(); > Hi Peter The current seqcount read side is as below: do { start = read_seqcount_begin(&rq->gstate_seq); gstate = READ_ONCE(rq->gstate); deadline = rq->deadline; } while (read_seqcount_retry(&rq->gstate_seq, start)); read_seqcount_retry() doesn't check the bit 0, but whether the saved value from read_seqcount_begin() is equal to the current value of seqcount. pls refer: static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start) { return unlikely(s->sequence != start); } Thanks Jianchao > > Now normally this isn't fatal because Worker will simply spin its entire > time slice away and we'll eventually schedule our Task-A back in, which > will complete the seqcount and things will work. > > But if, for some reason, our Worker was to have RT priority higher than > our Task-A we'd be up some creek without no paddles. > > We don't happen to have preemption of IRQs off here? That would fix > things nicely. >