On Tue, 2017-09-26 at 06:59 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 01:29:24PM -0700, Bart Van Assche wrote:
> > +int blk_queue_enter(struct request_queue *q, bool nowait, bool preempt)
> >  {
> >  	while (true) {
> >  		int ret;
> >
> > -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> > -			return 0;
> > +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> > +			/*
> > +			 * Since setting the PREEMPT_ONLY flag is followed
> > +			 * by a switch of q_usage_counter from per-cpu to
> > +			 * atomic mode and back to per-cpu and since the
> > +			 * switch to atomic mode uses call_rcu_sched(), it
> > +			 * is not necessary to call smp_rmb() here.
> > +			 */
>
> rcu_read_lock is held only inside percpu_ref_tryget_live().
>
> Without an explicit barrier (smp_mb) between getting the refcounter
> and reading the preempt-only flag, the two operations (writing to the
> refcounter and reading the flag) can be reordered, so
> freeze/unfreeze may be completed before this I/O is completed.

Sorry, but I disagree. I'm using RCU to achieve the same effect as a barrier
and to move the cost of the barrier from the reader to the updater. See also
Paul E. McKenney, Mathieu Desnoyers, Lai Jiangshan, and Josh Triplett,
"The RCU-barrier menagerie", LWN.net, November 12, 2013
(https://lwn.net/Articles/573497/).

Bart.
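[Editor's note: for readers following the thread, the ordering argument can be sketched as pseudocode. This is a simplified illustration, not the actual kernel code; helper names such as blk_set_preempt_only() and blk_queue_preempt_only() are taken from this patch series, and the grace-period step stands in for the percpu-refcount internals.]

```c
/*
 * Pseudocode sketch of the RCU-as-barrier argument.
 *
 * Updater (freeze path):
 */
blk_set_preempt_only(q);              /* set the PREEMPT_ONLY flag */
percpu_ref_kill(&q->q_usage_counter); /* switch to atomic mode; this uses
                                       * call_rcu_sched(), so the switch
                                       * only completes after an RCU-sched
                                       * grace period has elapsed */

/*
 * Reader (blk_queue_enter()):
 */
if (percpu_ref_tryget_live(&q->q_usage_counter)) {
	/*
	 * tryget_live runs inside an RCU-sched read-side critical
	 * section. Any reader whose critical section began before the
	 * updater's grace period ended is waited for by that grace
	 * period; any reader that starts afterwards observes the flag
	 * already set. The grace period thus orders the flag write
	 * against the reader's flag read, which is why no smp_rmb()
	 * is needed here.
	 */
	if (preempt || !blk_queue_preempt_only(q))
		return 0;
	percpu_ref_put(&q->q_usage_counter);
}
```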