On Wed, Jul 18, 2018 at 03:45:15PM +0000, Bart Van Assche wrote:
> On Wed, 2018-07-18 at 20:16 +0800, Ming Lei wrote:
> > On Wed, Jul 18, 2018 at 7:49 AM, Bart Van Assche <bart.vanassche@xxxxxxx> wrote:
> > > @@ -3801,8 +3778,11 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> > >  	if (!q->dev)
> > >  		return ret;
> > >
> > > +	blk_set_preempt_only(q);
> > > +	blk_freeze_queue_start(q);
> > > +
> > >  	spin_lock_irq(q->queue_lock);
> > > -	if (q->nr_pending) {
> > > +	if (!percpu_ref_is_zero(&q->q_usage_counter)) {
> >
> > This approach can't work reliably because the percpu ref isn't in atomic
> > mode yet when blk_freeze_queue_start() returns, so percpu_ref_is_zero()
> > won't see an accurate value of the counter, and the device may be put down
> > before in-flight requests are completed by the hardware.
>
> Hello Ming,
>
> The blk_freeze_queue_start() implementation is as follows:
>
> void blk_freeze_queue_start(struct request_queue *q)
> {
> 	int freeze_depth;
>
> 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
> 	if (freeze_depth == 1) {
> 		percpu_ref_kill(&q->q_usage_counter);
> 		if (q->mq_ops)
> 			blk_mq_run_hw_queues(q, false);
> 	}
> }
>
> From the documentation header in include/linux/percpu-refcount.h above
> percpu_ref_kill():
>
>  * Switches @ref into atomic mode before gathering up the percpu counters
>  * and dropping the initial ref.

IMO, that comment isn't precise enough. Yes, the counter is in atomic mode
once percpu_ref_kill() returns, but its value can't be read accurately
until the RCU confirmation has completed: __percpu_ref_switch_to_atomic()
takes one extra reference, and the matching put only happens in
percpu_ref_call_confirm_rcu(), which is scheduled via call_rcu_sched().

So after blk_freeze_queue_start() returns, percpu_ref_is_zero() can't
return true until that RCU confirmation is done, which means this approach
may fail to put the device down.

Thanks,
Ming