On Mon, May 11, 2020 at 01:52:14PM -0700, Bart Van Assche wrote:
> On 2020-05-10 21:08, Ming Lei wrote:
> > OK, just forgot the whole story, but the issue can be fixed quite easily
> > by adding a new request allocation flag in slow path, see the following
> > patch:
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index ec50d7e6be21..d743be1b45a2 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -418,6 +418,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
> >  		if (success)
> >  			return 0;
> >
> > +		if (flags & BLK_MQ_REQ_FORCE) {
> > +			percpu_ref_get(ref);
> > +			return 0;
> > +		}
> > +
> >  		if (flags & BLK_MQ_REQ_NOWAIT)
> >  			return -EBUSY;
> >
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index c2ea0a6e5b56..2816886d0bea 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -448,6 +448,13 @@ enum {
> >  	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
> >  	/* set RQF_PREEMPT */
> >  	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
> > +
> > +	/*
> > +	 * force to allocate request and caller has to make sure queue
> > +	 * won't be forzen completely during allocation, and this flag
> > +	 * is only applied after queue freeze is started
> > +	 */
> > +	BLK_MQ_REQ_FORCE	= (__force blk_mq_req_flags_t)(1 << 4),
> >  };
> >
> >  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>
> I'm not sure that introducing such a flag is a good idea. After
> blk_mq_freeze_queue() has made it clear that a request queue must be
> frozen and before the request queue is really frozen, an RCU grace
> period must expire. Otherwise it cannot be guaranteed that the intention
> to freeze a request queue (by calling percpu_ref_kill()) has been
> observed by all potential blk_queue_enter() callers (blk_queue_enter()
> calls percpu_ref_tryget_live()).
> Not introducing any new race conditions
> would either require to introduce an smp_mb() call in blk_queue_enter()
> or to let another RCU grace period expire after the last allocation of a
> request with BLK_MQ_REQ_FORCE and before the request queue is really frozen.

Actually, neither smp_mb() nor an extra grace period is needed, which can be
explained simply as follows:

BLK_MQ_REQ_FORCE introduces a call to percpu_ref_get() -> percpu_ref_get_many().

When percpu_ref_get() is called:

- if the ref is still in percpu mode, the get is covered by the RCU grace
  period in percpu_ref_kill_and_confirm().

- otherwise, the refcount is grabbed in atomic mode, and no extra smp_mb() or
  RCU grace period is required, because we guarantee that the atomic count
  is > 1 when percpu_ref_get() is called. blk_mq_freeze_queue_wait() will
  then observe the correct value of this atomic refcount.

percpu_ref_get() is documented as:

 * This function is safe to call as long as @ref is between init and exit.

Thanks,
Ming