On 2020-05-10 21:08, Ming Lei wrote:
> OK, just forgot the whole story, but the issue can be fixed quite easily
> by adding a new request allocation flag in slow path, see the following
> patch:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ec50d7e6be21..d743be1b45a2 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -418,6 +418,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  		if (success)
>  			return 0;
>
> +		if (flags & BLK_MQ_REQ_FORCE) {
> +			percpu_ref_get(ref);
> +			return 0;
> +		}
> +
>  		if (flags & BLK_MQ_REQ_NOWAIT)
>  			return -EBUSY;
>
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index c2ea0a6e5b56..2816886d0bea 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -448,6 +448,13 @@ enum {
>  	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
>  	/* set RQF_PREEMPT */
>  	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
> +
> +	/*
> +	 * force to allocate request and caller has to make sure queue
> +	 * won't be frozen completely during allocation, and this flag
> +	 * is only applied after queue freeze is started
> +	 */
> +	BLK_MQ_REQ_FORCE	= (__force blk_mq_req_flags_t)(1 << 4),
>  };
>
>  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,

I'm not sure that introducing such a flag is a good idea. After blk_mq_freeze_queue() has announced that a request queue must be frozen (by calling percpu_ref_kill()), an RCU grace period must expire before the request queue is really frozen. Otherwise there is no guarantee that the intention to freeze the request queue has been observed by all potential blk_queue_enter() callers (blk_queue_enter() calls percpu_ref_tryget_live()). Avoiding new race conditions would therefore require either adding an smp_mb() call to blk_queue_enter() or letting another RCU grace period expire after the last allocation of a request with BLK_MQ_REQ_FORCE and before the request queue is really frozen.
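A rough, single-threaded model of the problematic ordering may help here. It is a userspace C sketch, not kernel code: struct model_queue, model_queue_enter(), model_freeze_start(), model_freeze_check_idle(), model_force_after_freeze() and MODEL_REQ_FORCE are all hypothetical names that only loosely mirror percpu_ref_tryget_live(), percpu_ref_get(), percpu_ref_kill() and the proposed BLK_MQ_REQ_FORCE. The model has no per-CPU counters, memory barriers or RCU, which is exactly what makes the real race subtle; it only shows the end state the race can produce, namely a reference taken on a queue the freezer already considers frozen.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified model of a request queue usage counter.
 * This is NOT the kernel's percpu_ref: no per-CPU counters, no atomics and
 * no RCU. Each call below is one step of a hand-picked interleaving.
 */
struct model_queue {
	bool live;   /* cleared when a freeze starts (cf. percpu_ref_kill()) */
	int users;   /* references currently held by queue "enters" */
	bool frozen; /* set once the freezer has observed users == 0 */
};

/* Hypothetical stand-in for the proposed BLK_MQ_REQ_FORCE flag. */
#define MODEL_REQ_FORCE 0x1u

/* Loosely mirrors percpu_ref_tryget_live(): fails once the ref is killed. */
static bool model_tryget_live(struct model_queue *q)
{
	if (!q->live)
		return false;
	q->users++;
	return true;
}

/* Loosely mirrors blk_queue_enter() with the proposed FORCE fallback. */
static int model_queue_enter(struct model_queue *q, unsigned int flags)
{
	if (model_tryget_live(q))
		return 0;
	if (flags & MODEL_REQ_FORCE) {
		q->users++; /* unconditional get, cf. percpu_ref_get() */
		return 0;
	}
	return -1; /* the real code would wait or return -EBUSY */
}

static void model_queue_exit(struct model_queue *q)
{
	assert(q->users > 0);
	q->users--;
}

/* The freezer, split into a "start" step and an "observe idle" step. */
static void model_freeze_start(struct model_queue *q)
{
	q->live = false;
}

static void model_freeze_check_idle(struct model_queue *q)
{
	if (!q->live && q->users == 0)
		q->frozen = true;
}

/*
 * One problematic interleaving: the freezer sees an idle queue and treats
 * it as frozen, and only then does a forced enter take a reference.
 * Returns the result of the forced enter.
 */
static int model_force_after_freeze(struct model_queue *q)
{
	model_queue_enter(q, 0);    /* normal user while the queue is live */
	model_freeze_start(q);      /* freeze starts */
	model_queue_exit(q);        /* last normal user leaves */
	model_freeze_check_idle(q); /* freezer: users == 0, so "frozen" */
	return model_queue_enter(q, MODEL_REQ_FORCE);
}
```

Starting from a live, idle queue, model_force_after_freeze() returns 0 and leaves frozen set while users is 1, whereas a non-forced enter at that point still fails. The comment on the proposed flag makes the caller responsible for preventing this ordering; the argument above is that preventing it reliably in the real code would need additional synchronization, such as an smp_mb() or an extra RCU grace period.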
Serializing hardware queue quiescing and request queue freezing is probably a much simpler solution. I'm not sure about this, but holding the mq_freeze_lock mutex around hardware queue quiescing may be sufficient.

Thanks,

Bart.