On Tue, Sep 12, 2017 at 11:40:57AM +0800, Ming Lei wrote:
> On Mon, Sep 11, 2017 at 04:03:55PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-09-11 at 19:10 +0800, Ming Lei wrote:
> > > @@ -787,6 +787,35 @@ int blk_queue_enter(struct request_queue *q, unsigned flags)
> > >  	if (percpu_ref_tryget_live(&q->q_usage_counter))
> > >  		return 0;
> > >
> > > +	/*
> > > +	 * If the queue is preempt frozen and the caller needs to allocate
> > > +	 * a request for RQF_PREEMPT, we grab the .q_usage_counter
> > > +	 * unconditionally and return successfully.
> > > +	 *
> > > +	 * There isn't a race with queue cleanup because:
> > > +	 *
> > > +	 * 1) it is guaranteed that preempt freeze can't be
> > > +	 * started after the queue is set as dying
> > > +	 *
> > > +	 * 2) normal freeze runs exclusively with preempt
> > > +	 * freeze, so even if the queue is set as dying
> > > +	 * afterwards, blk_queue_cleanup() won't move on
> > > +	 * until preempt freeze is done
> > > +	 *
> > > +	 * 3) blk_queue_dying() needn't be checked here
> > > +	 *	- for the legacy path, it will be checked in
> > > +	 *	  __get_request()
> >
> > For the legacy block layer core, what do you think will happen if the
> > "dying" state is set by another thread after __get_request() has passed the
> > blk_queue_dying() check?
>
> Without this patchset, the block core still needs to handle the above
> situation, so your question isn't specific to this patchset.
>
> Also, q->queue_lock is held both when setting dying and when checking
> dying in __get_request(). But the lock can be released inside
> __get_request(), so it is possible to allocate a request after the
> queue is set as dying, and that request can be dispatched to a dying
> queue too in the legacy path.
>
> > > +	 *	- blk-mq depends on the driver to handle dying well,
> > > +	 *	  because it is normal for the queue to be set as dying
> > > +	 *	  just between blk_queue_enter() and allocating a new
> > > +	 *	  request.
> >
> > The above comment is not correct. The block layer core handles the "dying"
> > state. Block drivers other than dm-mpath should not have to query this state
> > directly.
>
> If blk-mq doesn't query the dying state, how does it know the queue is
> dying and handle that state? Also, blk-mq is no different from legacy
> with respect to depending on the driver to handle dying.
>
> > > +	 */
> > > +	if ((flags & BLK_REQ_PREEMPT) &&
> > > +			blk_queue_is_preempt_frozen(q)) {
> > > +		blk_queue_enter_live(q);
> > > +		return 0;
> > > +	}
> > > +
> >
> > Sorry but to me it looks like the above code introduces a race condition
> > between blk_queue_cleanup() and blk_get_request() for at least blk-mq.
> > Consider e.g. the following scenario:
> > * A first thread preempt-freezes a request queue.
> > * A second thread calls blk_get_request() with BLK_REQ_PREEMPT set. That
> >   results in a call of blk_queue_is_preempt_frozen().
> > * A context switch occurs to the first thread.
> > * The first thread preempt-unfreezes the same request queue and calls
> >   blk_queue_cleanup(). That last function changes the request queue state
> >   into DYING and waits until all pending requests have finished.
> > * The second thread continues and calls blk_queue_enter_live(), allocates
> >   a request and submits it.
>
> OK, that looks like a race I hadn't thought of, but it can be fixed
> easily by calling blk_queue_enter_live() while holding q->freeze_lock,
> and it won't cause a performance issue either since this is the slow
> path.
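The scenario Bart describes above is a check-then-act window: the patch's
blk_queue_is_preempt_frozen() presumably samples the preempt-freeze state
under q->freeze_lock and then drops the lock, so nothing stops a preempt
unfreeze plus queue cleanup from completing on another CPU before
blk_queue_enter_live() runs. A rough sketch of the window, reusing the
field names that appear later in this thread; the helper body is an
assumption about the patchset, while the caller is the hunk quoted above:

	/*
	 * Presumed shape of the helper: the lock is dropped before the
	 * caller acts on the answer.
	 */
	static inline bool blk_queue_is_preempt_frozen(struct request_queue *q)
	{
		bool frozen;

		spin_lock(&q->freeze_lock);
		frozen = q->preempt_freezing && !q->preempt_unfreezing;
		spin_unlock(&q->freeze_lock);

		return frozen;
	}

	/* Caller in blk_queue_enter(), from the hunk quoted above. */
	if ((flags & BLK_REQ_PREEMPT) && blk_queue_is_preempt_frozen(q)) {
		/*
		 * Window: another thread may preempt-unfreeze the queue and
		 * run blk_cleanup_queue() here, before the usage counter is
		 * taken by blk_queue_enter_live() below.
		 */
		blk_queue_enter_live(q);
		return 0;
	}

Ming's proposal, quoted next, closes the window by taking the usage
counter while q->freeze_lock is still held.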
>
> For example, we can introduce the following code in blk_queue_enter():
>
> 	if ((flags & BLK_REQ_PREEMPT) &&
> 			blk_queue_enter_preempt_freeze(q))
> 		return 0;
>
> static inline bool blk_queue_enter_preempt_freeze(struct request_queue *q)
> {
> 	bool preempt_frozen;
>
> 	spin_lock(&q->freeze_lock);
> 	preempt_frozen = q->preempt_freezing && !q->preempt_unfreezing;
> 	if (preempt_frozen)
> 		blk_queue_enter_live(q);
> 	spin_unlock(&q->freeze_lock);
>
> 	return preempt_frozen;
> }
>
> > In other words, a request gets submitted against a dying queue. This must
> > not happen. See also my explanation of queue shutdown from a few days ago
>
> That is not correct; think about why queue dead is checked in
> __blk_run_queue_uncond() instead of queue dying. We still need to
> submit requests to the driver when the queue is dying, and the driver
> knows how to handle that.
>
> > (https://marc.info/?l=linux-block&m=150449845831789).
>
> From that email (https://marc.info/?l=linux-block&m=150449845831789):
>
> >> Do you understand how request queue cleanup works? The algorithm used for
> >> request queue cleanup is as follows:
> >> * Set the DYING flag. This flag makes all later blk_get_request() calls
> >>   fail.
>
> Your description isn't true for either legacy or blk-mq:
>
> For legacy, as you can see, q->queue_lock can be released inside
> __get_request(); at that point the queue can be set as dying, but the
> request can still be allocated successfully and submitted to the
> driver. It has worked this way for a long time.
>
> For blk-mq, a request is allocated in the following way:
>
> 	ret = blk_queue_enter(q, ...);
> 	if (ret)
> 		return ERR_PTR(ret);
> 	rq = blk_mq_get_request(q, ...);
>
> Setting the queue as dying can happen right between blk_queue_enter()
> and blk_mq_get_request(), so it is normal to see requests sent to the
> driver after the queue is set as dying.
>
> We shouldn't worry about that, because both queue cleanup and setting
> dying are triggered by the driver, so the driver knows the queue is
> dying at that time and can handle the request correctly in this
> situation.
>
> >> * Wait until all pending requests have finished.
> >> * Set the DEAD flag. For the traditional block layer, this flag causes
> >>   blk_run_queue() not to call .request_fn() anymore. For blk-mq it is
> >>   guaranteed in another way that .queue_rq() won't be called anymore after
> >>   this flag has been set.
>
> That is true, but still not related to this patch.

Hi Bart,

Could you please let me know if your concern about the race between
preempt freeze and blk_cleanup_queue() is addressed in my last reply?

--
Ming
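For context, the DYING -> drain -> DEAD shutdown order that Bart
summarizes in the quoted list maps onto blk_cleanup_queue() of that
kernel generation roughly as sketched below. This is a trimmed sketch,
not the real function (which also handles bypass mode, drains the legacy
request_fn path and releases resources); blk_set_queue_dying(),
blk_freeze_queue() and queue_flag_set() are the existing helpers of the
time, and cleanup_queue_sketch() is just an illustrative name:

	/* Simplified sketch of the ordering only. */
	static void cleanup_queue_sketch(struct request_queue *q)
	{
		/* 1) DYING: later blk_get_request() calls start failing. */
		blk_set_queue_dying(q);

		/*
		 * 2) Drain: wait until q_usage_counter drops to zero, i.e.
		 * every request that already entered the queue has finished.
		 */
		blk_freeze_queue(q);

		/*
		 * 3) DEAD: after this, .request_fn() (legacy) and
		 * .queue_rq() (blk-mq) are not invoked anymore.
		 */
		spin_lock_irq(q->queue_lock);
		queue_flag_set(QUEUE_FLAG_DEAD, q);
		spin_unlock_irq(q->queue_lock);
	}

The disagreement in the thread is about step 1: a request that passed
blk_queue_enter() (or the legacy blk_queue_dying() check) just before
DYING was set can still reach the driver, which is exactly the case Ming
argues the driver must already cope with.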