Re: [PATCH V4 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

Bart Van Assche <Bart.VanAssche@xxxxxxx> · Mon, 11 Sep 2017 16:03:55 +0000

On Mon, 2017-09-11 at 19:10 +0800, Ming Lei wrote:
> @@ -787,6 +787,35 @@ int blk_queue_enter(struct request_queue *q, unsigned flags)
>  		if (percpu_ref_tryget_live(&q->q_usage_counter))
>  			return 0;
>  
> +		/*
> +		 * If queue is preempt frozen and caller need to allocate
> +		 * request for RQF_PREEMPT, we grab the .q_usage_counter
> +		 * unconditionally and return successfully.
> +		 *
> +		 * There isn't race with queue cleanup because:
> +		 *
> +		 * 1) it is guaranteed that preempt freeze can't be
> +		 * started after queue is set as dying
> +		 *
> +		 * 2) normal freeze runs exclusively with preempt
> +		 * freeze, so even after queue is set as dying
> +		 * afterwards, blk_queue_cleanup() won't move on
> +		 * until preempt freeze is done
> +		 *
> +		 * 3) blk_queue_dying() needn't to be checked here
> +		 * 	- for legacy path, it will be checked in
> +		 * 	__get_request()

For the legacy block layer core, what do you think will happen if the
"dying" state is set by another thread after __get_request() has passed the
blk_queue_dying() check?

> +		 * 	- blk-mq depends on driver to handle dying well
> +		 * 	because it is normal for queue to be set as dying
> +		 * 	just between blk_queue_enter() and allocating new
> +		 * 	request.

The above comment is not correct. The block layer core handles the "dying"
state. Block drivers other than dm-mpath should not have to query this state
directly.

> +		 */
> +		if ((flags & BLK_REQ_PREEMPT) &&
> +				blk_queue_is_preempt_frozen(q)) {
> +			blk_queue_enter_live(q);
> +			return 0;
> +		}
> +

Sorry but to me it looks like the above code introduces a race condition
between blk_queue_cleanup() and blk_get_request() for at least blk-mq.
Consider e.g. the following scenario:
* A first thread preempt-freezes a request queue.
* A second thread calls blk_get_request() with BLK_REQ_PREEMPT set. That
  results in a call of blk_queue_is_preempt_frozen().
* A context switch occurs to the first thread.
* The first thread preempt-unfreezes the same request queue and calls
  blk_queue_cleanup(). That last function changes the request queue state
  into DYING and waits until all pending requests have finished.
* The second thread continues and calls blk_queue_enter_live(), allocates
  a request and submits it.

In other words, a request gets submitted against a dying queue. This must
not happen. See also my explanation of queue shutdown from a few days ago
(https://marc.info/?l=linux-block&m=150449845831789).

Bart.