Re: [PATCH 3/3] blk-mq: fix wait condition for tagset wait completed check

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Wed, 29 Jan 2025 15:24:52 +0530

On 1/28/25 10:04 PM, Daniel Wagner wrote:
> blk_mq_tagset_count_completed_reqs returns the number of completed
> requests. The only user of this function is
> blk_mq_tagset_wait_completed_request which wants to know how many
> request are not yet completed. Thus return the number of in flight
> requests and terminate the wait loop when there is no inflight request.
> 
> Fixes: f9934a80f91d ("blk-mq: introduce blk_mq_tagset_wait_completed_request()")
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Signed-off-by: Daniel Wagner <wagi@xxxxxxxxxx>
> ---
>  block/blk-mq-tag.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index b9f417d980b46d54b74dec8adcb5b04e6a78635c..3ce46afb65e3c3de9f11ca440bf0f335f21d0998 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -450,11 +450,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>  }
>  EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
>  
> -static bool blk_mq_tagset_count_completed_rqs(struct request *rq, void *data)
> +static bool blk_mq_tagset_count_inflight_rqs(struct request *rq, void *data)
>  {
>  	unsigned *count = data;
>  
> -	if (blk_mq_request_completed(rq))
> +	if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
>  		(*count)++;
>  	return true;
>  }
> @@ -472,7 +472,7 @@ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
>  		unsigned count = 0;
>  
>  		blk_mq_tagset_busy_iter(tagset,
> -				blk_mq_tagset_count_completed_rqs, &count);
> +				blk_mq_tagset_count_inflight_rqs, &count);
>  		if (!count)
>  			break;
>  		msleep(5);
> 
I see that blk_mq_tagset_wait_completed_request() is called from nvme_cancel_tagset()
and nvme_cancel_admin_tagset(). And it seems to me that the intention here's to wait 
until each completed requests are freed (or change its state to MQ_RQ_IDLE). 

Looking at code, the nvme_cancel_xxx() first invokes blk_mq_tagset_busy_iter() which 
iterates through tagset and cancels each in-flight request and marks the request state
to MQ_RQ_COMPLETE. Next in blk_mq_tagset_wait_completed_request(), we wait for each
completed request state changed to anything but MQ_RQ_COMPLETE. The next state of the 
request would be naturally MQ_RQ_IDLE once that request is freed. So in blk_mq_tagset_
wait_completed_request(), essentially we wait for request state change from MQ_RQ_COMPLETE
to MQ_RQ_IDLE.

So now if the proposal is that blk_mq_tagset_wait_completed_request() has to wait only 
if there's any in-flight request then it seems this function would never need to wait 
and looks redundant because req->state would never be MQ_RQ_IN_FLIGHT as those would 
have been already changed to MQ_RQ_COMPLETE when nvme_cancel_xxx() invokes blk_mq_tagset_
busy_iter(ctrl->tagset, nvme_cancel_request, ctrl).

Having said that, I am not sure what was the real intention here, in nvme_cancel_xxx(), 
do we really need to wait only until in-flight requests are completed or synchronize with 
request's completion callback (i.e. wait until all completed requests are freed)? 

Thanks,
--Nilay