Re: [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler

On 6/22/23 05:12, Bart Van Assche wrote:
> Send requeued requests to the I/O scheduler if the dispatch order
> matters such that the I/O scheduler can control the order in which
> requests are dispatched.
> 
> This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
> to hctx dispatch list when requeue"). Instead of sending DONTPREP
> requests to the dispatch list, send these to the I/O scheduler and
> prevent that the I/O scheduler merges these requests by adding
> RQF_DONTPREP to the list of flags that prevent merging
> (RQF_NOMERGE_FLAGS).
> 
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Damien Le Moal <dlemoal@xxxxxxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Mike Snitzer <snitzer@xxxxxxxxxx>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  block/blk-mq.c         | 10 +++++-----
>  include/linux/blk-mq.h |  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f440e4aaaae3..453a90767f7a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1453,13 +1453,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	while (!list_empty(&requeue_list)) {
>  		rq = list_entry(requeue_list.next, struct request, queuelist);
>  		/*
> -		 * If RQF_DONTPREP ist set, the request has been started by the
> -		 * driver already and might have driver-specific data allocated
> -		 * already.  Insert it into the hctx dispatch list to avoid
> -		 * block layer merges for the request.
> +		 * Only send those RQF_DONTPREP requests to the dispatch list
> +		 * that may be reordered freely. If the request order matters,
> +		 * send the request to the I/O scheduler.
>  		 */
>  		list_del_init(&rq->queuelist);
> -		if (rq->rq_flags & RQF_DONTPREP)
> +		if (rq->rq_flags & RQF_DONTPREP &&
> +		    !op_needs_zoned_write_locking(req_op(rq)))

Why? I still do not understand the need for this. There is always at most a
single in-flight write per sequential zone, so requeuing that in-flight write
directly to the dispatch list cannot reorder writes, and it gives better
command latency.

>  			blk_mq_request_bypass_insert(rq, 0);
>  		else
>  			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index f401067ac03a..2610b299ec77 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -62,8 +62,8 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_RESV		((__force req_flags_t)(1 << 23))
>  
>  /* flags that prevent us from merging requests: */
> -#define RQF_NOMERGE_FLAGS \
> -	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
> +#define RQF_NOMERGE_FLAGS                                               \
> +	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
>  
>  enum mq_rq_state {
>  	MQ_RQ_IDLE		= 0,

-- 
Damien Le Moal
Western Digital Research
