Re: [PATCH 2/2] io_uring: fix failed linkchain code logic

Pavel Begunkov <asml.silence@xxxxxxxxx> · Mon, 23 Aug 2021 12:02:15 +0100

On 8/23/21 4:25 AM, Hao Xu wrote:
> Given a linkchain like this:
> req0(link_flag)-->req1(link_flag)-->...-->reqn(no link_flag)
> 
> There is a problem:
>  - if the submission of some intermediate linked req such as req1 fails,
>    the reqs after it won't be cancelled.
> 
>    - sqpoll disabled: maybe it's ok since users can get the error info
>      of req1 and stop submitting the following sqes.
> 
>    - sqpoll enabled: definitely a problem, the following sqes will be
>      submitted in the next round.
> 
> The solution is to refactor the code logic to:
>  - if a linked req's submission fails, just mark it and the head (if it
>    exists) as REQ_F_FAIL. Leverage req->result to indicate whether it
>    is failed or cancelled.
>  - submit or fail the whole chain when we come to the end of it.

This looks good to me, a couple of comments below.
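
For reference, the scheme described above can be modelled in userspace
roughly as follows. This is only a sketch under assumptions: struct req,
cqe_res, mark_failed() and finish_chain() are made-up names for
illustration, the REQ_F_FAIL bit value is a placeholder, and nothing here
is the patch's actual code. It only shows the idea of marking the failed
req and the head, then resolving the whole chain once its end is reached.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define REQ_F_FAIL (1U << 0)	/* placeholder bit, not the kernel's value */

/* Minimal stand-in for struct io_kiocb: only what the model needs. */
struct req {
	unsigned int flags;
	int result;		/* stays 0 unless this req itself failed */
	int cqe_res;		/* what the model would report to userspace */
	struct req *link;	/* next req in the chain, NULL at the end */
};

/* A req failed during submission: mark it and the head, keep the chain. */
static void mark_failed(struct req *head, struct req *req, int err)
{
	assert(err < 0);
	req->result = err;
	req->flags |= REQ_F_FAIL;
	if (head)
		head->flags |= REQ_F_FAIL;
}

/*
 * Called when the end of the chain is reached. Returns 0 if the chain is
 * clean and would be submitted, otherwise the number of reqs completed
 * with an error: the req's own code if it failed, -ECANCELED if not.
 */
static int finish_chain(struct req *head)
{
	int n = 0;

	if (!(head->flags & REQ_F_FAIL))
		return 0;
	for (struct req *r = head; r; r = r->link, n++)
		r->cqe_res = r->result ? r->result : -ECANCELED;
	return n;
}
```

In this model a head that is flagged only because of another req's failure
still has result == 0, which is what lets finish_chain() report -ECANCELED
for it rather than a bogus error code.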

> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
> ---
>  fs/io_uring.c | 61 +++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 45 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 44b1b2b58e6a..9ae8f2a5c584 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1776,8 +1776,6 @@ static void io_preinit_req(struct io_kiocb *req, struct io_ring_ctx *ctx)
>  	req->ctx = ctx;
>  	req->link = NULL;
>  	req->async_data = NULL;
> -	/* not necessary, but safer to zero */
> -	req->result = 0;

Please leave it. I'm afraid of leaking stack to userspace because
->result juggling looks prone to errors. And preinit is pretty cold
anyway.

[...]

>  
> @@ -6637,19 +6644,25 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  	ret = io_init_req(ctx, req, sqe);
>  	if (unlikely(ret)) {
>  fail_req:
> +		/* fail even hard links since we don't submit */
>  		if (link->head) {
> -			/* fail even hard links since we don't submit */
> -			io_req_complete_failed(link->head, -ECANCELED);
> -			link->head = NULL;
> +			req_set_fail(link->head);

I think it will be more reliable if we set head->result here, ...

if (!(link->head->flags & REQ_F_FAIL))
	link->head->result = -ECANCELED;

> -		ret = io_req_prep_async(req);
> -		if (unlikely(ret))
> -			goto fail_req;
> +		if (!(req->flags & REQ_F_FAIL)) {
> +			ret = io_req_prep_async(req);
> +			if (unlikely(ret)) {
> +				req->result = ret;
> +				req_set_fail(req);
> +				req_set_fail(link->head);

... and here (a helper?), ...

> +			}
> +		}
>  		trace_io_uring_link(ctx, req, head);
>  		link->last->link = req;
>  		link->last = req;
> @@ -6681,6 +6699,17 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  		if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
>  			link->head = req;
>  			link->last = req;
> +			/*
> +			 * we can judge a link req is failed or cancelled by if
> +			 * REQ_F_FAIL is set, but the head is an exception since
> +			 * it may be set REQ_F_FAIL because of other req's failure
> +			 * so let's leverage req->result to distinguish if a head
> +			 * is set REQ_F_FAIL because of its failure or other req's
> +			 * failure so that we can set the correct ret code for it.
> +			 * init result here to avoid affecting the normal path.
> +			 */
> +			if (!(req->flags & REQ_F_FAIL))
> +				req->result = 0;

... instead of delaying to this point. Just IMHO, the code is easier to look
after when the result is set on the spot, i.e. it is easy to screw up or
forget something while changing bits around.
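
A minimal userspace sketch of what such a helper could look like, so the
result is set at the point of failure. The helper names
req_fail_link_head() and req_fail_self() are hypothetical, struct io_kiocb
is reduced to the two fields involved, and the REQ_F_FAIL bit value is a
placeholder; treat it as an illustration of the idea, not as the code to
merge.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define REQ_F_FAIL (1U << 0)	/* placeholder bit, not the kernel's value */

/* Userspace stand-in: only the fields this sketch touches. */
struct io_kiocb {
	unsigned int flags;
	int result;
};

static void req_set_fail(struct io_kiocb *req)
{
	req->flags |= REQ_F_FAIL;
}

/* Hypothetical helper: record a req's own failure together with its code. */
static void req_fail_self(struct io_kiocb *req, int ret)
{
	assert(ret < 0);
	req->result = ret;
	req_set_fail(req);
}

/*
 * Hypothetical helper: fail the link head because another req in its chain
 * failed. If the head already failed on its own, keep its original error
 * code; otherwise mark it as cancelled right here.
 */
static void req_fail_link_head(struct io_kiocb *head)
{
	if (!head)
		return;
	if (!(head->flags & REQ_F_FAIL)) {
		head->result = -ECANCELED;
		req_set_fail(head);
	}
}
```

With something like this, both failure sites in io_submit_sqe() would call
the same helper for the head, and the later "if not failed, result = 0"
initialisation would no longer be needed to tell the two cases apart.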

>  		} else {
>  			io_queue_sqe(req);
>  		}
> 

-- 
Pavel Begunkov