Re: [PATCH] io_uring/net: cache provided buffer group value for multishot receives

Dylan Yudaken <dylany@xxxxxxxx> · Mon, 23 Jan 2023 14:04:18 +0000

On Sun, 2023-01-22 at 10:13 -0700, Jens Axboe wrote:
> If we're using ring provided buffers with multishot receive, and we
> end
> up doing an io-wq based issue at some points that also needs to
> select
> a buffer, we'll lose the initially assigned buffer group as
> io_ring_buffer_select() correctly clears the buffer group list as the
> issue isn't serialized by the ctx uring_lock. This is fine for normal
> receives as the request puts the buffer and finishes, but for
> multishot,
> we will re-arm and do further receives. On the next trigger for this
> multishot receive, the receive will try and pick from a buffer group
> whose value is the same as the buffer ID of the las receive. That is
> obviously incorrect, and will result in a premature -ENOUFS error for
> the receive even if we had available buffers in the correct group.
> 
> Cache the buffer group value at prep time, so we can restore it for
> future receives. This only needs doing for the above mentioned case,
> but
> just do it by default to keep it easier to read.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: b3fdea6ecb55 ("io_uring: multishot recv")
> Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
> Cc: Dylan Yudaken <dylany@xxxxxxxx>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 
> ---
> 
> diff --git a/io_uring/net.c b/io_uring/net.c
> index fbc34a7c2743..07a6aa39ab6f 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -62,6 +62,7 @@ struct io_sr_msg {
>         u16                             flags;
>         /* initialised and used only by !msg send variants */
>         u16                             addr_len;
> +       u16                             buf_group;
>         void __user                     *addr;
>         /* used only for send zerocopy */
>         struct io_kiocb                 *notif;
> @@ -580,6 +581,15 @@ int io_recvmsg_prep(struct io_kiocb *req, const
> struct io_uring_sqe *sqe)
>                 if (req->opcode == IORING_OP_RECV && sr->len)
>                         return -EINVAL;
>                 req->flags |= REQ_F_APOLL_MULTISHOT;
> +               /*
> +                * Store the buffer group for this multishot receive
> separately,
> +                * as if we end up doing an io-wq based issue that
> selects a
> +                * buffer, it has to be committed immediately and
> that will
> +                * clear ->buf_list. This means we lose the link to
> the buffer
> +                * list, and the eventual buffer put on completion
> then cannot
> +                * restore it.
> +                */
> +               sr->buf_group = req->buf_index;
>         }
>  
>  #ifdef CONFIG_COMPAT
> @@ -816,8 +826,10 @@ int io_recvmsg(struct io_kiocb *req, unsigned
> int issue_flags)
>         if (kmsg->msg.msg_inq)
>                 cflags |= IORING_CQE_F_SOCK_NONEMPTY;
>  
> -       if (!io_recv_finish(req, &ret, cflags, mshot_finished,
> issue_flags))
> +       if (!io_recv_finish(req, &ret, cflags, mshot_finished,
> issue_flags)) {
> +               req->buf_index = sr->buf_group;

I think this is better placed in io_recv_prep_retry()? It would remove
the duplicated logic below

>                 goto retry_multishot;
> +       }
>  
>         if (mshot_finished) {
>                 /* fast path, check for non-NULL to avoid function
> call */
> @@ -918,8 +930,10 @@ int io_recv(struct io_kiocb *req, unsigned int
> issue_flags)
>         if (msg.msg_inq)
>                 cflags |= IORING_CQE_F_SOCK_NONEMPTY;
>  
> -       if (!io_recv_finish(req, &ret, cflags, ret <= 0,
> issue_flags))
> +       if (!io_recv_finish(req, &ret, cflags, ret <= 0,
> issue_flags)) {
> +               req->buf_index = sr->buf_group;
>                 goto retry_multishot;
> +       }
>  
>         return ret;
>  }
>