On Wed, Jan 30, 2019 at 12:14 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 1/29/19 4:08 PM, Jann Horn wrote:
> > On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >> On 1/29/19 4:03 PM, Jann Horn wrote:
> >>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>> On 1/29/19 3:44 PM, Jann Horn wrote:
> >>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>>>> If we have fixed user buffers, we can map them into the kernel when we
> >>>>>> setup the io_context. That avoids the need to do get_user_pages() for
> >>>>>> each and every IO.
> >>>>>>
> >>>>>> To utilize this feature, the application must call io_uring_register()
> >>>>>> after having setup an io_uring context, passing in
> >>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
> >>>>>> to an iovec array, and the nr_args should contain how many iovecs the
> >>>>>> application wishes to map.
> >>>>>>
> >>>>>> If successful, these buffers are now mapped into the kernel, eligible
> >>>>>> for IO. To use these fixed buffers, the application must use the
> >>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
> >>>>>> set sqe->index to the desired buffer index. sqe->addr..sqe->addr+sqe->len
> >>>>>> must point to somewhere inside the indexed buffer.
> >>>>>>
> >>>>>> The application may register buffers throughout the lifetime of the
> >>>>>> io_uring context. It can call io_uring_register() with
> >>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
> >>>>>> buffers, and then register a new set. The application need not
> >>>>>> unregister buffers explicitly before shutting down the io_uring context.

[...]

> Just folded in this incremental, which should fix all the issues outlined
> in your email.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7364feebafed..d42541357969 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c

[...]
> @@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
>
>  static int io_sq_thread(void *data)
>  {
> +	struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
>  	struct sqe_submit sqes[IO_IOPOLL_BATCH];
>  	struct io_ring_ctx *ctx = data;
>  	struct mm_struct *cur_mm = NULL;
> @@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
>  		i = 0;
>  		all_fixed = true;
>  		do {
> +			/*
> +			 * Ensure sqe is stable between checking if we need
> +			 * user access, and actually importing the iovec
> +			 * further down the stack.
> +			 */
> +			memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
> +			sqes[i].sqe = &lsqe[i];
> +

What if io_submit_sqe() gets an -EAGAIN from __io_submit_sqe() and
queues io_sq_wq_submit_work()? Could that cause io_sq_wq_submit_work()
to read from io_sq_thread()'s lsqe[i] while io_sq_thread() is already
in the next loop iteration of the outer loop and has copied new data
into lsqe[i]?