On Wed, Jan 30, 2019 at 12:14 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 1/29/19 4:08 PM, Jann Horn wrote:
> > On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >> On 1/29/19 4:03 PM, Jann Horn wrote:
> >>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>> On 1/29/19 3:44 PM, Jann Horn wrote:
> >>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>>>> If we have fixed user buffers, we can map them into the kernel when we
> >>>>>> setup the io_context. That avoids the need to do get_user_pages() for
> >>>>>> each and every IO.
> >>>>>>
> >>>>>> To utilize this feature, the application must call io_uring_register()
> >>>>>> after having setup an io_uring context, passing in
> >>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
> >>>>>> to an iovec array, and the nr_args should contain how many iovecs the
> >>>>>> application wishes to map.
> >>>>>>
> >>>>>> If successful, these buffers are now mapped into the kernel, eligible
> >>>>>> for IO. To use these fixed buffers, the application must use the
> >>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
> >>>>>> set sqe->index to the desired buffer index. sqe->addr..sqe->addr+sqe->len
> >>>>>> must point to somewhere inside the indexed buffer.
> >>>>>>
> >>>>>> The application may register buffers throughout the lifetime of the
> >>>>>> io_uring context. It can call io_uring_register() with
> >>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
> >>>>>> buffers, and then register a new set. The application need not
> >>>>>> unregister buffers explicitly before shutting down the io_uring context.

[...]

> Just folded in this incremental, which should fix all the issues outlined
> in your email.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7364feebafed..d42541357969 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c

[...]
> @@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
>
>  static int io_sq_thread(void *data)
>  {
> +	struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
>  	struct sqe_submit sqes[IO_IOPOLL_BATCH];
>  	struct io_ring_ctx *ctx = data;
>  	struct mm_struct *cur_mm = NULL;
> @@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
>  		i = 0;
>  		all_fixed = true;
>  		do {
> +			/*
> +			 * Ensure sqe is stable between checking if we need
> +			 * user access, and actually importing the iovec
> +			 * further down the stack.
> +			 */
> +			memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
> +			sqes[i].sqe = &lsqe[i];
> +

What if io_submit_sqe() gets an -EAGAIN from __io_submit_sqe() and
queues io_sq_wq_submit_work()? Could that cause io_sq_wq_submit_work()
to read from io_sq_thread()'s lsqe[i] while io_sq_thread() is already
in the next loop iteration of the outer loop and has copied new data
into lsqe[i]?