On 1/29/19 4:42 PM, Jann Horn wrote:
> On Wed, Jan 30, 2019 at 12:14 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 1/29/19 4:08 PM, Jann Horn wrote:
>>> On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> On 1/29/19 4:03 PM, Jann Horn wrote:
>>>>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>> On 1/29/19 3:44 PM, Jann Horn wrote:
>>>>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>>> If we have fixed user buffers, we can map them into the kernel when we
>>>>>>>> set up the io_context. That avoids the need to do get_user_pages() for
>>>>>>>> each and every IO.
>>>>>>>>
>>>>>>>> To utilize this feature, the application must call io_uring_register()
>>>>>>>> after having set up an io_uring context, passing in
>>>>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
>>>>>>>> to an iovec array, and nr_args should contain how many iovecs the
>>>>>>>> application wishes to map.
>>>>>>>>
>>>>>>>> If successful, these buffers are now mapped into the kernel, eligible
>>>>>>>> for IO. To use these fixed buffers, the application must use the
>>>>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
>>>>>>>> set sqe->index to the desired buffer index. sqe->addr..sqe->addr+sqe->len
>>>>>>>> must point to somewhere inside the indexed buffer.
>>>>>>>>
>>>>>>>> The application may register buffers throughout the lifetime of the
>>>>>>>> io_uring context. It can call io_uring_register() with
>>>>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
>>>>>>>> buffers, and then register a new set. The application need not
>>>>>>>> unregister buffers explicitly before shutting down the io_uring context.
> [...]
>> Just folded in this incremental, which should fix all the issues outlined
>> in your email.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7364feebafed..d42541357969 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
> [...]
>> @@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
>>
>>  static int io_sq_thread(void *data)
>>  {
>> +	struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
>>  	struct sqe_submit sqes[IO_IOPOLL_BATCH];
>>  	struct io_ring_ctx *ctx = data;
>>  	struct mm_struct *cur_mm = NULL;
>> @@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
>>  		i = 0;
>>  		all_fixed = true;
>>  		do {
>> +			/*
>> +			 * Ensure sqe is stable between checking if we need
>> +			 * user access, and actually importing the iovec
>> +			 * further down the stack.
>> +			 */
>> +			memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
>> +			sqes[i].sqe = &lsqe[i];
>> +
>
> What if io_submit_sqe() gets an -EAGAIN from __io_submit_sqe() and
> queues io_sq_wq_submit_work()? Could that cause io_sq_wq_submit_work()
> to read from io_sq_thread()'s lsqe[i] while io_sq_thread() is already
> in the next loop iteration of the outer loop and has copied new data
> into lsqe[i]?

Hmm yes. I think we'll need to both embed a pointer to an sqe into
sqe_submit, and an actual sqe. The former can be used in the fast path,
while the latter will be our copy for the offload path for io_sq_thread()
and io_sq_wq_submit_work().

--
Jens Axboe
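
[Editor's sketch of the direction described above; purely illustrative,
not the actual follow-up patch, and the field/helper names here are made
up. The idea is that sqe_submit keeps both a pointer into the SQ ring for
the fast path and an embedded copy that the offloaded paths read instead.]

struct sqe_submit {
	const struct io_uring_sqe	*sqe;	/* fast path: points at the SQ ring entry */
	struct io_uring_sqe		copy;	/* offload path: stable private copy */
	unsigned short			index;
};

/*
 * Hypothetical helper: before a request can outlive the submitter
 * (io_sq_thread() batching, or the io_sq_wq_submit_work() punt on
 * -EAGAIN), copy the ring entry and repoint ->sqe at the copy, so a
 * later reader can never observe the application rewriting that ring
 * slot.
 */
static void io_sqe_stabilize(struct sqe_submit *s)
{
	memcpy(&s->copy, s->sqe, sizeof(s->copy));
	s->sqe = &s->copy;
}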
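
[Editor's note: for reference, a minimal sketch of the userspace usage the
quoted commit message describes, assuming the io_uring_register() wrapper
and the io_uring_sqe layout from this patch series; ring setup and
submission are omitted.]

#include <string.h>
#include <sys/uio.h>

#define BUF_SIZE	(64 * 1024)

static char buf[BUF_SIZE];

/* Register one fixed buffer; it becomes buffer index 0. */
static int register_fixed_buffer(int ring_fd)
{
	struct iovec iov = {
		.iov_base = buf,
		.iov_len  = sizeof(buf),
	};

	return io_uring_register(ring_fd, IORING_REGISTER_BUFFERS, &iov, 1);
}

/* Fill an SQE that reads into the registered buffer. */
static void prep_fixed_read(struct io_uring_sqe *sqe, int fd)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_READ_FIXED;
	sqe->fd     = fd;
	sqe->addr   = (unsigned long) buf;	/* must lie inside the registered buffer */
	sqe->len    = sizeof(buf);
	sqe->index  = 0;			/* which registered buffer to use */
}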