On 1/29/19 4:42 PM, Jann Horn wrote:
> On Wed, Jan 30, 2019 at 12:14 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 1/29/19 4:08 PM, Jann Horn wrote:
>>> On Wed, Jan 30, 2019 at 12:06 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> On 1/29/19 4:03 PM, Jann Horn wrote:
>>>>> On Tue, Jan 29, 2019 at 11:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>> On 1/29/19 3:44 PM, Jann Horn wrote:
>>>>>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>>> If we have fixed user buffers, we can map them into the kernel when we
>>>>>>>> set up the io_context. That avoids the need to do get_user_pages() for
>>>>>>>> each and every IO.
>>>>>>>>
>>>>>>>> To utilize this feature, the application must call io_uring_register()
>>>>>>>> after having set up an io_uring context, passing in
>>>>>>>> IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer
>>>>>>>> to an iovec array, and nr_args should contain how many iovecs the
>>>>>>>> application wishes to map.
>>>>>>>>
>>>>>>>> If successful, these buffers are now mapped into the kernel, eligible
>>>>>>>> for IO. To use these fixed buffers, the application must use the
>>>>>>>> IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
>>>>>>>> set sqe->index to the desired buffer index. sqe->addr..sqe->addr+sqe->len
>>>>>>>> must point to somewhere inside the indexed buffer.
>>>>>>>>
>>>>>>>> The application may register buffers throughout the lifetime of the
>>>>>>>> io_uring context. It can call io_uring_register() with
>>>>>>>> IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
>>>>>>>> buffers, and then register a new set. The application need not
>>>>>>>> unregister buffers explicitly before shutting down the io_uring context.
> [...]
>> Just folded in this incremental, which should fix all the issues outlined
>> in your email.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7364feebafed..d42541357969 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
> [...]
>> @@ -1602,6 +1605,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes,
>>
>>  static int io_sq_thread(void *data)
>>  {
>> +	struct io_uring_sqe lsqe[IO_IOPOLL_BATCH];
>>  	struct sqe_submit sqes[IO_IOPOLL_BATCH];
>>  	struct io_ring_ctx *ctx = data;
>>  	struct mm_struct *cur_mm = NULL;
>> @@ -1701,6 +1705,14 @@ static int io_sq_thread(void *data)
>>  		i = 0;
>>  		all_fixed = true;
>>  		do {
>> +			/*
>> +			 * Ensure sqe is stable between checking if we need
>> +			 * user access, and actually importing the iovec
>> +			 * further down the stack.
>> +			 */
>> +			memcpy(&lsqe[i], sqes[i].sqe, sizeof(lsqe[i]));
>> +			sqes[i].sqe = &lsqe[i];
>> +
>
> What if io_submit_sqe() gets an -EAGAIN from __io_submit_sqe() and
> queues io_sq_wq_submit_work()? Could that cause io_sq_wq_submit_work()
> to read from io_sq_thread()'s lsqe[i] while io_sq_thread() is already
> in the next loop iteration of the outer loop and has copied new data
> into lsqe[i]?

Hmm yes. I think we'll need to both embed a pointer to an sqe into
sqe_submit, and an actual sqe. The former can be used in the fast path,
while the latter will be our copy for the offload path for io_sq_thread()
and io_sq_wq_submit_work().

--
Jens Axboe
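
[Editor's sketch of the direction described above; purely illustrative,
not the actual follow-up patch, and the field/helper names here are made
up. The idea is that sqe_submit keeps both a pointer into the SQ ring for
the fast path and an embedded copy that the offloaded paths read instead.]

struct sqe_submit {
	const struct io_uring_sqe	*sqe;	/* fast path: points at the SQ ring entry */
	struct io_uring_sqe		copy;	/* offload path: stable private copy */
	unsigned short			index;
};

/*
 * Hypothetical helper: before a request can outlive the submitter
 * (io_sq_thread() batching, or the io_sq_wq_submit_work() punt on
 * -EAGAIN), copy the ring entry and repoint ->sqe at the copy, so a
 * later reader can never observe the application rewriting that ring
 * slot.
 */
static void io_sqe_stabilize(struct sqe_submit *s)
{
	memcpy(&s->copy, s->sqe, sizeof(s->copy));
	s->sqe = &s->copy;
}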
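
[Editor's note: for reference, a minimal sketch of the userspace usage the
quoted commit message describes, assuming the io_uring_register() wrapper
and the io_uring_sqe layout from this patch series; ring setup and
submission are omitted.]

#include <string.h>
#include <sys/uio.h>

#define BUF_SIZE	(64 * 1024)

static char buf[BUF_SIZE];

/* Register one fixed buffer; it becomes buffer index 0. */
static int register_fixed_buffer(int ring_fd)
{
	struct iovec iov = {
		.iov_base = buf,
		.iov_len  = sizeof(buf),
	};

	return io_uring_register(ring_fd, IORING_REGISTER_BUFFERS, &iov, 1);
}

/* Fill an SQE that reads into the registered buffer. */
static void prep_fixed_read(struct io_uring_sqe *sqe, int fd)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_READ_FIXED;
	sqe->fd     = fd;
	sqe->addr   = (unsigned long) buf;	/* must lie inside the registered buffer */
	sqe->len    = sizeof(buf);
	sqe->index  = 0;			/* which registered buffer to use */
}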