Re: [PATCH 4/4] io_uring/register: add IORING_REGISTER_RESIZE_RINGS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/24/24 1:53 PM, Jann Horn wrote:
> On Thu, Oct 24, 2024 at 9:50?PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 10/24/24 12:13 PM, Jann Horn wrote:
>>> On Thu, Oct 24, 2024 at 7:08?PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize
>>>> the existing rings. It takes a struct io_uring_params argument, the same
>>>> one which is used to setup the ring initially, and resizes rings
>>>> according to the sizes given.
>>> [...]
>>>> +        * We'll do the swap. Clear out existing mappings to prevent mmap
>>>> +        * from seeing them, as we'll unmap them. Any attempt to mmap existing
>>>> +        * rings beyond this point will fail. Not that it could proceed at this
>>>> +        * point anyway, as we'll hold the mmap_sem until we've done the swap.
>>>> +        * Likewise, hold the completion * lock over the duration of the actual
>>>> +        * swap.
>>>> +        */
>>>> +       mmap_write_lock(current->mm);
>>>
>>> Why does the mmap lock for current->mm suffice here? I see nothing in
>>> io_uring_mmap() that limits mmap() to tasks with the same mm_struct.
>>
>> Ehm does ->mmap() not hold ->mmap_sem already? I was under that
>> understanding. Obviously if it doesn't, then yeah this won't be enough.
>> Checked, and it does.
>>
>> Ah I see what you mean now, task with different mm. But how would that
>> come about? The io_uring fd is CLOEXEC, and it can't get passed.
> 
> Yeah, that's what I meant, tasks with different mm. I think there are
> a few ways to get the io_uring fd into a different task, the ones I
> can immediately think of:
> 
>  - O_CLOEXEC only applies on execve(), fork() should still inherit the fd
>  - O_CLOEXEC can be cleared via fcntl()
>  - you can use clone() to create two tasks that share FD tables
> without sharing an mm

OK good catch, yes then it won't be enough. Might just make sense to
exclude mmap separately, then. Thanks, I'll work on that for v4!
Probably trivial enough with just a separate mutex that's held over
mmap, and that register grabs too. Basically replacing mmap_write_lock()
on the resize side with ctx->mmap_lock, and grabbing that over
validate/mmap on the io_uring mmap side. Need to ponder the implications
of that in terms of how the locks nest, as you'd have mmap_sem ->
ctx->mmap_lock, but the other way around for resizing. But that should
just be for the pinning parts, which is why it does the mmap_sem
grabbing where it does right now.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux