Re: [PATCH 4/4] io_uring/register: add IORING_REGISTER_RESIZE_RINGS

On 10/24/24 3:08 PM, Jens Axboe wrote:
> On 10/24/24 2:32 PM, Jann Horn wrote:
>> On Thu, Oct 24, 2024 at 10:25 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> On 10/24/24 2:08 PM, Jann Horn wrote:
>>>> On Thu, Oct 24, 2024 at 9:59 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>> On 10/24/24 1:53 PM, Jann Horn wrote:
>>>>>> On Thu, Oct 24, 2024 at 9:50 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>> On 10/24/24 12:13 PM, Jann Horn wrote:
>>>>>>>> On Thu, Oct 24, 2024 at 7:08 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>>>> Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize
>>>>>>>>> the existing rings. It takes a struct io_uring_params argument, the same
>>>>>>>>> one which is used to setup the ring initially, and resizes rings
>>>>>>>>> according to the sizes given.
>>>>>>>> [...]
>>>>>>>>> +        * We'll do the swap. Clear out existing mappings to prevent mmap
>>>>>>>>> +        * from seeing them, as we'll unmap them. Any attempt to mmap existing
>>>>>>>>> +        * rings beyond this point will fail. Not that it could proceed at this
>>>>>>>>> +        * point anyway, as we'll hold the mmap_sem until we've done the swap.
>>>>>>>>> +        * Likewise, hold the completion lock over the duration of the actual
>>>>>>>>> +        * swap.
>>>>>>>>> +        */
>>>>>>>>> +       mmap_write_lock(current->mm);
>>>>>>>>
>>>>>>>> Why does the mmap lock for current->mm suffice here? I see nothing in
>>>>>>>> io_uring_mmap() that limits mmap() to tasks with the same mm_struct.
>>>>>>>
>>>>>>> Ehm does ->mmap() not hold ->mmap_sem already? I was under that
>>>>>>> understanding. Obviously if it doesn't, then yeah this won't be enough.
>>>>>>> Checked, and it does.
>>>>>>>
>>>>>>> Ah I see what you mean now, task with different mm. But how would that
>>>>>>> come about? The io_uring fd is CLOEXEC, and it can't get passed.
>>>>>>
>>>>>> Yeah, that's what I meant, tasks with different mm. I think there are
>>>>>> a few ways to get the io_uring fd into a different task, the ones I
>>>>>> can immediately think of:
>>>>>>
>>>>>>  - O_CLOEXEC only applies on execve(), fork() should still inherit the fd
>>>>>>  - O_CLOEXEC can be cleared via fcntl()
>>>>>>  - you can use clone() to create two tasks that share FD tables
>>>>>> without sharing an mm
>>>>>
>>>>> OK good catch, yes then it won't be enough. Might just make sense to
>>>>> exclude mmap separately, then. Thanks, I'll work on that for v4!
>>>>
>>>> Yeah, that sounds reasonable to me.
>>>
>>> Something like this should do it, it's really just replacing mmap_sem
>>> with a ring private lock. And since the ordering already had to deal
>>> with uring_lock vs mmap_sem ABBA issues, this should slot straight in as
>>> well.
>>
>> Looks good to me at a glance.
> 
> Great, thanks for checking Jann. In the first place as well, appreciate
> it.
> 
> FWIW, compiled and ran through the testing, looks fine so far here.
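
For reference, the first two fd-sharing routes Jann lists in the quoted
discussion can be checked from userspace. This is only a minimal sketch:
it uses a plain /dev/null fd rather than an io_uring fd, the helper names
are made up for illustration, and the third route (clone() with a shared
fd table but a separate mm) is not shown.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a fork()ed child still holds a valid
 * copy of an O_CLOEXEC fd, 0 if it does not, -1 on error.
 */
int cloexec_fd_survives_fork(void)
{
	int fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
	int status = 0;
	pid_t pid, ret;

	if (fd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Child: separate mm, but the fd table was copied by fork(). */
		_exit(fcntl(fd, F_GETFD) == -1 ? 1 : 0);
	}

	ret = waitpid(pid, &status, 0);
	close(fd);
	if (ret < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0;
}

/*
 * Hypothetical helper: returns 1 if FD_CLOEXEC can be cleared again with
 * fcntl(), 0 if the flag stays set, -1 on error.
 */
int cloexec_can_be_cleared(void)
{
	int fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
	int flags;

	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFD);
	if (flags < 0 || !(flags & FD_CLOEXEC) ||
	    fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0) {
		close(fd);
		return -1;
	}

	flags = fcntl(fd, F_GETFD);
	close(fd);
	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) == 0;
}
```

Both helpers are meant to be called from a small main(); on Linux each
should return 1, which is the point made above: O_CLOEXEC only matters
at execve() time and does not keep the fd within a single mm.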

And also fwiw, I did write a test case for this, and it goes boom pretty
quickly without the patch, no issues with the patch. Sample output:

==================================================================
BUG: KASAN: slab-use-after-free in vm_insert_pages+0x634/0x73c
Read of size 8 at addr ffff0000d8a264e0 by task resize-rings.t/741

CPU: 5 UID: 1000 PID: 741 Comm: resize-rings.t Not tainted 6.12.0-rc4-00082-g0935537ea92a #7661
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0xd0/0xe0
 show_stack+0x14/0x1c
 dump_stack_lvl+0x68/0x8c
 print_report+0x16c/0x4c8
 kasan_report+0xa0/0xe0
 __asan_report_load8_noabort+0x1c/0x24
 vm_insert_pages+0x634/0x73c
 io_uring_mmap_pages+0x1d4/0x2d8
 io_uring_mmap+0x19c/0x1c0
 mmap_region+0x844/0x19e0
 do_mmap+0x5f4/0xb00
 vm_mmap_pgoff+0x164/0x2a0
 ksys_mmap_pgoff+0x2a8/0x3c0
 __arm64_sys_mmap+0xc8/0x140
 invoke_syscall+0x6c/0x260
 el0_svc_common.constprop.0+0x158/0x224
 do_el0_svc+0x3c/0x5c
 el0_svc+0x44/0xb4
 el0t_64_sync_handler+0x118/0x124
 el0t_64_sync+0x168/0x16c

Allocated by task 733:
 kasan_save_stack+0x28/0x4c
 kasan_save_track+0x1c/0x40
 kasan_save_alloc_info+0x3c/0x4c
 __kasan_kmalloc+0xac/0xb0
 __kmalloc_node_noprof+0x1b4/0x3f0
 __kvmalloc_node_noprof+0x68/0x134
 io_pages_map+0x50/0x448
 io_register_resize_rings+0x484/0x1498
 __arm64_sys_io_uring_register+0x780/0x1f3c
 invoke_syscall+0x6c/0x260
 el0_svc_common.constprop.0+0x158/0x224
 do_el0_svc+0x3c/0x5c
 el0_svc+0x44/0xb4
 el0t_64_sync_handler+0x118/0x124
 el0t_64_sync+0x168/0x16c

Freed by task 733:
 kasan_save_stack+0x28/0x4c
 kasan_save_track+0x1c/0x40
 kasan_save_free_info+0x48/0x94
 __kasan_slab_free+0x48/0x60
 kfree+0x120/0x494
 kvfree+0x34/0x40
 io_pages_unmap+0x1a4/0x308
 io_register_free_rings.isra.0+0x6c/0x168
 io_register_resize_rings+0xce4/0x1498
 __arm64_sys_io_uring_register+0x780/0x1f3c
 invoke_syscall+0x6c/0x260
 el0_svc_common.constprop.0+0x158/0x224
 do_el0_svc+0x3c/0x5c
 el0_svc+0x44/0xb4
 el0t_64_sync_handler+0x118/0x124
 el0t_64_sync+0x168/0x16c

The buggy address belongs to the object at ffff0000d8a264e0
 which belongs to the cache kmalloc-cg-8 of size 8
The buggy address is located 0 bytes inside of
 freed 8-byte region [ffff0000d8a264e0, ffff0000d8a264e8)
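
In the report above, task 741 is in the mmap path (io_uring_mmap_pages)
reading an 8-byte allocation that task 733 allocated in io_pages_map and
freed in io_pages_unmap during the resize. As a rough userspace analogy
of the fix, and not the kernel code itself, the idea of a ring private
lock can be sketched with a pthread mutex. The struct and function names
below are hypothetical stand-ins, and nr_pages is assumed to be nonzero.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ring context; not the kernel structure. */
struct ring {
	pthread_mutex_t map_lock;	/* per-ring lock, taken by every mapper */
	void **pages;			/* page pointer array, replaced on resize */
	size_t nr_pages;
};

int ring_init(struct ring *r, size_t nr_pages)
{
	r->pages = calloc(nr_pages, sizeof(*r->pages));
	if (!r->pages)
		return -1;
	r->nr_pages = nr_pages;
	if (pthread_mutex_init(&r->map_lock, NULL)) {
		free(r->pages);
		return -1;
	}
	return 0;
}

/* Stand-in for the mmap path: walks the page array under the ring lock. */
size_t ring_map(struct ring *r)
{
	size_t i, empty = 0;

	pthread_mutex_lock(&r->map_lock);
	for (i = 0; i < r->nr_pages; i++)
		empty += (r->pages[i] == NULL);	/* read every slot */
	pthread_mutex_unlock(&r->map_lock);
	return empty;
}

/* Stand-in for resize: swaps the array under the same lock, then frees. */
int ring_resize(struct ring *r, size_t nr_pages)
{
	void **fresh = calloc(nr_pages, sizeof(*fresh));
	void **old;

	if (!fresh)
		return -1;

	pthread_mutex_lock(&r->map_lock);
	old = r->pages;
	r->pages = fresh;
	r->nr_pages = nr_pages;
	pthread_mutex_unlock(&r->map_lock);

	/* No mapper can still be walking 'old': it is only read under the lock. */
	free(old);
	return 0;
}

void ring_destroy(struct ring *r)
{
	free(r->pages);
	pthread_mutex_destroy(&r->map_lock);
}
```

Because the lock lives in the ring object rather than in any one mm, it
excludes a mapper no matter which task or address space the mmap() call
comes from, which is what the mmap lock of current->mm could not do.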

-- 
Jens Axboe



