Hi Felix Moessbauer, Greetings! I used Syzkaller and found that there is KASAN: use-after-free Read in io_sq_offload_create in Linux-next tree - next-20240916. After bisection and the first bad commit is: " f011c9cf04c0 io_uring/sqpoll: do not allow pinning outside of cpuset " All detailed into can be found at: https://github.com/laifryiee/syzkaller_logs/tree/main/240917_135250_io_sq_offload_create Syzkaller repro code: https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.c Syzkaller repro syscall steps: https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.prog Syzkaller report: https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.report Kconfig(make olddefconfig): https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/kconfig_origin Bisect info: https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/bisect_info.log bzImage: https://github.com/laifryiee/syzkaller_logs/raw/main/240917_135250_io_sq_offload_create/bzImage_7083504315d64199a329de322fce989e1e10f4f7 Issue dmesg: https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/7083504315d64199a329de322fce989e1e10f4f7_dmesg.log " [ 23.564898] ================================================================== [ 23.565444] BUG: KASAN: use-after-free in io_sq_offload_create+0xcaa/0x11d0 [ 23.565971] Read of size 8 at addr ffff888036377898 by task repro/729 [ 23.566459] [ 23.566593] CPU: 0 UID: 0 PID: 729 Comm: repro Not tainted 6.11.0-next-20240916-7083504315d6 #1 [ 23.567271] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 23.568066] Call Trace: [ 23.568252] <TASK> [ 23.568417] dump_stack_lvl+0xea/0x150 [ 23.568718] print_report+0xce/0x610 [ 23.569001] ? io_sq_offload_create+0xcaa/0x11d0 [ 23.569340] ? kasan_addr_to_slab+0x11/0xb0 [ 23.569651] ? io_sq_offload_create+0xcaa/0x11d0 [ 23.569992] kasan_report+0xcc/0x110 [ 23.570277] ? io_sq_offload_create+0xcaa/0x11d0 [ 23.570621] kasan_check_range+0x3e/0x1c0 [ 23.570917] __kasan_check_read+0x15/0x20 [ 23.571212] io_sq_offload_create+0xcaa/0x11d0 [ 23.571540] ? __pfx_io_sq_offload_create+0x10/0x10 [ 23.571893] ? __pfx___lock_acquire+0x10/0x10 [ 23.572228] ? __this_cpu_preempt_check+0x21/0x30 [ 23.572580] ? lock_acquire.part.0+0x152/0x390 [ 23.572910] ? __this_cpu_preempt_check+0x21/0x30 [ 23.573254] ? lock_release+0x441/0x870 [ 23.573541] ? __pfx_lock_release+0x10/0x10 [ 23.573846] ? trace_lock_acquire+0x139/0x1b0 [ 23.574180] ? debug_smp_processor_id+0x20/0x30 [ 23.574524] ? rcu_is_watching+0x19/0xc0 [ 23.574826] ? __alloc_pages_noprof+0x517/0x710 [ 23.575171] ? __pfx___alloc_pages_noprof+0x10/0x10 [ 23.575526] ? mod_objcg_state+0x42c/0x9c0 [ 23.575838] ? lockdep_hardirqs_on+0x89/0x110 [ 23.576159] ? __sanitizer_cov_trace_switch+0x58/0xa0 [ 23.576534] ? policy_nodemask+0xf9/0x450 [ 23.576835] ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30 [ 23.577220] ? alloc_pages_mpol_noprof+0x35d/0x580 [ 23.577575] ? __pfx_alloc_pages_mpol_noprof+0x10/0x10 [ 23.577950] ? __kmalloc_node_noprof+0x3a3/0x4e0 [ 23.578302] ? __kvmalloc_node_noprof+0x7f/0x240 [ 23.578645] ? alloc_pages_noprof+0xa9/0x180 [ 23.578963] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30 [ 23.579347] ? io_pages_map+0x244/0x5c0 [ 23.579631] io_uring_setup+0x18df/0x3950 [ 23.579936] ? __pfx_io_uring_setup+0x10/0x10 [ 23.580263] ? __audit_syscall_entry+0x39c/0x500 [ 23.580602] __x64_sys_io_uring_setup+0xa4/0x160 [ 23.580939] x64_sys_call+0x17f5/0x20d0 [ 23.581224] do_syscall_64+0x6d/0x140 [ 23.581498] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 23.581872] RIP: 0033:0x7efd9fa3ee5d [ 23.582140] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48 [ 23.583404] RSP: 002b:00007ffdd4400858 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9 [ 23.583938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efd9fa3ee5d [ 23.584430] RDX: 00007efd9fb3f247 RSI: 0000000020000080 RDI: 0000000000005230 [ 23.584927] RBP: 00007ffdd4400860 R08: 00007ffdd44002d0 R09: 00007ffdd4400890 [ 23.585419] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffdd44009b8 [ 23.585914] R13: 0000000000401730 R14: 0000000000403e08 R15: 00007efd9fcb5000 [ 23.586424] </TASK> [ 23.586589] [ 23.586709] The buggy address belongs to the physical page: [ 23.587094] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36377 [ 23.587644] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 23.588103] raw: 000fffffc0000000 ffffea0000d8ddc8 ffffea0000d8ddc8 0000000000000000 [ 23.588636] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 23.589167] page dumped because: kasan: bad access detected [ 23.589551] [ 23.589670] Memory state around the buggy address: [ 23.590007] ffff888036377780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 23.590514] ffff888036377800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 23.591011] >ffff888036377880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 23.591508] ^ [ 23.591794] ffff888036377900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 23.592292] ffff888036377980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 23.592789] ================================================================== [ 23.593344] Disabling lock debugging due to kernel taint " I hope you find it useful. Regards, Yi Lai --- If you don't need the following environment to reproduce the problem or if you already have one reproduced environment, please ignore the following information. How to reproduce: git clone https://gitlab.com/xupengfe/repro_vm_env.git cd repro_vm_env tar -xvf repro_vm_env.tar.gz cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0 // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel // You could change the bzImage_xxx as you want // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version You could use below command to log in, there is no password for root. ssh -p 10023 root@localhost After login vm(virtual machine) successfully, you could transfer reproduced binary to the vm by below way, and reproduce the problem in vm: gcc -pthread -o repro repro.c scp -P 10023 repro root@localhost:/root/ Get the bzImage for target kernel: Please use target kconfig and copy it to kernel_src/.config make olddefconfig make -jx bzImage //x should equal or less than cpu num your pc has Fill the bzImage file into above start3.sh to load the target kernel in vm. Tips: If you already have qemu-system-x86_64, please ignore below info. If you want to install qemu v7.1.0 version: git clone https://github.com/qemu/qemu.git cd qemu git checkout -f v7.1.0 mkdir build cd build yum install -y ninja-build.x86_64 yum -y install libslirp-devel.x86_64 ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp make make install On Mon, Sep 09, 2024 at 05:00:36PM +0200, Felix Moessbauer wrote: > The submit queue polling threads are userland threads that just never > exit to the userland. When creating the thread with IORING_SETUP_SQ_AFF, > the affinity of the poller thread is set to the cpu specified in > sq_thread_cpu. However, this CPU can be outside of the cpuset defined > by the cgroup cpuset controller. This violates the rules defined by the > cpuset controller and is a potential issue for realtime applications. > > In b7ed6d8ffd6 we fixed the default affinity of the poller thread, in > case no explicit pinning is required by inheriting the one of the > creating task. In case of explicit pinning, the check is more > complicated, as also a cpu outside of the parent cpumask is allowed. > We implemented this by using cpuset_cpus_allowed (that has support for > cgroup cpusets) and testing if the requested cpu is in the set. > > Fixes: 37d1e2e3642e ("io_uring: move SQPOLL thread io-wq forked worker") > Cc: stable@xxxxxxxxxxxxxxx # 6.1+ > Signed-off-by: Felix Moessbauer <felix.moessbauer@xxxxxxxxxxx> > --- > Hi, > > that's hopefully the last fix of cpu pinnings of the sq poller threads. > However, there is more to come on the io-wq side. E.g the syscalls for > IORING_REGISTER_IOWQ_AFF that can be used to change the affinites are > not yet protected. I'm currently just lacking good reproducers for that. > I also have to admit that I don't feel too comfortable making changes to > the wq part, given that I don't have good tests. > > While fixing this, I'm wondering if it makes sense to add tests for the > combination of pinning and cpuset. If yes, where should these tests be > added? > > Best regards, > Felix Moessbauer > Siemens AG > > io_uring/sqpoll.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c > index 713be7c29388..b8ec8fec99b8 100644 > --- a/io_uring/sqpoll.c > +++ b/io_uring/sqpoll.c > @@ -10,6 +10,7 @@ > #include <linux/slab.h> > #include <linux/audit.h> > #include <linux/security.h> > +#include <linux/cpuset.h> > #include <linux/io_uring.h> > > #include <uapi/linux/io_uring.h> > @@ -459,10 +460,12 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx, > return 0; > > if (p->flags & IORING_SETUP_SQ_AFF) { > + struct cpumask allowed_mask; > int cpu = p->sq_thread_cpu; > > ret = -EINVAL; > - if (cpu >= nr_cpu_ids || !cpu_online(cpu)) > + cpuset_cpus_allowed(current, &allowed_mask); > + if (!cpumask_test_cpu(cpu, &allowed_mask)) > goto err_sqpoll; > sqd->sq_cpu = cpu; > } else { > -- > 2.39.2 >