Re: [PATCH 1/1] io_uring/sqpoll: do not allow pinning outside of cpuset


Hi Felix Moessbauer,

Greetings!

Using Syzkaller, I found a "KASAN: use-after-free Read in io_sq_offload_create" in the linux-next tree, next-20240916.

After bisection, the first bad commit is:
"
f011c9cf04c0 io_uring/sqpoll: do not allow pinning outside of cpuset
"

All detailed info can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/240917_135250_io_sq_offload_create
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/main/240917_135250_io_sq_offload_create/bzImage_7083504315d64199a329de322fce989e1e10f4f7
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/7083504315d64199a329de322fce989e1e10f4f7_dmesg.log

"
[   23.564898] ==================================================================
[   23.565444] BUG: KASAN: use-after-free in io_sq_offload_create+0xcaa/0x11d0
[   23.565971] Read of size 8 at addr ffff888036377898 by task repro/729
[   23.566459] 
[   23.566593] CPU: 0 UID: 0 PID: 729 Comm: repro Not tainted 6.11.0-next-20240916-7083504315d6 #1
[   23.567271] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   23.568066] Call Trace:
[   23.568252]  <TASK>
[   23.568417]  dump_stack_lvl+0xea/0x150
[   23.568718]  print_report+0xce/0x610
[   23.569001]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.569340]  ? kasan_addr_to_slab+0x11/0xb0
[   23.569651]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.569992]  kasan_report+0xcc/0x110
[   23.570277]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.570621]  kasan_check_range+0x3e/0x1c0
[   23.570917]  __kasan_check_read+0x15/0x20
[   23.571212]  io_sq_offload_create+0xcaa/0x11d0
[   23.571540]  ? __pfx_io_sq_offload_create+0x10/0x10
[   23.571893]  ? __pfx___lock_acquire+0x10/0x10
[   23.572228]  ? __this_cpu_preempt_check+0x21/0x30
[   23.572580]  ? lock_acquire.part.0+0x152/0x390
[   23.572910]  ? __this_cpu_preempt_check+0x21/0x30
[   23.573254]  ? lock_release+0x441/0x870
[   23.573541]  ? __pfx_lock_release+0x10/0x10
[   23.573846]  ? trace_lock_acquire+0x139/0x1b0
[   23.574180]  ? debug_smp_processor_id+0x20/0x30
[   23.574524]  ? rcu_is_watching+0x19/0xc0
[   23.574826]  ? __alloc_pages_noprof+0x517/0x710
[   23.575171]  ? __pfx___alloc_pages_noprof+0x10/0x10
[   23.575526]  ? mod_objcg_state+0x42c/0x9c0
[   23.575838]  ? lockdep_hardirqs_on+0x89/0x110
[   23.576159]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[   23.576534]  ? policy_nodemask+0xf9/0x450
[   23.576835]  ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[   23.577220]  ? alloc_pages_mpol_noprof+0x35d/0x580
[   23.577575]  ? __pfx_alloc_pages_mpol_noprof+0x10/0x10
[   23.577950]  ? __kmalloc_node_noprof+0x3a3/0x4e0
[   23.578302]  ? __kvmalloc_node_noprof+0x7f/0x240
[   23.578645]  ? alloc_pages_noprof+0xa9/0x180
[   23.578963]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[   23.579347]  ? io_pages_map+0x244/0x5c0
[   23.579631]  io_uring_setup+0x18df/0x3950
[   23.579936]  ? __pfx_io_uring_setup+0x10/0x10
[   23.580263]  ? __audit_syscall_entry+0x39c/0x500
[   23.580602]  __x64_sys_io_uring_setup+0xa4/0x160
[   23.580939]  x64_sys_call+0x17f5/0x20d0
[   23.581224]  do_syscall_64+0x6d/0x140
[   23.581498]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   23.581872] RIP: 0033:0x7efd9fa3ee5d
[   23.582140] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[   23.583404] RSP: 002b:00007ffdd4400858 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
[   23.583938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efd9fa3ee5d
[   23.584430] RDX: 00007efd9fb3f247 RSI: 0000000020000080 RDI: 0000000000005230
[   23.584927] RBP: 00007ffdd4400860 R08: 00007ffdd44002d0 R09: 00007ffdd4400890
[   23.585419] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffdd44009b8
[   23.585914] R13: 0000000000401730 R14: 0000000000403e08 R15: 00007efd9fcb5000
[   23.586424]  </TASK>
[   23.586589] 
[   23.586709] The buggy address belongs to the physical page:
[   23.587094] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36377
[   23.587644] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[   23.588103] raw: 000fffffc0000000 ffffea0000d8ddc8 ffffea0000d8ddc8 0000000000000000
[   23.588636] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   23.589167] page dumped because: kasan: bad access detected
[   23.589551] 
[   23.589670] Memory state around the buggy address:
[   23.590007]  ffff888036377780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.590514]  ffff888036377800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.591011] >ffff888036377880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.591508]                             ^
[   23.591794]  ffff888036377900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.592292]  ffff888036377980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.592789] ==================================================================
[   23.593344] Disabling lock debugging due to kernel taint
"

I hope you find it useful.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // You may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for a different qemu version
You can log in with the command below; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the VM (virtual machine) successfully, you can build the
reproducer binary and transfer it to the VM as shown below, then reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
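
The linked repro.c is the authoritative reproducer. For orientation only, here is
a minimal sketch of how userspace asks io_uring_setup() for an SQPOLL thread pinned
to a given CPU, which is the path that ends in io_sq_offload_create() in the trace
above. The sketch_* helper names are hypothetical, and this is not the syzkaller
program:

```c
#define _GNU_SOURCE
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: request an SQ poll thread pinned to `cpu`. */
void sketch_fill_sqpoll_params(struct io_uring_params *p, unsigned int cpu)
{
	memset(p, 0, sizeof(*p));
	/* SQ_AFF only has an effect together with SQPOLL. */
	p->flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
	p->sq_thread_cpu = cpu;
}

/* Hypothetical helper: raw syscall wrapper; returns a ring fd or -1 with errno set. */
long sketch_io_uring_setup(unsigned int entries, struct io_uring_params *p)
{
	return syscall(__NR_io_uring_setup, entries, p);
}
```

Calling sketch_io_uring_setup() with a CPU number that is not usable is the case
the quoted patch is meant to reject with -EINVAL.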

Get the bzImage for target kernel:
Use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           // x should be less than or equal to the number of CPUs your PC has

Set the bzImage file in start3.sh above to load the target kernel in the VM.

Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

On Mon, Sep 09, 2024 at 05:00:36PM +0200, Felix Moessbauer wrote:
> The submit queue polling threads are userland threads that just never
> exit to the userland. When creating the thread with IORING_SETUP_SQ_AFF,
> the affinity of the poller thread is set to the cpu specified in
> sq_thread_cpu. However, this CPU can be outside of the cpuset defined
> by the cgroup cpuset controller. This violates the rules defined by the
> cpuset controller and is a potential issue for realtime applications.
> 
> In b7ed6d8ffd6 we fixed the default affinity of the poller thread, in
> case no explicit pinning is required by inheriting the one of the
> creating task. In case of explicit pinning, the check is more
> complicated, as also a cpu outside of the parent cpumask is allowed.
> We implemented this by using cpuset_cpus_allowed (that has support for
> cgroup cpusets) and testing if the requested cpu is in the set.
> 
> Fixes: 37d1e2e3642e ("io_uring: move SQPOLL thread io-wq forked worker")
> Cc: stable@xxxxxxxxxxxxxxx # 6.1+
> Signed-off-by: Felix Moessbauer <felix.moessbauer@xxxxxxxxxxx>
> ---
> Hi,
> 
> that's hopefully the last fix of cpu pinnings of the sq poller threads.
> However, there is more to come on the io-wq side. E.g the syscalls for
> IORING_REGISTER_IOWQ_AFF that can be used to change the affinities are
> not yet protected. I'm currently just lacking good reproducers for that.
> I also have to admit that I don't feel too comfortable making changes to
> the wq part, given that I don't have good tests.
> 
> While fixing this, I'm wondering if it makes sense to add tests for the
> combination of pinning and cpuset. If yes, where should these tests be
> added?
> 
> Best regards,
> Felix Moessbauer
> Siemens AG
> 
>  io_uring/sqpoll.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index 713be7c29388..b8ec8fec99b8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/audit.h>
>  #include <linux/security.h>
> +#include <linux/cpuset.h>
>  #include <linux/io_uring.h>
>  
>  #include <uapi/linux/io_uring.h>
> @@ -459,10 +460,12 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
>  			return 0;
>  
>  		if (p->flags & IORING_SETUP_SQ_AFF) {
> +			struct cpumask allowed_mask;
>  			int cpu = p->sq_thread_cpu;
>  
>  			ret = -EINVAL;
> -			if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +			cpuset_cpus_allowed(current, &allowed_mask);
> +			if (!cpumask_test_cpu(cpu, &allowed_mask))
>  				goto err_sqpoll;
>  			sqd->sq_cpu = cpu;
>  		} else {
> -- 
> 2.39.2
> 
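
A note on the hunk above: the replaced condition drops the explicit
"cpu >= nr_cpu_ids" test, so cpumask_test_cpu() is now reached with whatever value
userspace put in sq_thread_cpu. Whether that is the root cause of the KASAN report
is an assumption here, not something confirmed in this thread. The userspace sketch
below only illustrates the general pattern of range-checking an index before
testing a bit in a fixed-size mask. All sketch_* names and the mask size of 64 are
hypothetical and are not kernel APIs:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids; 64 is illustrative. */
#define SKETCH_NR_CPU_IDS 64u
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS \
	((SKETCH_NR_CPU_IDS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Fixed-size bitmap, loosely modelled on an on-stack CPU mask. */
struct sketch_cpumask {
	unsigned long bits[SKETCH_MASK_LONGS];
};

/* Mark a CPU as allowed; out-of-range values are ignored. */
void sketch_set_cpu(struct sketch_cpumask *mask, unsigned int cpu)
{
	if (cpu < SKETCH_NR_CPU_IDS)
		mask->bits[cpu / SKETCH_BITS_PER_LONG] |=
			1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Range check first, so the bit test never indexes past the mask. */
bool sketch_cpu_allowed(const struct sketch_cpumask *mask, unsigned int cpu)
{
	if (cpu >= SKETCH_NR_CPU_IDS)
		return false;
	return (mask->bits[cpu / SKETCH_BITS_PER_LONG] >>
		(cpu % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

With the range check in place, a very large CPU number is rejected before any
memory outside the mask is read.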



