Hi Peter and perf experts,

Syzkaller testing hit a "BUG: using smp_processor_id() in preemptible code" in event_function_call() on a v6.11-rc3 Intel internal kernel that carries linux-next patches, and the same issue also reproduces on next-20240812. Bisection in the Intel internal repo points to this commit:

558abc7e3f89 perf: Fix event_function_call() locking

After reverting the above commit on top of the Intel internal kernel, the issue is gone.

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240812_194918_event_function_call
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.c
Syzkaller repro syscall steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.prog
Syzkaller analysis report: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.report
Kconfig (make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/bisect_info.log
Issue bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240812_194918_event_function_call/bzImage_dcdef334e8852ca433e908db67744965101f3fc1.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/dcdef334e8852ca433e908db67744965101f3fc1_dmesg.log
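From the quoted commit below, it looks like the warning comes from the new this_cpu_ptr(&perf_cpu_context) lookup at the top of event_function_call(): on the perf_release() -> perf_remove_from_context() path in the trace, that lookup runs while preemption is still enabled, which is exactly what CONFIG_DEBUG_PREEMPT complains about. The snippet below is only a minimal sketch of that generic pattern, with a made-up per-CPU variable (struct foo / demo_pcpu); it is not the perf code and not a proposed fix:

#include <linux/percpu.h>
#include <linux/preempt.h>

struct foo {
	int counter;
};
static DEFINE_PER_CPU(struct foo, demo_pcpu);

static void demo(void)
{
	struct foo *p;

	/*
	 * Resolving a per-CPU pointer while preemption is enabled triggers
	 * "BUG: using smp_processor_id() in preemptible code" under
	 * CONFIG_DEBUG_PREEMPT, because the task may migrate and the pointer
	 * may then refer to the wrong CPU.
	 */
	p = this_cpu_ptr(&demo_pcpu);

	/* The check is satisfied once preemption is disabled around the access. */
	preempt_disable();
	p = this_cpu_ptr(&demo_pcpu);
	p->counter++;
	preempt_enable();
}

I have not checked what the right fix for event_function_call() itself would be; the sketch just shows the class of access that the DEBUG_PREEMPT check flags.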
"
[ 17.071069] BUG: using smp_processor_id() in preemptible [00000000] code: repro/726
[ 17.071875] caller is debug_smp_processor_id+0x20/0x30
[ 17.072369] CPU: 0 UID: 0 PID: 726 Comm: repro Not tainted 6.11.0-rc3-next-20240812-dcdef334e885+ #1
[ 17.073225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 17.074205] Call Trace:
[ 17.074436] <TASK>
[ 17.074646] dump_stack_lvl+0x121/0x150
[ 17.075005] dump_stack+0x19/0x20
[ 17.075313] check_preemption_disabled+0x168/0x180
[ 17.075753] debug_smp_processor_id+0x20/0x30
[ 17.076156] event_function_call+0xd7/0x5c0
[ 17.076520] ? __pfx___perf_remove_from_context+0x10/0x10
[ 17.076986] ? __this_cpu_preempt_check+0x21/0x30
[ 17.077404] ? __pfx_event_function_call+0x10/0x10
[ 17.077850] ? __this_cpu_preempt_check+0x21/0x30
[ 17.078276] ? _raw_spin_unlock_irq+0x2c/0x60
[ 17.078681] ? lockdep_hardirqs_on+0x89/0x110
[ 17.079084] perf_remove_from_context+0xf7/0x1d0
[ 17.079506] perf_event_release_kernel+0x186/0x870
[ 17.079944] ? lock_is_held_type+0xef/0x150
[ 17.080326] ? locks_remove_file+0x3bb/0x5e0
[ 17.080727] ? __pfx_perf_event_release_kernel+0x10/0x10
[ 17.081202] ? __this_cpu_preempt_check+0x21/0x30
[ 17.081634] ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[ 17.082116] ? evm_file_release+0x193/0x1f0
[ 17.082502] ? __pfx_perf_release+0x10/0x10
[ 17.082880] perf_release+0x40/0x60
[ 17.083203] __fput+0x426/0xbc0
[ 17.083509] ____fput+0x1f/0x30
[ 17.083806] task_work_run+0x19c/0x2b0
[ 17.084159] ? __pfx_task_work_run+0x10/0x10
[ 17.084553] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 17.085036] ? switch_task_namespaces+0xc6/0x110
[ 17.085463] do_exit+0xb19/0x2aa0
[ 17.085778] ? lock_release+0x441/0x870
[ 17.086138] ? __pfx_do_exit+0x10/0x10
[ 17.086485] ? __this_cpu_preempt_check+0x21/0x30
[ 17.087007] ? _raw_spin_unlock_irq+0x2c/0x60
[ 17.087219] ? lockdep_hardirqs_on+0x89/0x110
[ 17.087426] do_group_exit+0xe4/0x2c0
[ 17.087607] __x64_sys_exit_group+0x4d/0x60
[ 17.087799] x64_sys_call+0x20c4/0x20d0
[ 17.087974] do_syscall_64+0x6d/0x140
[ 17.088144] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 17.088374] RIP: 0033:0x7f6c5a518a4d
[ 17.088541] Code: Unable to access opcode bytes at 0x7f6c5a518a23.
[ 17.088832] RSP: 002b:00007fffca44fe68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 17.089166] RAX: ffffffffffffffda RBX: 00007f6c5a5f69e0 RCX: 00007f6c5a518a4d
[ 17.089476] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
[ 17.089790] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[ 17.090100] R10: 00007fffca44fd10 R11: 0000000000000246 R12: 00007f6c5a5f69e0
[ 17.090411] R13: 00007f6c5a5fbf00 R14: 0000000000000001 R15: 00007f6c5a5fbee8
[ 17.090740] </TASK>
[ 17.162307] BUG: using smp_processor_id() in preemptible [00000000] code: repro/727
[ 17.163284] caller is debug_smp_processor_id+0x20/0x30
[ 17.163765] CPU: 0 UID: 0 PID: 727 Comm: repro Not tainted 6.11.0-rc3-next-20240812-dcdef334e885+ #1
[ 17.164571] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 17.165606] Call Trace:
[ 17.165840] <TASK>
[ 17.166045] dump_stack_lvl+0x121/0x150
[ 17.166407] dump_stack+0x19/0x20
[ 17.166732] check_preemption_disabled+0x168/0x180
[ 17.166944] debug_smp_processor_id+0x20/0x30
[ 17.167153] event_function_call+0xd7/0x5c0
[ 17.167355] ? __pfx___perf_remove_from_context+0x10/0x10
[ 17.167611] ? __this_cpu_preempt_check+0x21/0x30
[ 17.167835] ? __pfx_event_function_call+0x10/0x10
[ 17.168072] ? __this_cpu_preempt_check+0x21/0x30
[ 17.168297] ? _raw_spin_unlock_irq+0x2c/0x60
[ 17.168505] ? lockdep_hardirqs_on+0x89/0x110
[ 17.168718] perf_remove_from_context+0xf7/0x1d0
[ 17.168939] perf_event_release_kernel+0x186/0x870
[ 17.169164] ? lock_is_held_type+0xef/0x150
[ 17.169362] ? locks_remove_file+0x3bb/0x5e0
[ 17.169546] ? __pfx_perf_event_release_kernel+0x10/0x10
[ 17.169791] ? __this_cpu_preempt_check+0x21/0x30
[ 17.169994] ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[ 17.170219] ? evm_file_release+0x193/0x1f0
[ 17.170421] ? __pfx_perf_release+0x10/0x10
[ 17.170620] perf_release+0x40/0x60
[ 17.170790] __fput+0x426/0xbc0
[ 17.170938] ____fput+0x1f/0x30
[ 17.171082] task_work_run+0x19c/0x2b0
[ 17.171271] ? __pfx_task_work_run+0x10/0x10
[ 17.171478] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 17.171732] ? switch_task_namespaces+0xc6/0x110
[ 17.171957] do_exit+0xb19/0x2aa0
[ 17.172122] ? lock_release+0x441/0x870
[ 17.172314] ? __pfx_do_exit+0x10/0x10
[ 17.172476] ? __this_cpu_preempt_check+0x21/0x30
[ 17.172700] ? _raw_spin_unlock_irq+0x2c/0x60
[ 17.172910] ? lockdep_hardirqs_on+0x89/0x110
[ 17.173126] do_group_exit+0xe4/0x2c0
[ 17.173307] __x64_sys_exit_group+0x4d/0x60
[ 17.173509] x64_sys_call+0x20c4/0x20d0
[ 17.173679] do_syscall_64+0x6d/0x140
[ 17.173854] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 17.174091] RIP: 0033:0x7fc803b18a4d
[ 17.174245] Code: Unable to access opcode bytes at 0x7fc803b18a23.
[ 17.174522] RSP: 002b:00007ffd949207a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 17.174892] RAX: ffffffffffffffda RBX: 00007fc803bf69e0 RCX: 00007fc803b18a4d
[ 17.175208] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
[ 17.175526] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[ 17.175849] R10: 00007ffd94920650 R11: 0000000000000246 R12: 00007fc803bf69e0
[ 17.176170] R13: 00007fc803bfbf00 R14: 0000000000000001 R15: 00007fc803bfbee8
[ 17.176508] </TASK>
"

I hope it's helpful.
Thanks!

---
If you do not need the following environment to reproduce the problem, or if you already have a reproduction environment, please ignore the information below.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
// start3.sh loads bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65, a v6.2-rc5 kernel
// You can change bzImage_xxx to whatever kernel image you want to boot
// For a different qemu version you may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \"

You can log in with the command below; there is no password for root:
ssh -p 10023 root@localhost

After logging in to the vm (virtual machine), you can transfer the reproducer binary to the vm as below and reproduce the problem there:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

To get the bzImage for the target kernel:
Copy the target kconfig to kernel_src/.config
make olddefconfig
make -jx bzImage  // x should be equal to or less than the number of CPUs your PC has

Put the resulting bzImage file into the above start3.sh to load the target kernel in the vm.

Tips:
If you already have qemu-system-x86_64, please ignore the info below.
If you want to install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!

On 2024-08-08 at 10:32:08 -0000, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the perf/core branch of tip:
>
> Commit-ID: 558abc7e3f895049faa46b08656be4c60dc6e9fd
> Gitweb: https://git.kernel.org/tip/558abc7e3f895049faa46b08656be4c60dc6e9fd
> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> AuthorDate: Wed, 07 Aug 2024 13:29:27 +02:00
> Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> CommitterDate: Thu, 08 Aug 2024 12:27:31 +02:00
>
> perf: Fix event_function_call() locking
>
> All the event_function/@func call context already uses perf_ctx_lock()
> except for the !ctx->is_active case. Make it all consistent.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> Reviewed-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/20240807115550.138301094@xxxxxxxxxxxxx
> ---
>  kernel/events/core.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index eb03c9a..ab49dea 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -263,6 +263,7 @@ unlock:
>  static void event_function_call(struct perf_event *event, event_f func, void *data)
>  {
>  	struct perf_event_context *ctx = event->ctx;
> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
>  	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
>  	struct event_function_struct efs = {
>  		.event = event,
> @@ -291,22 +292,22 @@ again:
>  	if (!task_function_call(task, event_function, &efs))
>  		return;
>  
> -	raw_spin_lock_irq(&ctx->lock);
> +	perf_ctx_lock(cpuctx, ctx);
>  	/*
>  	 * Reload the task pointer, it might have been changed by
>  	 * a concurrent perf_event_context_sched_out().
>  	 */
>  	task = ctx->task;
>  	if (task == TASK_TOMBSTONE) {
> -		raw_spin_unlock_irq(&ctx->lock);
> +		perf_ctx_unlock(cpuctx, ctx);
>  		return;
>  	}
>  	if (ctx->is_active) {
> -		raw_spin_unlock_irq(&ctx->lock);
> +		perf_ctx_unlock(cpuctx, ctx);
>  		goto again;
>  	}
>  	func(event, NULL, ctx, data);
> -	raw_spin_unlock_irq(&ctx->lock);
> +	perf_ctx_unlock(cpuctx, ctx);
>  }
>  
>  /*