On 7/20/2023 4:57 PM, Jiri Olsa wrote:
> We received report [1] of kernel crash, which is caused by
> using nesting protection without disabled preemption.
>
> The bpf_event_output can be called by programs executed by
> bpf_prog_run_array_cg function that disabled migration but
> keeps preemption enabled.
>
> This can cause task to be preempted by another one inside the
> nesting protection and lead eventually to two tasks using same
> perf_sample_data buffer and cause crashes like:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000001
>   #PF: supervisor instruction fetch in kernel mode
>   #PF: error_code(0x0010) - not-present page
>   ...
>   ? perf_output_sample+0x12a/0x9a0
>   ? finish_task_switch.isra.0+0x81/0x280
>   ? perf_event_output+0x66/0xa0
>   ? bpf_event_output+0x13a/0x190
>   ? bpf_event_output_data+0x22/0x40
>   ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
>   ? xa_load+0x87/0xe0
>   ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
>   ? release_sock+0x3e/0x90
>   ? sk_setsockopt+0x1a1/0x12f0
>   ? udp_pre_connect+0x36/0x50
>   ? inet_dgram_connect+0x93/0xa0
>   ? __sys_connect+0xb4/0xe0
>   ? udp_setsockopt+0x27/0x40
>   ? __pfx_udp_push_pending_frames+0x10/0x10
>   ? __sys_setsockopt+0xdf/0x1a0
>   ? __x64_sys_connect+0xf/0x20
>   ? do_syscall_64+0x3a/0x90
>   ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Fixing this by disabling preemption in bpf_event_output.
>
> [1] https://github.com/cilium/cilium/issues/26756
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Oleg "livelace" Popov <o.popov@xxxxxxxxxxx>
> Fixes: 2a916f2f546c bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>

Acked-by: Hou Tao <houtao1@xxxxxxxxxx>

With one nit above. The format of the Fixes tags should be

  2a916f2f546c ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
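
For anyone following the thread, the interleaving described in the quoted commit message can be sketched in plain userspace C. This is only a rough model under stated assumptions, not the kernel code: `cpu_model`, `section_enter`, `section_exit`, `MAX_NEST` and the task ids are hypothetical names invented for illustration. The counter stands in for a per-CPU nest level and `owner[]` for per-level sample buffers. Preemption is modeled simply as the order in which the calls are made on the same CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NEST 3 /* hypothetical number of per-level buffers */

/* Hypothetical model of one CPU's state; this is not kernel code. */
struct cpu_model {
    int nest_level;      /* stands in for the per-CPU nesting counter */
    int owner[MAX_NEST]; /* task id using each buffer, 0 means free */
};

/* Bump the counter and claim the buffer for that level.
 * Sets *collision if the buffer is still in use by another task. */
static int section_enter(struct cpu_model *cpu, int task, bool *collision)
{
    int level = ++cpu->nest_level;

    if (level > MAX_NEST) {
        cpu->nest_level--;
        return -1; /* nested too deeply, no buffer handed out */
    }
    if (cpu->owner[level - 1] != 0)
        *collision = true;
    cpu->owner[level - 1] = task;
    return level - 1;
}

/* Release the buffer (if this task still owns it) and drop the counter. */
static void section_exit(struct cpu_model *cpu, int slot, int task)
{
    assert(slot >= 0 && slot < MAX_NEST);
    if (cpu->owner[slot] == task)
        cpu->owner[slot] = 0;
    cpu->nest_level--;
}

/* Preemption allowed inside the section: tasks interleave on one CPU. */
bool simulate_preempted_interleaving(void)
{
    struct cpu_model cpu = {0};
    bool collision = false;

    int a = section_enter(&cpu, 1, &collision); /* task A gets slot 0 */
    (void)section_enter(&cpu, 2, &collision);   /* B preempts A, gets slot 1 */
    section_exit(&cpu, a, 1);                   /* A resumes and leaves, level drops to 1 */
    (void)section_enter(&cpu, 3, &collision);   /* C gets slot 1 while B still uses it */
    return collision;
}

/* Preemption disabled: each task runs the whole section before the next. */
bool simulate_non_preemptible(void)
{
    struct cpu_model cpu = {0};
    bool collision = false;

    for (int task = 1; task <= 3; task++) {
        int slot = section_enter(&cpu, task, &collision);
        section_exit(&cpu, slot, task);
    }
    return collision;
}
```

In this model the first function reports a shared buffer and the second does not, which matches the reasoning in the commit message: once the counter can be decremented by a task other than the most recent entrant, the level no longer identifies a unique buffer. Keeping the section non-preemptible restores that property.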