On 7/21/2023 8:45 PM, Jiri Olsa wrote:
> On Fri, Jul 21, 2023 at 08:16:14PM +0800, Hou Tao wrote:
>>
>> On 7/20/2023 4:57 PM, Jiri Olsa wrote:
>>> We received report [1] of kernel crash, which is caused by
>>> using nesting protection without disabled preemption.
>>>
>>> The bpf_event_output can be called by programs executed by
>>> bpf_prog_run_array_cg function that disabled migration but
>>> keeps preemption enabled.
>>>
>>> This can cause task to be preempted by another one inside the
>>> nesting protection and lead eventually to two tasks using same
>>> perf_sample_data buffer and cause crashes like:
>>>
>>>   BUG: kernel NULL pointer dereference, address: 0000000000000001
>>>   #PF: supervisor instruction fetch in kernel mode
>>>   #PF: error_code(0x0010) - not-present page
>>>   ...
>>>   ? perf_output_sample+0x12a/0x9a0
>>>   ? finish_task_switch.isra.0+0x81/0x280
>>>   ? perf_event_output+0x66/0xa0
>>>   ? bpf_event_output+0x13a/0x190
>>>   ? bpf_event_output_data+0x22/0x40
>>>   ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
>>>   ? xa_load+0x87/0xe0
>>>   ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
>>>   ? release_sock+0x3e/0x90
>>>   ? sk_setsockopt+0x1a1/0x12f0
>>>   ? udp_pre_connect+0x36/0x50
>>>   ? inet_dgram_connect+0x93/0xa0
>>>   ? __sys_connect+0xb4/0xe0
>>>   ? udp_setsockopt+0x27/0x40
>>>   ? __pfx_udp_push_pending_frames+0x10/0x10
>>>   ? __sys_setsockopt+0xdf/0x1a0
>>>   ? __x64_sys_connect+0xf/0x20
>>>   ? do_syscall_64+0x3a/0x90
>>>   ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>
>>> Fixing this by disabling preemption in bpf_event_output.
>>>
>>> [1] https://github.com/cilium/cilium/issues/26756
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Reported-by: Oleg "livelace" Popov <o.popov@xxxxxxxxxxx>
>>> Fixes: 2a916f2f546c bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.
>>> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> With one nit above. The format of the Fixes tags should be 2a916f2f546c
>> ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
>>
> right, sorry about that.. should I resend?

We can wait for the comments from Alexei. Maybe the maintainer can fix it
for you.

>
> thanks,
> jirka
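
For anyone following the thread, the race described in the quoted commit
message can be modelled in plain userspace C. This is only a sketch under
assumptions, not the kernel code: nest_level, slot_owner, enter(), leave()
and the two run_*() functions are hypothetical names, and the model assumes
the buffer is picked purely from a per-CPU nesting counter. It replays one
interleaving that becomes possible when a task can be preempted inside the
protected region, and the same work with each task running to completion.

```c
#include <assert.h>

#define MAX_NEST 3

/* Hypothetical userspace model of one CPU's state; this is not kernel code. */
static int nest_level;            /* stands in for a per-CPU nesting counter */
static int slot_owner[MAX_NEST];  /* task id using each buffer slot, 0 = free */

/*
 * Enter the protected region: bump the counter and claim slot (level - 1).
 * Sets *clash when that slot is still owned by another task.
 */
static int enter(int task, int *clash)
{
	int slot = ++nest_level - 1;

	assert(slot >= 0 && slot < MAX_NEST);
	if (slot_owner[slot] != 0)
		*clash = 1;
	slot_owner[slot] = task;
	return slot;
}

/* Leave the region: release the slot if we still own it, drop the counter. */
static void leave(int task, int slot)
{
	if (slot_owner[slot] == task)
		slot_owner[slot] = 0;
	nest_level--;
}

/* One interleaving that is possible while preemption stays enabled. */
int run_preemptible(void)
{
	int clash = 0;
	int a = enter(1, &clash);	/* task 1 takes slot 0 */
	int b = enter(2, &clash);	/* task 1 preempted, task 2 takes slot 1 */

	leave(1, a);			/* task 1 finishes first, counter drops to 1 */
	int c = enter(3, &clash);	/* task 3 computes slot 1, still used by task 2 */

	leave(3, c);
	leave(2, b);
	return clash;
}

/* With preemption disabled, each task finishes before the next one starts. */
int run_serialized(void)
{
	int clash = 0;

	for (int task = 1; task <= 3; task++) {
		int slot = enter(task, &clash);

		leave(task, slot);
	}
	return clash;
}
```

In this model run_preemptible() reports a clash (two tasks on the same
slot) and run_serialized() does not, which matches the reasoning given for
disabling preemption in bpf_event_output.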