On 7/20/2023 4:57 PM, Jiri Olsa wrote: > hi, > we got report of kernel crash [1][3] within bpf_event_output helper. > > The reason is the nesting protection code in bpf_event_output that expects > disabled preemption, which is not guaranteed for programs executed by > bpf_prog_run_array_cg. > > I managed to reproduce on tracing side where we have the same problem > in bpf_perf_event_output. The reproducer [2] just creates busy uprobe > and call bpf_perf_event_output helper a lot. It is a little strange that I can not reproduce the same problem on my local VM by using the same reproducer from [2]. I have already enabled both CONFIG_PREEMPT=y and CONFIG_PREEMPT_RCU=y. > > v2 changes: > - I changed 'Fixes' commits to where I saw we switched from preempt_disable > to migrate_disable, but I'm not completely sure about the patch 2, because > it was tricky to find, would be nice if somebody could check on that After digging into the change log, I think the fix tag is correct. In commit 2a916f2f546c, preempt_disable() is changed to migrate_disable() in the macro BPF_PROG_RUN_ARRAY(), and in commit 772412176fb9, BPF_PROG_RUN_ARRAY_FLAGS() was added for __cgroup_bpf_run_filter_sock_addr() and the implementation of BPF_PROG_RUN_ARRAY_FLAGS() was basically copied from BPF_PROG_RUN_ARRAY(). And afterwards, BPF_PROG_RUN_ARRAY_FLAGS was rename to BPF_PROG_RUN_ARRAY_CG_FLAGS, then bpf_prog_run_array_cg_flags(), and was renamed to bpf_prog_run_array_cg() in the final. > > thanks, > jirka > > > [1] https://github.com/cilium/cilium/issues/26756 > [2] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf_output_fix_reproducer&id=8054dcc634121b884c7c331329d61d93351d03b5 > [3] slack: > [66194.378161] BUG: kernel NULL pointer dereference, address: 0000000000000001 > [66194.378324] #PF: supervisor instruction fetch in kernel mode > [66194.378447] #PF: error_code(0x0010) - not-present page > ... > [66194.378692] Oops: 0010 [#1] PREEMPT SMP NOPTI > ... > [66194.380666] <TASK> > [66194.380775] ? perf_output_sample+0x12a/0x9a0 > [66194.380902] ? finish_task_switch.isra.0+0x81/0x280 > [66194.381024] ? perf_event_output+0x66/0xa0 > [66194.381148] ? bpf_event_output+0x13a/0x190 > [66194.381270] ? bpf_event_output_data+0x22/0x40 > [66194.381391] ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb > [66194.381519] ? xa_load+0x87/0xe0 > [66194.381635] ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0 > [66194.381759] ? release_sock+0x3e/0x90 > [66194.381876] ? sk_setsockopt+0x1a1/0x12f0 > [66194.381996] ? udp_pre_connect+0x36/0x50 > [66194.382114] ? inet_dgram_connect+0x93/0xa0 > [66194.382233] ? __sys_connect+0xb4/0xe0 > [66194.382353] ? udp_setsockopt+0x27/0x40 > [66194.382470] ? __pfx_udp_push_pending_frames+0x10/0x10 > [66194.382593] ? __sys_setsockopt+0xdf/0x1a0 > [66194.382713] ? __x64_sys_connect+0xf/0x20 > [66194.382832] ? do_syscall_64+0x3a/0x90 > [66194.382949] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc > [66194.383077] </TASK> > > > --- > Jiri Olsa (2): > bpf: Disable preemption in bpf_perf_event_output > bpf: Disable preemption in bpf_event_output > > kernel/trace/bpf_trace.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > .