On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote: > Hi Josh, Sumath, and Heiko, > > Recently, all patches submitted to bpf have been failing the selftest > "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon > investigation, it was determined that a specific patch, 778666df60f0 > ("s390: compile relocatable kernel without -fPIE"), is causing the problem. > It is the first bad after bisecting. Please take a look. > Thank you! > > You can reproduce it by running the command > > ./test_progs -a dummy_st_ops/dummy_init_ptr_arg > > in the tools/testing/selftests/bpf directory in the Linux source tree. > This command causes a kernel crash with the following messages. It > appears to be related to attaching a trace program at the fentry of a > struct_ops operator. The same operator works fine without the trace > program. For more details about the test case, please check > dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/. > > illegal operation: 0001 ilc:1 [#1] SMP > Modules linked in: > CPU: 0 PID: 82 Comm: test_progs Tainted: G OE > 6.8.0-rc3-g778666df60f0 #34 > Hardware name: QEMU 3906 QEMU (KVM/Linux) > Krnl PSW : 0704d00180000000 0000000000000002 (0x2) > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 > Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20 > 0000000082a08b58 > 0000000000000000 000000007083ede2 0000000000000000 > 0000000000000001 > 00000000009d580e 000000008286bc00 0000037f000000d8 > 000003ff800b2000 > 0000000081a43c00 00000000011bfe20 000003ff800b208e > 0000038000a73c68 > Krnl Code:#0000000000000000: 0000 illegal > >0000000000000002: 0000 illegal > 0000000000000004: 0000 illegal > 0000000000000006: 0000 illegal > 0000000000000008: 0000 illegal > 000000000000000a: 0000 illegal > 000000000000000c: 0000 illegal > 000000000000000e: 0000 illegal > Call Trace: > [<0000000000000002>] 0x2 > [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250 > [<000000000033145a>] __sys_bpf+0xa1a/0xd00 > [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50 > [<0000000000c4382c>] __do_syscall+0x244/0x300 > [<0000000000c59a40>] system_call+0x70/0x98 > INFO: lockdep is turned off. > Last Breaking-Event-Address: > [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70 > Kernel panic - not syncing: Fatal exception: panic_on_oops > > > [1] > https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571 Hi, thanks for the report, I'm currently looking into it. The problem is that the following: static void bpf_jit_plt(void *plt, void *ret, void *target) { memcpy(plt, bpf_plt, BPF_PLT_SIZE); *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret; *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret; } is (mis)compiled to the following: 0x000000000017a26e <+238>: stg %r4,232(%r3,%r15) 0x000000000017a274 <+244>: ltgr %r11,%r11 0x000000000017a278 <+248>: locgrne %r4,%r11 0x000000000017a27c <+252>: stg %r4,232(%r1,%r15) 0x000000000017a282 <+258>: la %r3,232(%r1,%r15) 0x000000000017a286 <+262>: lghi %r4,8 0x000000000017a28a <+266>: mvc 232(16,%r15),0(%r2) 0x000000000017a290 <+272>: mvc 248(16,%r15),16(%r2) Note that memcpy() and assignments are reordered, and this is causing the crash. The C code violates the strict aliasing rules, but this should be irrelevant for the kernel. The problem does not occur with the latest GCC. Bisect says that the following commit fixed it: commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc Author: Aldy Hernandez <aldyh@xxxxxxxxxx> Date: Tue Jun 15 12:32:51 2021 +0200 Backwards jump threader rewrite with ranger. but I think that's accidental. The reordering is done by the sched1 pass. I plan to find out why it thinks it's okay to reorder these memory accesses. Best regards, Ilya