Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 18, 2024 at 1:38 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
>
> On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> > Hi Josh, Sumath, and Heiko,
> >
> > Recently, all patches submitted to bpf have been failing the selftest
> > "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> > investigation, it was determined that a specific patch, 778666df60f0
> > ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> > It is the first bad after bisecting. Please take a look.
> > Thank you!
> >
> > You can reproduce it by running the command
> >
> >     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> >
> > in the tools/testing/selftests/bpf directory in the Linux source tree.
> > This command causes a kernel crash with the following messages. It
> > appears to be related to attaching a trace program at the fentry of a
> > struct_ops operator. The same operator works fine without the trace
> > program. For more details about the test case, please check
> > dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> >
> > illegal operation: 0001 ilc:1 [#1] SMP
> > Modules linked in:
> > CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> > 6.8.0-rc3-g778666df60f0 #34
> > Hardware name: QEMU 3906 QEMU (KVM/Linux)
> > Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
> >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> > 0000000082a08b58
> >            0000000000000000 000000007083ede2 0000000000000000
> > 0000000000000001
> >            00000000009d580e 000000008286bc00 0000037f000000d8
> > 000003ff800b2000
> >            0000000081a43c00 00000000011bfe20 000003ff800b208e
> > 0000038000a73c68
> > Krnl Code:#0000000000000000: 0000               illegal
> >           >0000000000000002: 0000               illegal
> >            0000000000000004: 0000               illegal
> >            0000000000000006: 0000               illegal
> >            0000000000000008: 0000               illegal
> >            000000000000000a: 0000               illegal
> >            000000000000000c: 0000               illegal
> >            000000000000000e: 0000               illegal
> > Call Trace:
> >  [<0000000000000002>] 0x2
> >  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
> >  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
> >  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
> >  [<0000000000c4382c>] __do_syscall+0x244/0x300
> >  [<0000000000c59a40>] system_call+0x70/0x98
> > INFO: lockdep is turned off.
> > Last Breaking-Event-Address:
> >  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> > Kernel panic - not syncing: Fatal exception: panic_on_oops
> >
> >
> > [1]
> > https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571
>
> Hi,
>
> thanks for the report, I'm currently looking into it.
>
> The problem is that the following:
>
> static void bpf_jit_plt(void *plt, void *ret, void *target)
> {
>         memcpy(plt, bpf_plt, BPF_PLT_SIZE);
>         *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
>         *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
> }
>
> is (mis)compiled to the following:
>
>    0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
>    0x000000000017a274 <+244>:   ltgr    %r11,%r11
>    0x000000000017a278 <+248>:   locgrne %r4,%r11
>    0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
>    0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
>    0x000000000017a286 <+262>:   lghi    %r4,8
>    0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
>    0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)
>
> Note that memcpy() and assignments are reordered, and this is causing
> the crash. The C code violates the strict aliasing rules, but this
> should be irrelevant for the kernel.
>
> The problem does not occur with the latest GCC. Bisect says that the
> following commit fixed it:
>
> commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
> Author: Aldy Hernandez <aldyh@xxxxxxxxxx>
> Date:   Tue Jun 15 12:32:51 2021 +0200
>
>     Backwards jump threader rewrite with ranger.
>
> but I think that's accidental. The reordering is done by the sched1
> pass. I plan to find out why it thinks it's okay to reorder these
> memory accesses.

Wow. What a discovery.
Does this bug affect other architectures ?
Probably yes, since it's sched1 pass ?
Should we apply some workaround for now?

Also it's probably not strict aliasing related:
extern const char bpf_plt[];
extern const char bpf_plt_ret[];
extern const char bpf_plt_target[];
and bpf_jit_plt() casts the pointer to 'char *'.
The 'char *' was always special from GCC strict aliasing point of view.
iirc gcc used to treat such pointers as pointing anywhere.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux