Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> Hi Josh, Sumath, and Heiko,
> 
> Recently, all patches submitted to bpf have been failing the selftest
> "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> investigation, it was determined that a specific patch, 778666df60f0
> ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> It is the first bad after bisecting. Please take a look.
> Thank you!
> 
> You can reproduce it by running the command
> 
>     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> 
> in the tools/testing/selftests/bpf directory in the Linux source tree.
> This command causes a kernel crash with the following messages. It
> appears to be related to attaching a trace program at the fentry of a
> struct_ops operator. The same operator works fine without the trace
> program. For more details about the test case, please check
> dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> 
> illegal operation: 0001 ilc:1 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> 6.8.0-rc3-g778666df60f0 #34
> Hardware name: QEMU 3906 QEMU (KVM/Linux)
> Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
>            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> 0000000082a08b58
>            0000000000000000 000000007083ede2 0000000000000000
> 0000000000000001
>            00000000009d580e 000000008286bc00 0000037f000000d8
> 000003ff800b2000
>            0000000081a43c00 00000000011bfe20 000003ff800b208e
> 0000038000a73c68
> Krnl Code:#0000000000000000: 0000               illegal
>           >0000000000000002: 0000               illegal
>            0000000000000004: 0000               illegal
>            0000000000000006: 0000               illegal
>            0000000000000008: 0000               illegal
>            000000000000000a: 0000               illegal
>            000000000000000c: 0000               illegal
>            000000000000000e: 0000               illegal
> Call Trace:
>  [<0000000000000002>] 0x2
>  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
>  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
>  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
>  [<0000000000c4382c>] __do_syscall+0x244/0x300
>  [<0000000000c59a40>] system_call+0x70/0x98
> INFO: lockdep is turned off.
> Last Breaking-Event-Address:
>  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> 
> [1]
> https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571

Hi,

thanks for the report, I'm currently looking into it.

The problem is that the following:

static void bpf_jit_plt(void *plt, void *ret, void *target)
{
	memcpy(plt, bpf_plt, BPF_PLT_SIZE);
	*(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
	*(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
}

is (mis)compiled to the following:

   0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
   0x000000000017a274 <+244>:   ltgr    %r11,%r11
   0x000000000017a278 <+248>:   locgrne %r4,%r11
   0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
   0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
   0x000000000017a286 <+262>:   lghi    %r4,8
   0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
   0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)

Note that memcpy() and assignments are reordered, and this is causing
the crash. The C code violates the strict aliasing rules, but this
should be irrelevant for the kernel.

The problem does not occur with the latest GCC. Bisect says that the
following commit fixed it:

commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc                                                          
Author: Aldy Hernandez <aldyh@xxxxxxxxxx>                                                                
Date:   Tue Jun 15 12:32:51 2021 +0200                                                                   
                                                                                                         
    Backwards jump threader rewrite with ranger.

but I think that's accidental. The reordering is done by the sched1
pass. I plan to find out why it thinks it's okay to reorder these
memory accesses.

Best regards,
Ilya




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux