Re: Crash caused by 778666df60f0 ("s390: compile relocatable kernel without -fPIE")

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 18, 2024 at 04:31:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Mar 18, 2024 at 1:38 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Mar 15, 2024 at 09:59:52AM -0700, Kui-Feng Lee wrote:
> > > Hi Josh, Sumath, and Heiko,
> > >
> > > Recently, all patches submitted to bpf have been failing the selftest
> > > "dummy_st_ops/dummy_init_ptr_arg" on the s390x platform. Upon
> > > investigation, it was determined that a specific patch, 778666df60f0
> > > ("s390: compile relocatable kernel without -fPIE"), is causing the problem.
> > > It is the first bad after bisecting. Please take a look.
> > > Thank you!
> > >
> > > You can reproduce it by running the command
> > >
> > >     ./test_progs -a dummy_st_ops/dummy_init_ptr_arg
> > >
> > > in the tools/testing/selftests/bpf directory in the Linux source tree.
> > > This command causes a kernel crash with the following messages. It
> > > appears to be related to attaching a trace program at the fentry of a
> > > struct_ops operator. The same operator works fine without the trace
> > > program. For more details about the test case, please check
> > > dummy_st_ops.c in tools/testing/selftests/bpf/prog_tests/.
> > >
> > > illegal operation: 0001 ilc:1 [#1] SMP
> > > Modules linked in:
> > > CPU: 0 PID: 82 Comm: test_progs Tainted: G           OE
> > > 6.8.0-rc3-g778666df60f0 #34
> > > Hardware name: QEMU 3906 QEMU (KVM/Linux)
> > > Krnl PSW : 0704d00180000000 0000000000000002 (0x2)
> > >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > > Krnl GPRS: 000003ff800a4956 0000000000000000 0000038000a73d20
> > > 0000000082a08b58
> > >            0000000000000000 000000007083ede2 0000000000000000
> > > 0000000000000001
> > >            00000000009d580e 000000008286bc00 0000037f000000d8
> > > 000003ff800b2000
> > >            0000000081a43c00 00000000011bfe20 000003ff800b208e
> > > 0000038000a73c68
> > > Krnl Code:#0000000000000000: 0000               illegal
> > >           >0000000000000002: 0000               illegal
> > >            0000000000000004: 0000               illegal
> > >            0000000000000006: 0000               illegal
> > >            0000000000000008: 0000               illegal
> > >            000000000000000a: 0000               illegal
> > >            000000000000000c: 0000               illegal
> > >            000000000000000e: 0000               illegal
> > > Call Trace:
> > >  [<0000000000000002>] 0x2
> > >  [<00000000009d5cde>] bpf_struct_ops_test_run+0x156/0x250
> > >  [<000000000033145a>] __sys_bpf+0xa1a/0xd00
> > >  [<00000000003319dc>] __s390x_sys_bpf+0x44/0x50
> > >  [<0000000000c4382c>] __do_syscall+0x244/0x300
> > >  [<0000000000c59a40>] system_call+0x70/0x98
> > > INFO: lockdep is turned off.
> > > Last Breaking-Event-Address:
> > >  [<000003ff800a49ac>] bpf_prog_83ebf4e90b071f2b_test_1+0x5c/0x70
> > > Kernel panic - not syncing: Fatal exception: panic_on_oops
> > >
> > >
> > > [1]
> > > https://github.com/kernel-patches/bpf/actions/runs/8297891630/job/22710267571
> >
> > Hi,
> >
> > thanks for the report, I'm currently looking into it.
> >
> > The problem is that the following:
> >
> > static void bpf_jit_plt(void *plt, void *ret, void *target)
> > {
> >         memcpy(plt, bpf_plt, BPF_PLT_SIZE);
> >         *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
> >         *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
> > }
> >
> > is (mis)compiled to the following:
> >
> >    0x000000000017a26e <+238>:   stg     %r4,232(%r3,%r15)
> >    0x000000000017a274 <+244>:   ltgr    %r11,%r11
> >    0x000000000017a278 <+248>:   locgrne %r4,%r11
> >    0x000000000017a27c <+252>:   stg     %r4,232(%r1,%r15)
> >    0x000000000017a282 <+258>:   la      %r3,232(%r1,%r15)
> >    0x000000000017a286 <+262>:   lghi    %r4,8
> >    0x000000000017a28a <+266>:   mvc     232(16,%r15),0(%r2)
> >    0x000000000017a290 <+272>:   mvc     248(16,%r15),16(%r2)
> >
> > Note that memcpy() and assignments are reordered, and this is causing
> > the crash. The C code violates the strict aliasing rules, but this
> > should be irrelevant for the kernel.
> >
> > The problem does not occur with the latest GCC. Bisect says that the
> > following commit fixed it:
> >
> > commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
> > Author: Aldy Hernandez <aldyh@xxxxxxxxxx>
> > Date:   Tue Jun 15 12:32:51 2021 +0200
> >
> >     Backwards jump threader rewrite with ranger.
> >
> > but I think that's accidental. The reordering is done by the sched1
> > pass. I plan to find out why it thinks it's okay to reorder these
> > memory accesses.
> 
> Wow. What a discovery.
> Does this bug affect other architectures ?
> Probably yes, since it's sched1 pass ?
> Should we apply some workaround for now?

I think it's because bpf_jit_plt() violates the C pointer subtraction
rules (bpf_plt_ret and bpf_plt are different objects from the C
perspective), so it should not affect the other architectures, unless
they use the same pattern elsewhere.

So I posted a patch fixing this in bpf_jit_plt().




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux