Re: next-20190723: bpf/seccomp - systemd/journald issue?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jul 27, 2019 at 7:11 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 7/27/19 1:16 AM, Sedat Dilek wrote:
> > On Sat, Jul 27, 2019 at 9:36 AM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >>
> >> On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> >> <alexei.starovoitov@xxxxxxxxx> wrote:
> >>>
> >>> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >>>>
> >>>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
> >>>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> Hi Yonghong Song,
> >>>>>>>
> >>>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@xxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> >>>>>>>>
> >>>>>>>> Glad to know clang 9 has asm goto support and now It can compile
> >>>>>>>> kernel again.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yupp.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> I am seeing a problem in the area bpf/seccomp causing
> >>>>>>>>> systemd/journald/udevd services to fail.
> >>>>>>>>>
> >>>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> >>>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
> >>>>>>>>>
> >>>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> >>>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
> >>>>>>>>> In both cases I use clang-9 (prerelease).
> >>>>>>>>
> >>>>>>>> Looks like it is a lld bug.
> >>>>>>>>
> >>>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
> >>>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
> >>>>>>>>      sysctl net.core.bpf_jit_enable = 1
> >>>>>>>> If this passed, it will prove it is interpreter related.
> >>>>>>>>
> >>>>>>>
> >>>>>>> After...
> >>>>>>>
> >>>>>>> sysctl -w net.core.bpf_jit_enable=1
> >>>>>>>
> >>>>>>> I can start all failed systemd services.
> >>>>>>>
> >>>>>>> systemd-journald.service
> >>>>>>> systemd-udevd.service
> >>>>>>> haveged.service
> >>>>>>>
> >>>>>>> This is in maintenance mode.
> >>>>>>>
> >>>>>>> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> >>>>>>>
> >>>>>>
> >>>>>> This is what I did:
> >>>>>
> >>>>> I probably won't have cycles to debug this potential lld issue.
> >>>>> Maybe you already did, I suggest you put enough reproducible
> >>>>> details in the bug you filed against lld so they can take a look.
> >>>>>
> >>>>
> >>>> I understand and will put the journalctl-log into the CBL issue
> >>>> tracker and update informations.
> >>>>
> >>>> Thanks for your help understanding the BPF correlations.
> >>>>
> >>>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
> >>>
> >>> jit_enable=1 is enough.
> >>> Or use CONFIG_BPF_JIT_ALWAYS_ON to workaround.
> >>>
> >>> It sounds like clang miscompiles interpreter.
> >
> > Just to clarify:
> > This does not happen with clang-9 + ld.bfd (GNU/ld linker).
> >
> >>> modprobe test_bpf
> >>> should be able to point out which part of interpreter is broken.
> >>
> >> Maybe we need something like...
> >>
> >> "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
> >>
> >> ...for clang?
> >>
> >
> > Not sure if something like GCC's...
> >
> > -fgcse
> >
> > Perform a global common subexpression elimination pass. This pass also
> > performs global constant and copy propagation.
> >
> > Note: When compiling a program using computed gotos, a GCC extension,
> > you may get better run-time performance if you disable the global
> > common subexpression elimination pass by adding -fno-gcse to the
> > command line.
> >
> > Enabled at levels -O2, -O3, -Os.
> >
> > ...is available for clang.
> >
> > I tried with hopping to turn off "global common subexpression elimination":
> >
> > diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
> > index 383c87300b0d..92f934a1e9ff 100644
> > --- a/arch/x86/net/Makefile
> > +++ b/arch/x86/net/Makefile
> > @@ -3,6 +3,8 @@
> >   # Arch-specific network modules
> >   #
> >
> > +KBUILD_CFLAGS += -O0
>
> This won't work. First, you added to the wrong file. The interpreter
> is at kernel/bpf/core.c.
>

Thanks for the clarification.
I mixed up the x86 BPF JIT compiler with the BPF interpreter.

I see no diff in the disassembled kernel/bpf/core.o in my clang9-bfd
and clang9-lld build-dirs.

l$ objdump -M intel -d linux.clang9-bfd/kernel/bpf/core.o >
bpf_core_o_clang9-bfd.txt
$ objdump -M intel -d linux.clang9-lld/kernel/bpf/core.o >
bpf_core_o_clang9-lld.txt

--- bpf_core_o_clang9-bfd.txt   2019-07-28 13:11:59.363552042 +0200
+++ bpf_core_o_clang9-lld.txt   2019-07-28 13:12:09.975535278 +0200
@@ -1,5 +1,5 @@

-linux.clang9-bfd/kernel/bpf/core.o:     file format elf64-x86-64
+linux.clang9-lld/kernel/bpf/core.o:     file format elf64-x86-64


 Disassembly of section .text:

> Second, kernel may have compilation issues with -O0.
>

Confirmed.

- Sedat -

> > +
> >   ifeq ($(CONFIG_X86_32),y)
> >           obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
> >   else
> >
> > Still see...
> > BROKEN: test_bpf: #294 BPF_MAXINSNS: Jump, gap, jump, ... jited:0
> >
> > - Sedat -
> >



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux