Re: bpf selftest execution issues

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 14 May 2020 16:13:21 -0700

On Tue, May 12, 2020 at 7:46 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> When running BPF tests I ran into some issues and couldn't get a clean
> set of results on the bpf-next master branch. Just wanted to check if anyone
> else is seeing any of these failures.
>
> 1. Timeouts. When running "make run_tests" in tools/testing/selftests/bpf,
> the kselftest runner uses an over-aggressive default timeout of 45 seconds
> for tests. For some tests which comprise a series of sub-tests, this
> is a bit too short. For example, I regularly see:
>
> not ok 30 selftests: bpf: test_tunnel.sh # TIMEOUT
>
> not ok 37 selftests: bpf: test_lwt_ip_encap.sh # TIMEOUT
>
> not ok 39 selftests: bpf: test_tc_tunnel.sh # TIMEOUT
>
> not ok 41 selftests: bpf: test_xdping.sh # TIMEOUT
>
> These tests all share the characteristic that they consist of a set of
> subtests, and while some sleeps could potentially be trimmed it seems
> like we may want to override the default timeout with a "settings" file
> to get more stable results. Picking magic numbers that work for everyone
> is problematic of course. timeout=0 (disable timeouts) is one answer I
> suppose.  Are others hitting this, or are you adding your own settings
> file with a timeout override, or perhaps invoking the tests in a way other
> than "make run_tests" in tools/testing/selftests/bpf?
>

I just run each test binary individually...
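
For anyone who wants to stay with "make run_tests", here is a minimal
sketch of the settings override mentioned in the quoted text. It assumes
the kselftest runner picks up a per-directory "settings" file containing a
timeout= line; SELFTEST_DIR is a hypothetical variable that defaults to a
scratch directory so the snippet is harmless to run anywhere.

```shell
# SELFTEST_DIR is a hypothetical variable; point it at
# tools/testing/selftests/bpf in a kernel tree to use this for real.
SELFTEST_DIR="${SELFTEST_DIR:-$(mktemp -d)}"

# timeout=0 is the "disable timeouts" value suggested in the quoted text.
printf 'timeout=0\n' > "$SELFTEST_DIR/settings"

# Show what the runner would read.
cat "$SELFTEST_DIR/settings"
```

A finite value (in seconds) could be used instead of 0 if a hung test
should still be killed eventually.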

> 2. Missing CONFIG variables in tools/testing/selftests/bpf/config. As I
> understand it the toplevel config file is supposed to specify config vars
> needed to run the associated tests.  I noticed a few absences:
>
> Should CONFIG_IPV6_SEG6_BPF be in tools/testing/selftests/bpf/config?
> Without it the helper bpf_lwt_seg6_adjust_srh is not compiled in so
> loading test_seg6_loop.o fails:
>
> # libbpf: load bpf program failed: Invalid argument
> # libbpf: -- BEGIN DUMP LOG ---
> # libbpf:
> # unknown func bpf_lwt_seg6_adjust_srh#75
> # verification time 48 usec
> # stack depth 88
> # processed 90 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 3
> #
> # libbpf: -- END LOG --
> # libbpf: failed to load program 'lwt_seg6local'
> # libbpf: failed to load object 'test_seg6_loop.o'
> # test_bpf_verif_scale:FAIL:110
> # #5/21 test_seg6_loop.o:FAIL
> # #5 bpf_verif_scale:FAIL
>
> Same question for CONFIG_LIRC for test_lirc* tests; I'm seeing:
>
> # grep: /sys/class/rc/rc0/lirc*/uevent: No such file or directory
> # Usage: ./test_lirc_mode2_user /dev/lircN /dev/input/eventM
> # FAIL: lirc_mode2
>
> ...which I suspect would be fixed by having CONFIG_LIRC.
>

Yep, probably, please send a patch.
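
Before sending it, it may be worth confirming that the running kernel
really lacks those options. A small sketch follows; check_config is a
hypothetical helper, and the demo runs against a scratch file rather than
a real kernel config.

```shell
# check_config FILE OPTION...
# Prints every OPTION that is not set to y or m in FILE and returns
# non-zero if at least one is missing.
check_config() {
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        if ! grep -Eq "^${opt}=(y|m)$" "$cfg"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Demo input: a made-up config fragment, not taken from a real build.
demo_cfg=$(mktemp)
printf 'CONFIG_LIRC=y\n# CONFIG_IPV6_SEG6_BPF is not set\n' > "$demo_cfg"

check_config "$demo_cfg" CONFIG_IPV6_SEG6_BPF CONFIG_LIRC || true
```

On a real system the file argument would typically be something like
/boot/config-$(uname -r), or the unpacked contents of /proc/config.gz
where that is enabled; which one exists depends on the distro and kernel
build.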

> 3. libbpf: XXX is not found in vmlinux BTF
>
> A few different cases here across a bunch of tests:
>

[...]

> # libbpf: hrtimer_nanosleep is not found in vmlinux BTF
>
> The strange thing is I'm running with the latest LLVM/clang
> from llvm-project.git, installed libbpf/bpftool from the kernel
> build, specified CONFIG_DEBUG_INFO_BTF etc and built BTF with pahole 1.16.
> Here's an example failure for fentry_test:
>
> ./test_progs -vvv -t fentry_test

[...]

> libbpf: found data map 0 (fentry_t.bss, sec 16, off 0) for insn 16
> libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> libbpf: map 'fentry_t.bss': created successfully, fd=4
> libbpf: bpf_fentry_test1 is not found in vmlinux BTF
> libbpf: failed to load object 'fentry_test'
> libbpf: failed to load BPF skeleton 'fentry_test': -2
> test_fentry_test:FAIL:fentry_skel_load fentry skeleton failed
> #19 fentry_test:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
>
> What's odd is that symbols are being found when loading via
> bpf_prog_load_xattr(); the common thread in the above seems to be BPF
> skeleton-based open+load. Is there anything else I should check
> to further debug this?

Did you check if /sys/kernel/btf/vmlinux really contains those functions?

bpftool btf dump file /sys/kernel/btf/vmlinux | grep bpf_fentry_test1

BTF itself clearly loaded successfully, so I suspect it contains no FUNC
entries. That might mean the pahole v1.16 you installed is not the one
that was actually used during kernel BTF generation. Can you try to
validate that?
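
A rough way to script that check, assuming bpftool's raw dump prints
function entries as "[id] FUNC 'name' type_id=N". btf_has_func is a
hypothetical helper, and the sample lines and type IDs below are made up
for illustration.

```shell
# btf_has_func NAME
# Reads "bpftool btf dump" output on stdin and succeeds if it contains a
# FUNC entry for NAME. The leading "] " keeps FUNC_PROTO lines from matching.
btf_has_func() {
    grep -q "] FUNC '$1'"
}

# Made-up sample in the assumed dump format.
sample="[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] FUNC_PROTO '(anon)' ret_type_id=1 vlen=0
[3] FUNC 'bpf_fentry_test1' type_id=2"

if printf '%s\n' "$sample" | btf_has_func bpf_fentry_test1; then
    echo "FUNC present"
else
    echo "FUNC missing"
fi
```

Against the live kernel this would be fed from
"bpftool btf dump file /sys/kernel/btf/vmlinux". If no FUNC lines show up
at all, the pahole that generated the BTF is the first thing to suspect.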

>
> 4. Some of the tests rely on /dev/tcp - support for it seems to only
> be in newer bash; tests which spawn nc servers and wait on data
> transfers via /dev/tcp hang as a result (timeouts don't seem to
> kill things either). Would it be reasonable to have tests fall back to
> using nc where possible if /dev/tcp is not present, or perhaps
> fail early?

No opinion on this; maybe folks who deal with networking more can
suggest something.
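
As a starting point for that discussion, a fail-early probe could look
roughly like the sketch below. have_dev_tcp is a hypothetical helper and
the detection is only a heuristic: it assumes that a bash built without
network redirections treats /dev/tcp/... as an ordinary path and fails
with "No such file or directory", while a bash with support reports a
connection error (or connects) instead.

```shell
# have_dev_tcp: succeed if bash appears to handle /dev/tcp redirections.
# Heuristic only; see the assumptions stated above.
have_dev_tcp() {
    command -v bash >/dev/null 2>&1 || return 1
    # Port 1 on loopback is expected to be closed; we only care about
    # which kind of error (if any) comes back.
    err=$(LC_ALL=C bash -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>&1)
    case "$err" in
        *"No such file or directory"*) return 1 ;;
        *) return 0 ;;
    esac
}

if have_dev_tcp; then
    echo "/dev/tcp available"
else
    echo "/dev/tcp unavailable; skip or fall back to nc"
fi
```

A test script could run this once up front and bail out instead of
hanging on the first transfer.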

>
> Apologies if I've missed any discussion of any of the above. Thanks!
>
> Alan