On Tue, May 12, 2020 at 7:46 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> When running BPF tests I ran into some issues and couldn't get a clean
> set of results on the bpf-next master branch. Just wanted to check if anyone
> else is seeing any of these failures.
>
> 1. Timeouts. When running "make run_tests" in tools/testing/selftests/bpf,
> the kselftest runner uses an over-aggressive default timeout of 45 seconds
> for tests. For some tests which comprise a series of sub-tests, this
> is a bit too short. For example, I regularly see:
>
> not ok 30 selftests: bpf: test_tunnel.sh # TIMEOUT
>
> not ok 37 selftests: bpf: test_lwt_ip_encap.sh # TIMEOUT
>
> not ok 39 selftests: bpf: test_tc_tunnel.sh # TIMEOUT
>
> not ok 41 selftests: bpf: test_xdping.sh # TIMEOUT
>
> These tests all share the characteristic that they consist of a set of
> subtests, and while some sleeps could potentially be trimmed it seems
> like we may want to override the default timeout with a "settings" file
> to get more stable results. Picking magic numbers that work for everyone
> is problematic of course. timeout=0 (disable timeouts) is one answer I
> suppose. Are others hitting this, or are you adding your own settings
> file with a timeout override, or perhaps invoking the tests in a way other
> than "make run_tests" in tools/testing/selftests/bpf?
>

I just run each test binary individually...

> 2. Missing CONFIG variables in tools/testing/selftests/bpf/config. As I
> understand it the toplevel config file is supposed to specify config vars
> needed to run the associated tests. I noticed a few absences:
>
> Should CONFIG_IPV6_SEG6_BPF be in tools/testing/selftests/bpf/config?
> Without it the helper bpf_lwt_seg6_adjust_srh is not compiled in so
> loading test_seg6_loop.o fails:
>
> # libbpf: load bpf program failed: Invalid argument
> # libbpf: -- BEGIN DUMP LOG ---
> # libbpf:
> # unknown func bpf_lwt_seg6_adjust_srh#75
> # verification time 48 usec
> # stack depth 88
> # processed 90 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 3
> #
> # libbpf: -- END LOG --
> # libbpf: failed to load program 'lwt_seg6local'
> # libbpf: failed to load object 'test_seg6_loop.o'
> # test_bpf_verif_scale:FAIL:110
> # #5/21 test_seg6_loop.o:FAIL
> # #5 bpf_verif_scale:FAIL
>
> Same question for CONFIG_LIRC for test_lirc* tests; I'm seeing:
>
> # grep: /sys/class/rc/rc0/lirc*/uevent: No such file or directory
> # Usage: ./test_lirc_mode2_user /dev/lircN /dev/input/eventM
> # ^[[0;31mFAIL: lirc_mode2^[[0m
>
> ...which I suspect would be fixed by having CONFIG_LIRC.
>

Yep, probably, please send a patch.

> 3. libbpf: XXX is not found in vmlinux BTF
>
> A few different cases here across a bunch of tests:
>

[...]

> # libbpf: hrtimer_nanosleep is not found in vmlinux BTF
>
> The strange thing is I'm running with the latest LLVM/clang
> from llvm-project.git, installed libbpf/bpftool from the kernel
> build, specified CONFIG_DEBUG_INFO_BTF etc and built BTF with pahole 1.16.
> Here's an example failure for fentry_test:
>
> ./test_progs -vvv -t fentry_test

[...]

> libbpf: found data map 0 (fentry_t.bss, sec 16, off 0) for insn 16
> libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> libbpf: map 'fentry_t.bss': created successfully, fd=4
> libbpf: bpf_fentry_test1 is not found in vmlinux BTF
> libbpf: failed to load object 'fentry_test'
> libbpf: failed to load BPF skeleton 'fentry_test': -2
> test_fentry_test:FAIL:fentry_skel_load fentry skeleton failed
> #19 fentry_test:FAIL
> Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
>
> What's odd is that symbols are being found when loading via
> bpf_load_xattr(); the common thread in the above seems to be BPF
> skeleton-based open+load. Is there anything else I should check
> to further debug this?

Did you check if /sys/kernel/btf/vmlinux really contains those functions?

bpftool btf dump file /sys/kernel/btf/vmlinux | grep bpf_fentry_test1

It clearly loaded BTF successfully, so I suspect BTF doesn't have
FUNCs? Which might mean that your pahole v1.16 is not the one used
during kernel BTF generation? Can you try to validate that?

>
> 4. Some of the tests rely on /dev/tcp - support for it seems to only
> be in newer bash; tests which spawn nc servers and wait on data
> transfers via /dev/tcp hang as a result (timeouts don't seem to
> kill things either). Would it be reasonable to have tests fall back to
> using nc where possible if /dev/tcp is not present, or perhaps
> fail early?

No opinion on this, maybe folks dealing with networking more can
suggest something.

>
> Apologies if I've missed any discussion of any of the above. Thanks!
>
> Alan
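
The BTF check suggested for item 3 can be wrapped in a small shell helper, which makes it easier to repeat for several symbols. This is only a sketch: btf_has_func is a hypothetical name (not part of bpftool or the selftests), and it assumes the dump lists function entries in the form FUNC 'name', which is how bpftool prints them as far as I know.

```shell
#!/bin/bash
# Hypothetical helper: reads "bpftool btf dump" text on stdin and
# succeeds only if it contains a FUNC entry named exactly $1.
# Assumption: entries look like "[123] FUNC 'name' type_id=122".
btf_has_func() {
    # -F: fixed string, so the name is not treated as a regex.
    # The quotes around the name keep FUNC_PROTO lines and longer
    # names sharing the same prefix from matching.
    grep -qF "FUNC '$1'"
}

# Usage sketch on a live system (needs bpftool and CONFIG_DEBUG_INFO_BTF):
#   bpftool btf dump file /sys/kernel/btf/vmlinux | btf_has_func bpf_fentry_test1 \
#       && echo "FUNC present" \
#       || echo "FUNC missing: check which pahole generated the kernel BTF"
```

A plain grep for the symbol name would also match other BTF entries that merely mention it, so matching on the FUNC kind is slightly stricter than the one-liner above.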