On Fri, Oct 8, 2021 at 3:26 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Wed, Oct 6, 2021 at 11:56 AM Yucong Sun <fallentree@xxxxxx> wrote: > > > > This patch series adds "-j" parelell execution to test_progs, with "--debug" to > > display server/worker communications. Also, some Tests that often fails in > > parallel are marked as serial test, and it will run in sequence after parallel > > execution is done. > > > > This patch series also adds a error summary after all tests execution finished. > > > > Huge milestone, good job! Applied most patches to bpf-next. See > comments below in respective patches. > > We'll need to iterate on improving the stability of parallel mode, but > this is a great start. I've dropped a bunch of "fix up" patches where > I didn't feel confident yet about the approach. We should discuss it > independently from the parallelization changes in this patch set. See > some more thoughts below, but overall: > > time sudo ./test_progs -j > ... > Summary: 181/977 PASSED, 3 SKIPPED, 0 FAILED > > real 0m36.949s > user 0m4.546s > sys 0m30.872s > > VS > > $ time sudo ./test_progs > ... > Summary: 181/977 PASSED, 3 SKIPPED, 0 FAILED > > real 1m3.031s > user 0m4.157s > sys 0m28.820s > > 2x speed up and the gap will just grow over time as we add more tests. > And that's also with bpf_verif_scale as is, which we should break up > into individual tests to parallelize them. > > So few things worth mentioning: > > 1. To focus future efforts on parallelizing existing tests, we should > probably emit how long did the test take. > > 2. We are losing subtest progress when running in parallel mode. That > sucks. While it's not easy to parallelize subtests, it's easy to send > separate logs for each subtest and display them as they come. Let's do > that? > > 3. Parallel execution times are not consistent, once I got 30 seconds > (which is 8 seconds faster than sequential, I excluded > bpf_verif_scale), other times it was 45 seconds and more than 1 > minute. Not sure what's going on there, but this doesn't look right. > > 4. A bunch of tests still fail from time to time (see examples below). > What's even scarier that once I got the "failed to determine > tracepoint perf event ID" message, subsequent sequential executions > kept failing. I don't see what selftest could have done to cause this, > so this is concerning and seems to point to the kernel. > /sys/kernel/debug and /sys/kernel/tracing directories were empty at > this point. cc Steven, is there any situation when tracefs can become > "defunct"? > Forgot to actually cc Steven, oops. Steven, I've run into the problem when running a few selftests that do uprobe/kprobe attachment. At some point, they started complaining that files like /sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id don't exist. And this condition persisted. When I checked /sys/kernel/debug/tracing in QEMU, it was empty. Is this a known problem? > #84 ns_current_pid_tgid:FAIL > test_current_pid_tgid:PASS:skel_open_load 0 nsec > test_current_pid_tgid:PASS:stat 0 nsec > libbpf: failed to determine tracepoint 'syscalls/sys_enter_nanosleep' > perf event ID: No such file or directory > libbpf: prog 'handler': failed to create tracepoint > 'syscalls/sys_enter_nanosleep' perf event: No such file or directory > libbpf: failed to auto-attach program 'handler': -2 > test_current_pid_tgid:FAIL:skel_attach skeleton attach failed: -2 > #84/1 ns_current_pid_tgid/ns_current_pid_tgid_root_ns:FAIL > test_ns_current_pid_tgid_new_ns:PASS:clone 0 nsec > test_ns_current_pid_tgid_new_ns:PASS:waitpid 0 nsec > test_ns_current_pid_tgid_new_ns:FAIL:newns_pidtgid failed#84/2 > ns_current_pid_tgid/ns_current_pid_tgid_new_ns:FAIL > > #88 perf_buffer:FAIL > serial_test_perf_buffer:PASS:nr_cpus 0 nsec > serial_test_perf_buffer:PASS:nr_on_cpus 0 nsec > serial_test_perf_buffer:PASS:skel_load 0 nsec > libbpf: failed to determine tracepoint 'raw_syscalls/sys_enter' perf > event ID: No such file or directory > libbpf: prog 'handle_sys_enter': failed to create tracepoint > 'raw_syscalls/sys_enter' perf event: No such file or directory > libbpf: failed to auto-attach program 'handle_sys_enter': -2 > serial_test_perf_buffer:FAIL:attach_kprobe err -2 > > #110 send_signal_sched_switch:FAIL > serial_test_send_signal_sched_switch:PASS:skel_open_and_load 0 nsec > libbpf: failed to determine tracepoint 'syscalls/sys_enter_nanosleep' > perf event ID: No such file or directory > libbpf: prog 'send_signal_tp': failed to create tracepoint > 'syscalls/sys_enter_nanosleep' perf event: No such file or directory > libbpf: failed to auto-attach program 'send_signal_tp': -2 > serial_test_send_signal_sched_switch:FAIL:skel_attach skeleton attach failed > > #161 tp_attach_query:FAIL > serial_test_tp_attach_query:FAIL:open err -1 errno 2 > > #163 trace_printk:FAIL > serial_test_trace_printk:PASS:trace_printk__open 0 nsec > serial_test_trace_printk:PASS:skel->rodata->fmt[0] 0 nsec > serial_test_trace_printk:PASS:trace_printk__load 0 nsec > serial_test_trace_printk:PASS:trace_printk__attach 0 nsec > serial_test_trace_printk:FAIL:fopen(TRACEBUF) unexpected error: -2 > > #164 trace_vprintk:FAIL > serial_test_trace_vprintk:PASS:trace_vprintk__open_and_load 0 nsec > serial_test_trace_vprintk:PASS:trace_vprintk__attach 0 nsec > serial_test_trace_vprintk:FAIL:fopen(TRACEBUF) unexpected error: -2 > > #46 fexit_stress:FAIL > test_fexit_stress:PASS:find_vmlinux_btf_id 0 nsec > test_fexit_stress:PASS:fexit loaded 0 nsec > test_fexit_stress:PASS:fexit attach failed 0 nsec > test_fexit_stress:PASS:fexit loaded 0 nsec > > ... > > test_fexit_stress:PASS:fexit loaded 0 nsec > test_fexit_stress:PASS:fexit attach failed 0 nsec > test_fexit_stress:PASS:fexit loaded 0 nsec > test_fexit_stress:FAIL:fexit attach failed prog 37 failed: -7 err 7 > > > > > V6 -> V5: > > * adding error summary logic for non parallel mode too. > > * changed how serial tests are implemented, use main process instead of worker 0. > > * fixed a dozen broken test when running in parallel. > > > > V5 -> V4: > > * change to SOCK_SEQPACKET for close notification. > > * move all debug output to "--debug" mode > > * output log as test finish, and all error logs again after summary line. > > * variable naming / style changes > > * adds serial_test_name() to replace serial test lists. > > > > > > Yucong Sun (14): > > selftests/bpf: Add parallelism to test_progs > > selftests/bpf: Allow some tests to be executed in sequence > > selftests/bpf: disable perf rate limiting when running tests. > > selftests/bpf: add per worker cgroup suffix > > selftests/bpf: adding read_perf_max_sample_freq() helper > > selftests/bpf: fix race condition in enable_stats > > selftests/bpf: make cgroup_v1v2 use its own port > > selftests/bpf: adding a namespace reset for tc_redirect > > selftests/bpf: Make uprobe tests use different attach functions. > > selftests/bpf: adding pid filtering for atomics test > > selftests/bpf: adding random delay for send_signal test > > selftests/bpf: Fix pid check in fexit_sleep test > > selftests/bpf: increase loop count for perf_branches > > selfetest/bpf: make some tests serial > > > > tools/testing/selftests/bpf/cgroup_helpers.c | 6 +- > > tools/testing/selftests/bpf/cgroup_helpers.h | 2 +- > > .../selftests/bpf/prog_tests/atomics.c | 1 + > > .../selftests/bpf/prog_tests/attach_probe.c | 8 +- > > .../selftests/bpf/prog_tests/bpf_cookie.c | 10 +- > > .../bpf/prog_tests/bpf_iter_setsockopt.c | 2 +- > > .../selftests/bpf/prog_tests/bpf_obj_id.c | 2 +- > > .../bpf/prog_tests/cg_storage_multi.c | 2 +- > > .../bpf/prog_tests/cgroup_attach_autodetach.c | 2 +- > > .../bpf/prog_tests/cgroup_attach_multi.c | 2 +- > > .../bpf/prog_tests/cgroup_attach_override.c | 2 +- > > .../selftests/bpf/prog_tests/cgroup_link.c | 2 +- > > .../selftests/bpf/prog_tests/cgroup_v1v2.c | 2 +- > > .../selftests/bpf/prog_tests/check_mtu.c | 2 +- > > .../selftests/bpf/prog_tests/fexit_bpf2bpf.c | 3 +- > > .../prog_tests/flow_dissector_load_bytes.c | 2 +- > > .../bpf/prog_tests/flow_dissector_reattach.c | 2 +- > > .../bpf/prog_tests/get_branch_snapshot.c | 2 +- > > .../selftests/bpf/prog_tests/kfree_skb.c | 3 +- > > .../bpf/prog_tests/migrate_reuseport.c | 2 +- > > .../selftests/bpf/prog_tests/modify_return.c | 3 +- > > .../bpf/prog_tests/ns_current_pid_tgid.c | 3 +- > > .../selftests/bpf/prog_tests/perf_branches.c | 10 +- > > .../selftests/bpf/prog_tests/perf_buffer.c | 2 +- > > .../selftests/bpf/prog_tests/perf_link.c | 5 +- > > .../selftests/bpf/prog_tests/probe_user.c | 3 +- > > .../bpf/prog_tests/raw_tp_writable_test_run.c | 3 +- > > .../bpf/prog_tests/select_reuseport.c | 2 +- > > .../selftests/bpf/prog_tests/send_signal.c | 6 +- > > .../bpf/prog_tests/send_signal_sched_switch.c | 3 +- > > .../bpf/prog_tests/sk_storage_tracing.c | 2 +- > > .../selftests/bpf/prog_tests/snprintf_btf.c | 2 +- > > .../selftests/bpf/prog_tests/sock_fields.c | 2 +- > > .../selftests/bpf/prog_tests/sockmap_listen.c | 2 +- > > .../bpf/prog_tests/stacktrace_build_id_nmi.c | 19 +- > > .../selftests/bpf/prog_tests/task_pt_regs.c | 8 +- > > .../selftests/bpf/prog_tests/tc_redirect.c | 14 + > > .../testing/selftests/bpf/prog_tests/timer.c | 3 +- > > .../selftests/bpf/prog_tests/timer_mim.c | 2 +- > > .../bpf/prog_tests/tp_attach_query.c | 2 +- > > .../selftests/bpf/prog_tests/trace_printk.c | 2 +- > > .../selftests/bpf/prog_tests/trace_vprintk.c | 2 +- > > .../bpf/prog_tests/trampoline_count.c | 3 +- > > .../selftests/bpf/prog_tests/xdp_attach.c | 2 +- > > .../selftests/bpf/prog_tests/xdp_bonding.c | 2 +- > > .../bpf/prog_tests/xdp_cpumap_attach.c | 2 +- > > .../bpf/prog_tests/xdp_devmap_attach.c | 2 +- > > .../selftests/bpf/prog_tests/xdp_info.c | 2 +- > > .../selftests/bpf/prog_tests/xdp_link.c | 2 +- > > tools/testing/selftests/bpf/progs/atomics.c | 16 + > > .../selftests/bpf/progs/connect4_dropper.c | 2 +- > > .../testing/selftests/bpf/progs/fexit_sleep.c | 4 +- > > .../selftests/bpf/progs/test_enable_stats.c | 2 +- > > tools/testing/selftests/bpf/test_progs.c | 671 +++++++++++++++++- > > tools/testing/selftests/bpf/test_progs.h | 37 +- > > 55 files changed, 790 insertions(+), 116 deletions(-) > > > > -- > > 2.30.2 > >