On Wed, Oct 6, 2021 at 11:56 AM Yucong Sun <fallentree@xxxxxx> wrote:
>
> This patch series adds "-j" parallel execution to test_progs, with
> "--debug" to display server/worker communications. Also, some tests
> that often fail in parallel are marked as serial tests, and they will
> run in sequence after parallel execution is done.
>
> This patch series also adds an error summary after all test execution
> has finished.
>

Huge milestone, good job! Applied most patches to bpf-next. See
comments in the respective patches.

We'll need to iterate on improving the stability of parallel mode, but
this is a great start. I've dropped a bunch of "fix up" patches where I
didn't feel confident yet about the approach. We should discuss those
independently from the parallelization changes in this patch set.

See some more thoughts below, but overall:

$ time sudo ./test_progs -j
...
Summary: 181/977 PASSED, 3 SKIPPED, 0 FAILED

real    0m36.949s
user    0m4.546s
sys     0m30.872s

VS

$ time sudo ./test_progs
...
Summary: 181/977 PASSED, 3 SKIPPED, 0 FAILED

real    1m3.031s
user    0m4.157s
sys     0m28.820s

That's close to a 2x speed-up (about 1.7x wall-clock), and the gap will
just grow over time as we add more tests. And that's with
bpf_verif_scale as is, which we should break up into individual tests
so they can be parallelized.

So a few things worth mentioning:

  1. To focus future efforts on parallelizing existing tests, we should
     probably emit how long each test took.

  2. We are losing subtest progress when running in parallel mode. That
     sucks. While it's not easy to parallelize subtests, it's easy to
     send separate logs for each subtest and display them as they come.
     Let's do that?

  3. Parallel execution times are not consistent: once I got 30 seconds
     (which is 8 seconds faster than sequential; I excluded
     bpf_verif_scale), other times it was 45 seconds, and sometimes more
     than 1 minute. Not sure what's going on there, but this doesn't
     look right.

  4. A bunch of tests still fail from time to time (see examples below).
     What's even scarier is that once I got the "failed to determine
     tracepoint perf event ID" message, subsequent sequential
     executions kept failing. I don't see what a selftest could have
     done to cause this, so this is concerning and seems to point to
     the kernel. The /sys/kernel/debug and /sys/kernel/tracing
     directories were empty at this point. cc Steven, is there any
     situation in which tracefs can become "defunct"?

#84 ns_current_pid_tgid:FAIL
test_current_pid_tgid:PASS:skel_open_load 0 nsec
test_current_pid_tgid:PASS:stat 0 nsec
libbpf: failed to determine tracepoint 'syscalls/sys_enter_nanosleep' perf event ID: No such file or directory
libbpf: prog 'handler': failed to create tracepoint 'syscalls/sys_enter_nanosleep' perf event: No such file or directory
libbpf: failed to auto-attach program 'handler': -2
test_current_pid_tgid:FAIL:skel_attach skeleton attach failed: -2
#84/1 ns_current_pid_tgid/ns_current_pid_tgid_root_ns:FAIL
test_ns_current_pid_tgid_new_ns:PASS:clone 0 nsec
test_ns_current_pid_tgid_new_ns:PASS:waitpid 0 nsec
test_ns_current_pid_tgid_new_ns:FAIL:newns_pidtgid failed#84/2 ns_current_pid_tgid/ns_current_pid_tgid_new_ns:FAIL

#88 perf_buffer:FAIL
serial_test_perf_buffer:PASS:nr_cpus 0 nsec
serial_test_perf_buffer:PASS:nr_on_cpus 0 nsec
serial_test_perf_buffer:PASS:skel_load 0 nsec
libbpf: failed to determine tracepoint 'raw_syscalls/sys_enter' perf event ID: No such file or directory
libbpf: prog 'handle_sys_enter': failed to create tracepoint 'raw_syscalls/sys_enter' perf event: No such file or directory
libbpf: failed to auto-attach program 'handle_sys_enter': -2
serial_test_perf_buffer:FAIL:attach_kprobe err -2

#110 send_signal_sched_switch:FAIL
serial_test_send_signal_sched_switch:PASS:skel_open_and_load 0 nsec
libbpf: failed to determine tracepoint 'syscalls/sys_enter_nanosleep' perf event ID: No such file or directory
libbpf: prog 'send_signal_tp': failed to create tracepoint 'syscalls/sys_enter_nanosleep' perf event: No such file or directory
libbpf: failed to auto-attach program 'send_signal_tp': -2
serial_test_send_signal_sched_switch:FAIL:skel_attach skeleton attach failed

#161 tp_attach_query:FAIL
serial_test_tp_attach_query:FAIL:open err -1 errno 2

#163 trace_printk:FAIL
serial_test_trace_printk:PASS:trace_printk__open 0 nsec
serial_test_trace_printk:PASS:skel->rodata->fmt[0] 0 nsec
serial_test_trace_printk:PASS:trace_printk__load 0 nsec
serial_test_trace_printk:PASS:trace_printk__attach 0 nsec
serial_test_trace_printk:FAIL:fopen(TRACEBUF) unexpected error: -2

#164 trace_vprintk:FAIL
serial_test_trace_vprintk:PASS:trace_vprintk__open_and_load 0 nsec
serial_test_trace_vprintk:PASS:trace_vprintk__attach 0 nsec
serial_test_trace_vprintk:FAIL:fopen(TRACEBUF) unexpected error: -2

#46 fexit_stress:FAIL
test_fexit_stress:PASS:find_vmlinux_btf_id 0 nsec
test_fexit_stress:PASS:fexit loaded 0 nsec
test_fexit_stress:PASS:fexit attach failed 0 nsec
test_fexit_stress:PASS:fexit loaded 0 nsec
...
test_fexit_stress:PASS:fexit loaded 0 nsec
test_fexit_stress:PASS:fexit attach failed 0 nsec
test_fexit_stress:PASS:fexit loaded 0 nsec
test_fexit_stress:FAIL:fexit attach failed prog 37 failed: -7 err 7

> V6 -> V5:
> * adding error summary logic for non parallel mode too.
> * changed how serial tests are implemented, use main process instead of worker 0.
> * fixed a dozen broken tests when running in parallel.
>
> V5 -> V4:
> * change to SOCK_SEQPACKET for close notification.
> * move all debug output to "--debug" mode
> * output log as test finish, and all error logs again after summary line.
> * variable naming / style changes
> * adds serial_test_name() to replace serial test lists.
>
>
> Yucong Sun (14):
>   selftests/bpf: Add parallelism to test_progs
>   selftests/bpf: Allow some tests to be executed in sequence
>   selftests/bpf: disable perf rate limiting when running tests.
>   selftests/bpf: add per worker cgroup suffix
>   selftests/bpf: adding read_perf_max_sample_freq() helper
>   selftests/bpf: fix race condition in enable_stats
>   selftests/bpf: make cgroup_v1v2 use its own port
>   selftests/bpf: adding a namespace reset for tc_redirect
>   selftests/bpf: Make uprobe tests use different attach functions.
>   selftests/bpf: adding pid filtering for atomics test
>   selftests/bpf: adding random delay for send_signal test
>   selftests/bpf: Fix pid check in fexit_sleep test
>   selftests/bpf: increase loop count for perf_branches
>   selfetest/bpf: make some tests serial
>
>  tools/testing/selftests/bpf/cgroup_helpers.c | 6 +-
>  tools/testing/selftests/bpf/cgroup_helpers.h | 2 +-
>  .../selftests/bpf/prog_tests/atomics.c | 1 +
>  .../selftests/bpf/prog_tests/attach_probe.c | 8 +-
>  .../selftests/bpf/prog_tests/bpf_cookie.c | 10 +-
>  .../bpf/prog_tests/bpf_iter_setsockopt.c | 2 +-
>  .../selftests/bpf/prog_tests/bpf_obj_id.c | 2 +-
>  .../bpf/prog_tests/cg_storage_multi.c | 2 +-
>  .../bpf/prog_tests/cgroup_attach_autodetach.c | 2 +-
>  .../bpf/prog_tests/cgroup_attach_multi.c | 2 +-
>  .../bpf/prog_tests/cgroup_attach_override.c | 2 +-
>  .../selftests/bpf/prog_tests/cgroup_link.c | 2 +-
>  .../selftests/bpf/prog_tests/cgroup_v1v2.c | 2 +-
>  .../selftests/bpf/prog_tests/check_mtu.c | 2 +-
>  .../selftests/bpf/prog_tests/fexit_bpf2bpf.c | 3 +-
>  .../prog_tests/flow_dissector_load_bytes.c | 2 +-
>  .../bpf/prog_tests/flow_dissector_reattach.c | 2 +-
>  .../bpf/prog_tests/get_branch_snapshot.c | 2 +-
>  .../selftests/bpf/prog_tests/kfree_skb.c | 3 +-
>  .../bpf/prog_tests/migrate_reuseport.c | 2 +-
>  .../selftests/bpf/prog_tests/modify_return.c | 3 +-
>  .../bpf/prog_tests/ns_current_pid_tgid.c | 3 +-
>  .../selftests/bpf/prog_tests/perf_branches.c | 10 +-
>  .../selftests/bpf/prog_tests/perf_buffer.c | 2 +-
>  .../selftests/bpf/prog_tests/perf_link.c | 5 +-
>  .../selftests/bpf/prog_tests/probe_user.c | 3 +-
>  .../bpf/prog_tests/raw_tp_writable_test_run.c | 3 +-
>  .../bpf/prog_tests/select_reuseport.c | 2 +-
>  .../selftests/bpf/prog_tests/send_signal.c | 6 +-
>  .../bpf/prog_tests/send_signal_sched_switch.c | 3 +-
>  .../bpf/prog_tests/sk_storage_tracing.c | 2 +-
>  .../selftests/bpf/prog_tests/snprintf_btf.c | 2 +-
>  .../selftests/bpf/prog_tests/sock_fields.c | 2 +-
>  .../selftests/bpf/prog_tests/sockmap_listen.c | 2 +-
>  .../bpf/prog_tests/stacktrace_build_id_nmi.c | 19 +-
>  .../selftests/bpf/prog_tests/task_pt_regs.c | 8 +-
>  .../selftests/bpf/prog_tests/tc_redirect.c | 14 +
>  .../testing/selftests/bpf/prog_tests/timer.c | 3 +-
>  .../selftests/bpf/prog_tests/timer_mim.c | 2 +-
>  .../bpf/prog_tests/tp_attach_query.c | 2 +-
>  .../selftests/bpf/prog_tests/trace_printk.c | 2 +-
>  .../selftests/bpf/prog_tests/trace_vprintk.c | 2 +-
>  .../bpf/prog_tests/trampoline_count.c | 3 +-
>  .../selftests/bpf/prog_tests/xdp_attach.c | 2 +-
>  .../selftests/bpf/prog_tests/xdp_bonding.c | 2 +-
>  .../bpf/prog_tests/xdp_cpumap_attach.c | 2 +-
>  .../bpf/prog_tests/xdp_devmap_attach.c | 2 +-
>  .../selftests/bpf/prog_tests/xdp_info.c | 2 +-
>  .../selftests/bpf/prog_tests/xdp_link.c | 2 +-
>  tools/testing/selftests/bpf/progs/atomics.c | 16 +
>  .../selftests/bpf/progs/connect4_dropper.c | 2 +-
>  .../testing/selftests/bpf/progs/fexit_sleep.c | 4 +-
>  .../selftests/bpf/progs/test_enable_stats.c | 2 +-
>  tools/testing/selftests/bpf/test_progs.c | 671 +++++++++++++++++-
>  tools/testing/selftests/bpf/test_progs.h | 37 +-
>  55 files changed, 790 insertions(+), 116 deletions(-)
>
> --
> 2.30.2
>
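
On the suggestion of emitting how long each test took: a minimal sketch
of how a per-test duration could be measured with
clock_gettime(CLOCK_MONOTONIC). The names elapsed_ms(), run_timed() and
sample_test() are hypothetical and are not part of test_progs; this
only illustrates the idea and is not what the series implements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: milliseconds between two CLOCK_MONOTONIC samples. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Hypothetical wrapper: run one test function, return its duration in ms. */
double run_timed(void (*test_fn)(void))
{
	struct timespec start, end;

	assert(test_fn != NULL);

	/* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	test_fn();
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed_ms(&start, &end);
}

/* Hypothetical stand-in for a real selftest: sleeps for about 10 ms. */
void sample_test(void)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	nanosleep(&delay, NULL);
}
```

A runner could wrap each test in run_timed() and print the returned
value next to the test's result line, which would make the slowest
tests (the best candidates for splitting up) easy to spot.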