Re: [PATCH v5 bpf-next 1/3] selftests/bpf: Add parallelism to test_progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 17 Sep 2021 11:54:35 -0700

On Fri, Sep 17, 2021 at 11:43 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Wed, Sep 15, 2021 at 8:26 PM Yucong Sun <fallentree@xxxxxx> wrote:
> >
> > From: Yucong Sun <sunyucong@xxxxxxxxx>
> >
> > This patch adds "-j" mode to test_progs, executing tests in multiple process.
> > "-j" mode is optional, and works with all existing test selection mechanism, as
> > well as "-v", "-l" etc.
> >
> > In "-j" mode, main process use UDS to communicate to each forked worker,
> > commanding it to run tests and collect logs. After all tests are
> > finished, a summary is printed. main process use multiple competing
> > threads to dispatch work to worker, trying to keep them all busy.
> >
> > The test status will be printed as soon as it is finished, if there are error
> > logs, it will be printed after the final summary line.
> >
> > By specifying "--debug", additional debug information on server/worker
> > communication will be printed.
> >
> > Example output:
> >   > ./test_progs -n 15-20 -j
> >   [   12.801730] bpf_testmod: loading out-of-tree module taints kernel.
> >   Launching 8 workers.
> >   #20 btf_split:OK
> >   #16 btf_endian:OK
> >   #18 btf_module:OK
> >   #17 btf_map_in_map:OK
> >   #19 btf_skc_cls_ingress:OK
> >   #15 btf_dump:OK
> >   Summary: 6/20 PASSED, 0 SKIPPED, 0 FAILED
> >
> > Signed-off-by: Yucong Sun <sunyucong@xxxxxxxxx>
> > ---
>
> A bit late to review this, sorry. I'm still looking through the code,
> but decided to try it out locally first. And here's what I got
> immediately running in QEMU:
>
> [vmuser@archvm bpf]$ time sudo ./test_progs -t core
> #32 core_autosize:OK
> #33 core_extern:OK
> #34 core_read_macros:OK
> #35 core_reloc:OK
> #36 core_retro:OK
> Summary: 5/107 PASSED, 0 SKIPPED, 0 FAILED
>
> real    0m0.927s
> user    0m0.197s
> sys     0m0.103s
> [vmuser@archvm bpf]$ time sudo ./test_progs -t core -j
> Launching 8 workers.
> #34 core_read_macros:OK
> #32 core_autosize:OK
> #36 core_retro:OK
> #33 core_extern:OK
> #35 core_reloc:OK
> Summary: 5/107 PASSED, 0 SKIPPED, 0 FAILED
>
> real    0m20.048s
> user    0m0.194s
> sys     0m0.183s
>
>
> So, first, "Launching 8 workers." should be only displayed with --debug, no?
>
> But most importantly, why does the parallel version take 20 seconds?..
> Please take a look, something is not right.

Running full run:

Summary: 176/972 PASSED, 4 SKIPPED, 2 FAILED

All error logs:
test_cg_storage_multi:PASS:cg-create-parent 0 nsec
test_cg_storage_multi:PASS:cg-create-child 0 nsec
test_egress_only:PASS:skel-load 0 nsec
test_egress_only:PASS:parent-cg-attach 0 nsec
test_egress_only:PASS:first-connect-send 0 nsec
test_egress_only:PASS:first-invoke 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
assert_storage_noexist:PASS:map-lookup 0 nsec
assert_storage_noexist:PASS:map-lookup 0 nsec
test_egress_only:PASS:child-cg-attach 0 nsec
test_egress_only:PASS:second-connect-send 0 nsec
test_egress_only:PASS:second-invoke 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
#28/1 cg_storage_multi/egress_only:OK
test_isolated:PASS:skel-load 0 nsec
test_isolated:PASS:parent-egress1-cg-attach 0 nsec
test_isolated:PASS:parent-egress2-cg-attach 0 nsec
test_isolated:PASS:parent-ingress-cg-attach 0 nsec
test_isolated:PASS:first-connect-send 0 nsec
test_isolated:FAIL:first-invoke invocations=2#28/2
cg_storage_multi/isolated:FAIL
test_shared:PASS:skel-load 0 nsec
test_shared:PASS:parent-egress1-cg-attach 0 nsec
test_shared:PASS:parent-egress2-cg-attach 0 nsec
test_shared:PASS:parent-ingress-cg-attach 0 nsec
test_shared:PASS:first-connect-send 0 nsec
test_shared:PASS:first-invoke 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
assert_storage_noexist:PASS:map-lookup 0 nsec
assert_storage_noexist:PASS:map-lookup 0 nsec
test_shared:PASS:child-egress1-cg-attach 0 nsec
test_shared:PASS:child-egress2-cg-attach 0 nsec
test_shared:PASS:child-ingress-cg-attach 0 nsec
test_shared:PASS:second-connect-send 0 nsec
test_shared:PASS:second-invoke 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
assert_storage:PASS:map-lookup 0 nsec
assert_storage:PASS:assert-storage 0 nsec
#28/3 cg_storage_multi/shared:OK
#28 cg_storage_multi:FAIL

real    1m0.057s
user    0m4.167s
sys     0m40.824s

Running in sequential mode, I got this timing:

real    1m10.007s
user    0m3.910s
sys     0m37.004s

So not much speed up, unfortunately. I assumed it's bpf_verif_scale
test (which we will break up to parallelize it better). So I re-ran
with it blacklisted, let's see how it did:

Summary: 175/949 PASSED, 4 SKIPPED, 1 FAILED

All error logs:
libbpf: elf: skipping unrecognized data section(7) .rodata.str1.1
test_snprintf_btf:PASS:skel_open 0 nsec
test_snprintf_btf:PASS:skel_load 0 nsec
test_snprintf_btf:PASS:skel_attach 0 nsec
test_snprintf_btf:PASS:system 0 nsec
test_snprintf_btf:PASS:bpf_snprintf_ret 0 nsec
test_snprintf_btf:PASS:check if subtests ran 0 nsec
test_snprintf_btf:FAIL:check all subtests ran only ran 1535 of 1431 tests
#116 snprintf_btf:FAIL

real    0m40.041s
user    0m4.212s
sys     0m7.966s

Note how cg_storage_multi didn't fail this time, but snprintf_btf did.
I think we'll need to audit selftests some more to see which ones are
not filtering by pid and make some system-wide modifications, those
would need to be adapted or marked as serial. We probably don't need
to do it in this first patch set, but definitely would have to do it
before we can start using this parallel mode for real both for local
development and in CI. For now let's concentrate on runner's
correctness (e.g., those 20 seconds don't seem right at all).

But getting back to my comparison, running all but bpf_verif_scale
test sequentially:

Summary: 176/949 PASSED, 5 SKIPPED, 0 FAILED

real    0m39.343s
user    0m3.857s
sys     0m6.509s

It is actually faster to run sequentially... Any idea what might be
going on here?

>
> >  tools/testing/selftests/bpf/test_progs.c | 577 +++++++++++++++++++++--
> >  tools/testing/selftests/bpf/test_progs.h |  36 +-
> >  2 files changed, 581 insertions(+), 32 deletions(-)
> >
>
> [...]