On Tue, Feb 22, 2022 at 12:35 PM Mykola Lysenko <mykolal@xxxxxx> wrote: > > Thanks for the review Andrii! > > > On Feb 19, 2022, at 8:39 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Fri, Feb 18, 2022 at 4:30 PM Mykola Lysenko <mykolal@xxxxxx> wrote: > >> > >> In send_signal, replace sleep with dummy cpu intensive computation > >> to increase probability of child process being scheduled. Add few > >> more asserts. > >> > >> In find_vma, reduce sample_freq as higher values may be rejected in > >> some qemu setups, remove usleep and increase length of cpu intensive > >> computation. > >> > >> In bpf_cookie, perf_link and perf_branches, reduce sample_freq as > >> higher values may be rejected in some qemu setups > >> > >> Signed-off-by: Mykola Lysenko <mykolal@xxxxxx> > >> --- > >> .../testing/selftests/bpf/prog_tests/bpf_cookie.c | 2 +- > >> tools/testing/selftests/bpf/prog_tests/find_vma.c | 5 ++--- > >> .../selftests/bpf/prog_tests/perf_branches.c | 4 ++-- > >> tools/testing/selftests/bpf/prog_tests/perf_link.c | 2 +- > >> .../testing/selftests/bpf/prog_tests/send_signal.c | 14 ++++++++++---- > >> 5 files changed, 16 insertions(+), 11 deletions(-) > >> > >> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c > >> index cd10df6cd0fc..0612e79a9281 100644 > >> --- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c > >> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c > >> @@ -199,7 +199,7 @@ static void pe_subtest(struct test_bpf_cookie *skel) > >> attr.type = PERF_TYPE_SOFTWARE; > >> attr.config = PERF_COUNT_SW_CPU_CLOCK; > >> attr.freq = 1; > >> - attr.sample_freq = 4000; > >> + attr.sample_freq = 1000; > >> pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC); > >> if (!ASSERT_GE(pfd, 0, "perf_fd")) > >> goto cleanup; > >> diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c > >> index b74b3c0c555a..acc41223a112 100644 > >> --- a/tools/testing/selftests/bpf/prog_tests/find_vma.c > >> +++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c > >> @@ -30,7 +30,7 @@ static int open_pe(void) > >> attr.type = PERF_TYPE_HARDWARE; > >> attr.config = PERF_COUNT_HW_CPU_CYCLES; > >> attr.freq = 1; > >> - attr.sample_freq = 4000; > >> + attr.sample_freq = 1000; > >> pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC); > >> > >> return pfd >= 0 ? pfd : -errno; > >> @@ -57,7 +57,7 @@ static void test_find_vma_pe(struct find_vma *skel) > >> if (!ASSERT_OK_PTR(link, "attach_perf_event")) > >> goto cleanup; > >> > >> - for (i = 0; i < 1000000; ++i) > >> + for (i = 0; i < 1000000000; ++i) > > > > 1bln seems excessive... maybe 10mln would be enough? > > See explanation for send_signal test case below > > > > >> ++j; > >> > >> test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */); > > > > [...] > > > >> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c > >> index 776916b61c40..841217bd1df6 100644 > >> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c > >> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c > >> @@ -4,11 +4,12 @@ > >> #include <sys/resource.h> > >> #include "test_send_signal_kern.skel.h" > >> > >> -int sigusr1_received = 0; > >> +int sigusr1_received; > >> +volatile int volatile_variable; > > > > please make them static > > sure > > > > >> > >> static void sigusr1_handler(int signum) > >> { > >> - sigusr1_received++; > >> + sigusr1_received = 1; > >> } > >> > >> static void test_send_signal_common(struct perf_event_attr *attr, > >> @@ -42,7 +43,9 @@ static void test_send_signal_common(struct perf_event_attr *attr, > >> int old_prio; > >> > >> /* install signal handler and notify parent */ > >> + errno = 0; > >> signal(SIGUSR1, sigusr1_handler); > >> + ASSERT_OK(errno, "signal"); > > > > just ASSERT_OK(signal(...), "signal"); > > I am fine to merge signal and ASSERT lines, but will substitute with condition "signal(SIGUSR1, sigusr1_handler) != SIG_ERR”, sounds good? > Ah, signal is a bit special with return values. Yeah, ASSERT_NEQ(signal(...), SIG_ERR, "signal") sounds good. > > > >> > >> close(pipe_c2p[0]); /* close read */ > >> close(pipe_p2c[1]); /* close write */ > >> @@ -63,9 +66,12 @@ static void test_send_signal_common(struct perf_event_attr *attr, > >> ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read"); > >> > >> /* wait a little for signal handler */ > >> - sleep(1); > >> + for (int i = 0; i < 1000000000; i++) > > > > same about 1bln > > With 10mln and 100 test runs I got 86 failures > 100mln - 63 failures > 1bln - 0 failures on 100 runs > > Now, there is performance concern for this test. Running > > time sudo ./test_progs -t send_signal/send_signal_nmi_thread > > With 1bln takes ~4s > 100mln - 1s. > Unchanged test with sleep(1); takes ~2s. > > On the other hand 300mln runs ~2s, and only fails 1 time per 100 runs. As 300mln does not regress performance comparing to the current “sleep(1)” implementation, I propose to go with it. What do you think? I think if we need to burn multiple seconds of CPU to make the test reliable, then we should either rework or disable/remove the test. In CI those billions of iterations will be much slower. And even waiting for 4 seconds for just one test is painful. Yonghong, WDYT? Should we just drop thi test? It has caused us a bunch of flakiness and maintenance burden without actually catching any issues. Maybe it's better to just get rid of it? > > > > >> + volatile_variable++; > >> > >> buf[0] = sigusr1_received ? '2' : '0'; > >> + ASSERT_EQ(sigusr1_received, 1, "sigusr1_received"); > >> + > >> ASSERT_EQ(write(pipe_c2p[1], buf, 1), 1, "pipe_write"); > >> > >> /* wait for parent notification and exit */ > >> @@ -110,9 +116,9 @@ static void test_send_signal_common(struct perf_event_attr *attr, > >> ASSERT_EQ(read(pipe_c2p[0], buf, 1), 1, "pipe_read"); > >> > >> /* trigger the bpf send_signal */ > >> + skel->bss->signal_thread = signal_thread; > >> skel->bss->pid = pid; > >> skel->bss->sig = SIGUSR1; > >> - skel->bss->signal_thread = signal_thread; > >> > >> /* notify child that bpf program can send_signal now */ > >> ASSERT_EQ(write(pipe_p2c[1], buf, 1), 1, "pipe_write"); > >> -- > >> 2.30.2 >