On Fri, Mar 10, 2023 at 12:10:48AM -0600, David Vernet wrote:
> The send_signal tracepoint tests are non-deterministically failing in
> CI. The test works as follows:
>
> 1. Two pairs of file descriptors are created using the pipe() function.
>    One pair is used to communicate between a parent process -> child
>    process, and the other for the reverse direction.
>
> 2. A child is fork()'ed. The child process registers a signal handler,
>    notifies its parent that the signal handler is registered, and then
>    waits for its parent to have enabled a BPF program that sends a
>    signal.
>
> 3. The parent opens and loads a BPF skeleton with programs that send
>    signals to the child process. The different programs are triggered by
>    different perf events (either NMI or normal perf), or by regular
>    tracepoints. The signal is delivered to the child whenever the child
>    triggers the program.
>
> 4. The child's signal handler is invoked, which sets a flag saying that
>    the signal handler was reached. The child then signals to the parent
>    that it received the signal, and the test ends.
>
> The perf testcases (send_signal_perf{_thread} and
> send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
> testcases fail non-deterministically because the tracepoint is not
> always being fired for the child.
>
> There are two tracepoint programs registered in the test:
> 'tracepoint/sched/sched_switch', and
> 'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
> blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
> To fix this, we can have the child trigger the nanosleep program with a
> usleep().
>
> Before this patch, the test would fail locally every 2-3 runs. Now, it
> doesn't fail after more than 1000 runs.
>
> Signed-off-by: David Vernet <void@xxxxxxxxxxxxx>
> ---
>  tools/testing/selftests/bpf/prog_tests/send_signal.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index d63a20fbed33..61cc83fca53c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -64,8 +64,11 @@ static void test_send_signal_common(struct perf_event_attr *attr,
>  		ASSERT_EQ(read(pipe_p2c[0], buf, 1), 1, "pipe_read");
>
>  		/* wait a little for signal handler */
> -		for (int i = 0; i < 1000000000 && !sigusr1_received; i++)
> +		for (int i = 0; i < 1000000000 && !sigusr1_received; i++) {
>  			j /= i + j + 1;
> +			if (!attr)
> +				ASSERT_EQ(usleep(1), 0, "nanosleep_tp");

As soon as I sent this out, it occurred to me that having an ASSERT_EQ
like this is not a good idea. usleep() could be interrupted by a signal
and return EINTR, and the whole point of this test is to send signals to
the child. Let me resend this as v2 without the ASSERT_EQ.

> +		}
>
>  		buf[0] = sigusr1_received ? '2' : '0';
>  		ASSERT_EQ(sigusr1_received, 1, "sigusr1_received");
> --
> 2.39.0
>
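
To illustrate the EINTR concern, here is a minimal standalone sketch. It
is not taken from the selftest: the helper name
usleep_interrupted_by_signal() is hypothetical, and a one-shot
setitimer() delivering SIGALRM stands in for the BPF program that sends
SIGUSR1 to the child. It assumes Linux/glibc behaviour, where usleep()
returns -1 with errno set to EINTR once a signal handler has run during
the sleep.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Set by the handler, mirroring the flag the test's handler sets. */
static volatile sig_atomic_t sig_received;

static void sig_handler(int sig)
{
	(void)sig;
	sig_received = 1;
}

/*
 * Hypothetical helper, not part of send_signal.c: arm a one-shot timer
 * that delivers SIGALRM after 50ms, then start a 500ms usleep().
 * Returns 1 if usleep() failed with EINTR and the handler ran,
 * 0 otherwise.
 */
static int usleep_interrupted_by_signal(void)
{
	struct sigaction sa;
	struct itimerval it;
	int ret, err;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sig_handler;
	sigemptyset(&sa.sa_mask);
	ret = sigaction(SIGALRM, &sa, NULL);
	assert(ret == 0);

	/* it_interval stays zero, so the timer fires only once. */
	memset(&it, 0, sizeof(it));
	it.it_value.tv_usec = 50000;
	ret = setitimer(ITIMER_REAL, &it, NULL);
	assert(ret == 0);

	errno = 0;
	ret = usleep(500000);
	err = errno;	/* save errno before any other call can clobber it */

	return ret == -1 && err == EINTR && sig_received;
}
```

If the helper returns 1, a check equivalent to ASSERT_EQ(usleep(1), 0, ...)
would have failed even though the signal was delivered as intended, which
is the reason for dropping the assertion in v2.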