On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote: > On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote: > > [...] > > > Hmm. > > Puranjay touched it last with extra logic. > > > > And before that David Vernet tried to address flakiness > > in commit 4a54de65964d. > > Yonghong also noticed lockups in paravirt > > and added workaround 7015843afc. > > > > Your additional timeout/workaround makes sense to me, > > but would be good to bisect whether Puranjay's change caused it. > > I'll debug what's going on some time later today or on Sat. I finally had time to investigate this a bit. First, here is how to trigger lockup: t1=send_signal/send_signal_perf_thread_remote; \ t2=send_signal/send_signal_nmi_thread_remote; \ for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done Must be both tests for whatever reason. The failing test is 'send_signal_nmi_thread_remote'. The test is organized as parent and child processes communicating various events to each other. The intended sequence of events: - child: - install SIGUSR1 handler - notify parent - wait for parent - parent: - open PERF_COUNT_SW_CPU_CLOCK event - attach BPF program to the event - notify child - enter busy loop for 10^8 iterations - wait for child - BPF program: - send SIGUSR1 to child - child: - poll for SIGUSR1 in a busy loop - notify parent - parent: - check value communicated by child, terminate test. The lockup happens because on every other test run perf event is not triggered, child does not receive SIGUSR1 and thus both parent and child are stuck. For 'send_signal_nmi_thread_remote' perf event is defined as: struct perf_event_attr attr = { .sample_period = 1, .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, }; And is opened for parent process pid. Apparently, the perf event is not always triggered between lines send_signal.c:165-180. And at line 180 parent enters system call, so cpu cycles stop ticking for 'parent', thus if perf event had not been triggered already it won't be triggered at all (as far as I understand). Applying same fix as Yonghong did in 7015843afc is sufficient to reliably trigger perf event: --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c @@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote) static void test_send_signal_nmi(bool signal_thread, bool remote) { struct perf_event_attr attr = { - .sample_period = 1, + .freq = 1, + .sample_freq = 1000, .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, }; But I don't understand why. As far as I can figure from kernel source code, sample_period is measured in nanoseconds (is it?), so busy loop at send_signal.c:174-175 should run long enough for perf event to be triggered before. Can someone with understanding of how perf event work explain why above change helps?