Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments

Eduard Zingerman <eddyz87@xxxxxxxxx> · Thu, 07 Nov 2024 21:08:41 -0800

On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
> On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > Hmm.
> > Puranjay touched it last with extra logic.
> >
> > And before that David Vernet tried to address flakiness
> > in commit 4a54de65964d.
> > Yonghong also noticed lockups in paravirt
> > and added workaround 7015843afc.
> >
> > Your additional timeout/workaround makes sense to me,
> > but would be good to bisect whether Puranjay's change caused it.
>
> I'll debug what's going on some time later today or on Sat.

I finally had time to investigate this a bit.
First, here is how to trigger the lockup:

  t1=send_signal/send_signal_perf_thread_remote; \
  t2=send_signal/send_signal_nmi_thread_remote; \
  for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done

Both tests must be run together, for whatever reason.
The failing test is 'send_signal_nmi_thread_remote'.

The test is organized as parent and child processes communicating
various events to each other. The intended sequence of events:
- child:
  - install SIGUSR1 handler
  - notify parent
  - wait for parent
- parent:
  - open PERF_COUNT_SW_CPU_CLOCK event
  - attach BPF program to the event
  - notify child
  - enter busy loop for 10^8 iterations
  - wait for child
- BPF program:
  - send SIGUSR1 to child
- child:
  - poll for SIGUSR1 in a busy loop
  - notify parent
- parent:
  - check value communicated by child,
    terminate test.

The lockup happens because on every other test run the perf event is
not triggered, so the child never receives SIGUSR1: the child keeps
polling and the parent keeps waiting for the child, and both are stuck.

For 'send_signal_nmi_thread_remote' the perf event is defined as:

	struct perf_event_attr attr = {
		.sample_period = 1,
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
	};

It is opened for the parent process pid.

Apparently, the perf event is not always triggered between lines
send_signal.c:165-180. At line 180 the parent enters a system call,
so cpu cycles stop ticking for 'parent'; thus, if the perf event
has not been triggered by then, it won't be triggered at all
(as far as I understand).

Applying the same fix as Yonghong did in 7015843afc is sufficient to
reliably trigger the perf event:

--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
 static void test_send_signal_nmi(bool signal_thread, bool remote)
 {
        struct perf_event_attr attr = {
-               .sample_period = 1,
+               .freq = 1,
+               .sample_freq = 1000,
                .type = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_CPU_CYCLES,
        };
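
For reference, the two configurations on either side of this diff can
be written out as below. This is only a sketch: cycles_attr() and
open_event_for_pid() are names made up here, not selftest helpers.
According to perf_event_open(2), with .freq = 0 the sample_period
field is the number of events between samples, and with .freq = 1 the
kernel adjusts the period to aim for sample_freq samples per second.
The sketch does not attempt to explain why one mode fires reliably
and the other does not.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build a CPU-cycles sampling attr in either
 * fixed-period mode (use_freq == 0) or frequency mode (use_freq != 0).
 */
struct perf_event_attr cycles_attr(int use_freq, unsigned long long value)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	if (use_freq) {
		attr.freq = 1;			/* interpret the union as sample_freq */
		attr.sample_freq = value;	/* target samples per second */
	} else {
		attr.sample_period = value;	/* one sample every 'value' events */
	}
	return attr;
}

/*
 * Hypothetical wrapper: open the event for one pid on any cpu, with
 * no group leader and no flags. Returns an fd, or -1 with errno set.
 */
int open_event_for_pid(struct perf_event_attr *attr, pid_t pid)
{
	return (int)syscall(__NR_perf_event_open, attr, pid,
			    -1 /* cpu */, -1 /* group_fd */, 0UL /* flags */);
}
```

Here cycles_attr(0, 1) corresponds to the '-' side of the diff and
cycles_attr(1, 1000) to the '+' side.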

But I don't understand why.
As far as I can figure from the kernel source code,
sample_period is measured in nanoseconds (is it?),
so the busy loop at send_signal.c:174-175 should run long enough for
the perf event to be triggered before the parent enters the system call.

Can someone with an understanding of how perf events work explain why
the above change helps?