On 11/8/24 12:13 PM, Alexei Starovoitov wrote:
On Thu, Nov 7, 2024 at 9:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
[...]
Hmm.
Puranjay touched it last with extra logic.
And before that David Vernet tried to address flakiness
in commit 4a54de65964d.
Yonghong also noticed lockups in paravirt
and added a workaround in commit 7015843afc.
Your additional timeout/workaround makes sense to me,
but it would be good to bisect whether Puranjay's change caused it.
I'll debug what's going on some time later today or on Sat.
I finally had time to investigate this a bit.
First, here is how to trigger the lockup:
t1=send_signal/send_signal_perf_thread_remote; \
t2=send_signal/send_signal_nmi_thread_remote; \
for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
Both tests must be run together, for whatever reason.
The failing test is 'send_signal_nmi_thread_remote'.
The test is organized as a parent and a child process that communicate
various events to each other. The intended sequence of events is:
- child:
  - install SIGUSR1 handler
  - notify parent
  - wait for parent
- parent:
  - open PERF_COUNT_SW_CPU_CLOCK event
  - attach BPF program to the event
  - notify child
  - enter busy loop for 10^8 iterations
  - wait for child
- BPF program:
  - send SIGUSR1 to child
- child:
  - poll for SIGUSR1 in a busy loop
  - notify parent
- parent:
  - check value communicated by child,
    terminate test.
The lockup happens because on every other test run the perf event is
not triggered, so the child does not receive SIGUSR1, and thus both
parent and child are stuck waiting for each other.
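For reference, the handshake roughly corresponds to the sketch below
(a simplified illustration, not the actual selftest code: the pipe
layout is approximated and kill() merely stands in for the BPF program
that delivers the signal):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t sigusr1_received;

static void sigusr1_handler(int sig)
{
	sigusr1_received = 1;
}

int main(void)
{
	int pipe_c2p[2], pipe_p2c[2];
	char buf;
	pid_t pid;

	if (pipe(pipe_c2p) || pipe(pipe_p2c))
		exit(1);

	pid = fork();
	if (pid == 0) {
		/* child: install SIGUSR1 handler, notify parent, wait */
		signal(SIGUSR1, sigusr1_handler);
		write(pipe_c2p[1], "a", 1);
		read(pipe_p2c[0], &buf, 1);

		/* poll for SIGUSR1 in a busy loop, then notify parent */
		while (!sigusr1_received)
			;
		write(pipe_c2p[1], "b", 1);
		exit(0);
	}

	/* parent: wait until the child is ready */
	read(pipe_c2p[0], &buf, 1);

	/* the real test opens the perf event and attaches the BPF program
	 * here; kill() below is only a stand-in for that BPF program */
	write(pipe_p2c[1], "c", 1);
	kill(pid, SIGUSR1);

	/* busy loop so the perf event can fire while the parent is on-CPU */
	for (volatile long i = 0; i < 100000000L; i++)
		;

	/* wait for the child to report that it saw SIGUSR1 */
	read(pipe_c2p[0], &buf, 1);
	printf("child reported signal delivery\n");
	return 0;
}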
For 'send_signal_nmi_thread_remote' the perf event is defined as:
struct perf_event_attr attr = {
	.sample_period = 1,
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CPU_CYCLES,
};
And it is opened for the parent process pid.
Apparently, the perf event is not always triggered between lines
send_signal.c:165-180. At line 180 the parent enters a system call,
so CPU cycles stop ticking for the parent; thus, if the perf event
has not been triggered already, it won't be triggered at all
(as far as I understand).
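For context, here is a minimal sketch of how such an event is opened
for a specific pid via perf_event_open() (not the selftest code; the
BPF program can then be attached to the resulting fd, e.g. with
libbpf's bpf_program__attach_perf_event()):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch only: open a sample_period-based hardware event for one pid. */
static int open_cycles_event(pid_t pid)
{
	struct perf_event_attr attr = {
		.size = sizeof(attr),
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
		.sample_period = 1,
	};

	/* pid > 0, cpu == -1: measure this process on any CPU. Cycles are
	 * counted only while the process is actually running, which is why
	 * a parent blocked in a system call stops generating overflows. */
	return syscall(__NR_perf_event_open, &attr, pid, -1 /* cpu */,
		       -1 /* group_fd */, 0 /* flags */);
}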
Applying the same fix that Yonghong did in 7015843afc is sufficient to
reliably trigger the perf event:
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
 static void test_send_signal_nmi(bool signal_thread, bool remote)
 {
 	struct perf_event_attr attr = {
-		.sample_period = 1,
+		.freq = 1,
+		.sample_freq = 1000,
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
 	};
But I don't understand why. As far as I can figure from the kernel
source code, sample_period is measured in nanoseconds (is it?).
I believe sample_period is a number of events between samples.
1 means that perf is supposed to generate events very often.
It means nanoseconds only for the SW cpu_clock event.
Let's apply the above workaround and move on. Please send a patch.
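For reference, a sketch contrasting the sampling modes discussed above
(illustration only, not code from the test):

#include <linux/perf_event.h>

/* For a hardware event, sample_period counts occurrences of that event;
 * for the cpu-clock software event the count is in nanoseconds;
 * frequency mode lets the kernel pick and adapt the period. */
struct perf_event_attr hw_period = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CPU_CYCLES,
	.sample_period = 1,	/* overflow after every counted cycle */
};

struct perf_event_attr sw_period = {
	.type = PERF_TYPE_SOFTWARE,
	.config = PERF_COUNT_SW_CPU_CLOCK,
	.sample_period = 1,	/* cpu-clock counts ns, so ~every nanosecond */
};

struct perf_event_attr hw_freq = {
	.type = PERF_TYPE_HARDWARE,
	.config = PERF_COUNT_HW_CPU_CYCLES,
	.freq = 1,
	.sample_freq = 1000,	/* kernel adapts the period, ~1000 samples/s */
};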
sample_period = 1 is intended to make sure we can get at least one
sample, since many samples will be generated. But too many samples may
cause *rare* issues in internal kernel logic in certain *corner*
cases. I agree with Alexei, let us just use a reasonable sample_freq
to fix the issue. Thanks!