Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 8 Nov 2024 12:13:57 -0800

On Thu, Nov 7, 2024 at 9:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
> > On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
> >
> > [...]
> >
> > > Hmm.
> > > Puranjay touched it last with extra logic.
> > >
> > > And before that David Vernet tried to address flakiness
> > > in commit 4a54de65964d.
> > > Yonghong also noticed lockups in paravirt
> > > and added workaround 7015843afc.
> > >
> > > Your additional timeout/workaround makes sense to me,
> > > but would be good to bisect whether Puranjay's change caused it.
> >
> > I'll debug what's going on some time later today or on Sat.
>
> I finally had time to investigate this a bit.
> First, here is how to trigger lockup:
>
>   t1=send_signal/send_signal_perf_thread_remote; \
>   t2=send_signal/send_signal_nmi_thread_remote; \
>   for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
>
> Must be both tests for whatever reason.
> The failing test is 'send_signal_nmi_thread_remote'.
>
> The test is organized as parent and child processes communicating
> various events to each other. The intended sequence of events:
> - child:
>   - install SIGUSR1 handler
>   - notify parent
>   - wait for parent
> - parent:
>   - open PERF_COUNT_SW_CPU_CLOCK event
>   - attach BPF program to the event
>   - notify child
>   - enter busy loop for 10^8 iterations
>   - wait for child
> - BPF program:
>   - send SIGUSR1 to child
> - child:
>   - poll for SIGUSR1 in a busy loop
>   - notify parent
> - parent:
>   - check value communicated by child,
>     terminate test.
>
> The lockup happens because on every other test run perf event is not
> triggered, child does not receive SIGUSR1 and thus both parent and
> child are stuck.
>
> For 'send_signal_nmi_thread_remote' perf event is defined as:
>
>         struct perf_event_attr attr = {
>                 .sample_period = 1,
>                 .type = PERF_TYPE_HARDWARE,
>                 .config = PERF_COUNT_HW_CPU_CYCLES,
>         };
>
> And is opened for parent process pid.
>
> Apparently, the perf event is not always triggered between lines
> send_signal.c:165-180. And at line 180 parent enters system call,
> so cpu cycles stop ticking for 'parent', thus if perf event
> had not been triggered already it won't be triggered at all
> (as far as I understand).
>
> Applying same fix as Yonghong did in 7015843afc is sufficient to
> reliably trigger perf event:
>
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
>  static void test_send_signal_nmi(bool signal_thread, bool remote)
>  {
>         struct perf_event_attr attr = {
> -               .sample_period = 1,
> +               .freq = 1,
> +               .sample_freq = 1000,
>                 .type = PERF_TYPE_HARDWARE,
>                 .config = PERF_COUNT_HW_CPU_CYCLES,
>         };
>
> But I don't understand why.
> As far as I can figure from kernel source code,
> sample_period is measured in nanoseconds (is it?),

I believe sample_period is a number of samples.
1 means that perf suppose to generate event very often.
It means nanoseconds only for SW cpu_cycles.

let's apply above workaround and move on. Pls send a patch.