Re: [BUG] possible deadlock in __schedule (with reproducer available)

Hillf Danton <hdanton@xxxxxxxx> · Thu, 28 Nov 2024 07:03:49 +0800

On Tue, 26 Nov 2024 13:15:48 -0800 Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> On Mon, Nov 25, 2024 at 1:44 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Mon, Nov 25, 2024 at 05:24:05AM +0000, Ruan Bonan wrote:
> >
> > > From the discussion, it appears that the root cause might involve
> > > specific printk or BPF operations in the given context. To clarify and
> > > possibly avoid similar issues in the future, are there guidelines or
> > > best practices for writing BPF programs/hooks that interact with
> > > tracepoints, especially those related to scheduler events, to prevent
> > > such deadlocks?
> >
> > The general guideline and recommendation for all tracepoints is to be
> > wait-free. Typically all tracer code should be.
> >
> > Now, BPF (users) (ab)uses tracepoints to do all sorts and takes certain
> > liberties with them, but it is very much at the discretion of the BPF
> > user.
> 
> We do assume that tracepoints are just like kprobes and can run in
> NMI. And in this case BPF is just a vehicle to trigger a
> promised-to-be-wait-free strncpy_from_user_nofault(). That's as far as
> BPF involvement goes, we should stop discussing BPF in this context,
> it's misleading.
> 
Given the known issue, syzbot should run with BPF disabled until it is fixed,
to avoid more pointless and misleading discussion.
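The "tracer code should be wait-free" rule quoted above can be illustrated with a minimal userspace C11 sketch. This is an analogy, not kernel or BPF code. `struct evt_buf`, `trace_hook()`, `EVT_SLOTS` and `EVT_LEN` are hypothetical names. The assumption is that a hook which finds its buffer busy (for example, because it was re-entered from NMI context) should drop the event rather than spin or sleep. The hook always completes in a bounded number of steps, and the copy is bounded in the same spirit as a `*_nofault()` copy.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EVT_SLOTS 64 /* hypothetical ring size */
#define EVT_LEN   16 /* hypothetical max event length, including NUL */

/* Hypothetical event buffer shared by every caller of trace_hook(). */
struct evt_buf {
    atomic_flag busy;      /* set only while one hook is writing */
    unsigned int head;     /* next slot to write; touched only under busy */
    atomic_ulong dropped;  /* events lost to contention */
    char slots[EVT_SLOTS][EVT_LEN];
};

/*
 * Wait-free from the hook's point of view: a single test-and-set, then
 * either a bounded copy or an immediate return. It never spins on the
 * flag and never blocks, so it cannot deadlock against whoever holds it.
 */
static bool trace_hook(struct evt_buf *b, const char *msg)
{
    if (atomic_flag_test_and_set_explicit(&b->busy, memory_order_acquire)) {
        /* Contended: record the loss and return instead of waiting. */
        atomic_fetch_add_explicit(&b->dropped, 1, memory_order_relaxed);
        return false;
    }

    /* Bounded copy: at most EVT_LEN - 1 bytes, always NUL-terminated. */
    char *dst = b->slots[b->head % EVT_SLOTS];
    size_t i;
    for (i = 0; i < EVT_LEN - 1 && msg[i] != '\0'; i++)
        dst[i] = msg[i];
    dst[i] = '\0';
    b->head++;

    atomic_flag_clear_explicit(&b->busy, memory_order_release);
    return true;
}
```

The design choice is to trade completeness for progress. A contended hook loses its event and bumps `dropped`. It never waits on a lock, which is the property a tracepoint callback needs when it can fire with scheduler locks held or from NMI.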