Re: [RFC bpf-next] bpf: Use prog->active instead of bpf_prog_active for kprobe_multi

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 9 Jun 2022 11:26:57 -0700

On Tue, Jun 7, 2022 at 9:29 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, May 31, 2022 at 4:24 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Wed, May 25, 2022 at 4:40 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > >
> > > hi,
> > > Alexei suggested to use prog->active instead global bpf_prog_active
> > > for programs attached with kprobe multi [1].
> > >
> > > AFAICS this will bypass bpf_disable_instrumentation, which seems to be
> > > ok for some places like hash map update, but I'm not sure about other
> > > places, hence this is RFC post.
> > >
> > > I'm not sure how are kprobes different to trampolines in this regard,
> > > because trampolines use prog->active and it's not a problem.
> > >
> > > thoughts?
> > >
> >
> > Let's say we have two kernel functions A and B? B can be called from
> > BPF program though some BPF helper, ok? Now let's say I have two BPF
> > programs kprobeX and kretprobeX, both are attached to A and B. With
> > using prog->active instead of per-cpu bpf_prog_active, what would be
> > the behavior when A is called somewhere in the kernel.
> >
> > 1. A is called
> > 2. kprobeX is activated for A, calls some helper which eventually calls B
> >   3. kprobeX is attempted to be called for B, but is skipped due to prog->active
> >   4. B runs
> >   5. kretprobeX is activated for B, calls some helper which eventually calls B
> >     6. kprobeX is ignored (prog->active > 0)
> >     7. B runs
> >     8. kretprobeX is ignored (prog->active > 0)
> > 9. kretprobeX is activated for A, calls helper which calls B
> >   10. kprobeX is activated for B
> >     11. kprobeX is ignored (prog->active > 0)
>
> not correct. kprobeX actually runs.
> but the end result is correct.
>

Right, it was a long sequence, but you got the idea :)

> >     12. B runs
> >     13. kretprobeX is ignored (prog->active > 0)
> >   14. B runs
> >   15. kretprobeX is ignored (prog->active > 0)
> >
> >
> > If that's correct, we get:
> >
> > 1. kprobeX for A
> > 2. kretprobeX for B
> > 3. kretprobeX for A
> > 4. kprobeX for B
>
> Here it's correct.
>
> > It's quite mind-boggling and annoying in practice. I'd very much
> > prefer just kprobeX for A followed by kretprobeX for A. That's it.
> >
> > I'm trying to protect against this in retsnoop with custom per-cpu
> > logic in each program, but I so much more prefer bpf_prog_active,
> > which basically says "no nested kprobe calls while kprobe program is
> > running", which makes a lot of sense in practice.
>
> It makes sense for retsnoop, but does not make sense in general.
>
> > Given kprobe already used global bpf_prog_active I'd say multi-kprobe
> > should stick to bpf_prog_active as well.
>
> I strongly disagree.
> Both multi kprobe and kprobe should move to per prog counter
> plus some other protection
> (we cannot just move to per-prog due to syscalls).
> It's true that the above order is mind-boggling,
> but it's much better than
> missing kprobe invocation completely just because
> another kprobe is running on the same cpu.
> People complained numerous times about this kprobe behavior.
> kprobeX attached to A
> kprobeY attached to B.
> If kprobeX calls B kprobeY is not going to be called.
> Means that anything that bpf is using is lost.
> spin locks, lists, rcu, etc.
> Sleepable uprobes are coming.
> iirc Delyan's patch correctly.
> We will do migrate_disable and inc bpf_prog_active.

This might be a different issue, I'm not sure why uprobes should be
protected by the same global bpf_prog_active, you can't trigger uprobe
from uprobe program. And especially for sleepable programs it makes no
sense to use per-CPU protection (we have bpf_run_ctx for such
protections, if needed).

> Now random kprobes on that cpu will be lost.

It's not random. The rule is you can't kernel functions and
tracepoints triggered from BPF kprobes/tracepoints. This prevents
nasty reentrance problems and makes sense. Isn't kernel tracing infra
is protecting itself similarly, preventing reentrancy and recursion?

> It's awful. We have to fix it.

You can call it "a fix" if you'd like, but it's changing a very
user-visible behavior and guarantees on which users relied for a
while. So even if we switch to per-prog protection it will have to be
some sort of opt-in (flag, new program type, whatever).