Re: [PATCH] tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 8 Jul 2021 13:11:59 -0700

On Thu, Jul 8, 2021 at 10:30 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Jul 7, 2021, at 8:23 PM, Andrii Nakryiko andrii.nakryiko@xxxxxxxxx wrote:
>
> > On Wed, Jul 7, 2021 at 5:05 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >>
> >> On Wed, 7 Jul 2021 16:49:26 -0700
> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >>
> >> > As for why the user might need that, it's up to the user and I don't
> >> > want to speculate because it will always sound contrived without a
> >> > specific production use case. But people are very creative and we try
> >> > not to dictate how and what can be done if it doesn't break any
> >> > fundamental assumption and safety.
> >>
> >> I guess it doesn't matter, because if they try to do it, the second
> >> attachment will simply fail to attach.
> >>
> >
> > But not for the kprobe case.
> >
> > And it might not always be possible to know that the same BPF program
> > is being attached. It could be attached by different processes that
> > re-use pinned program (without being aware of each other). Or it could
> > be done from some generic library that just accepts prog_fd and
> > doesn't really know the exact BPF program and whether it was already
> > attached.
> >
> > Not sure why it doesn't matter that attachment will fail where it is
> > expected to succeed. The question is rather why such restriction?
>
> Before eBPF came to exist, all in-kernel users of the tracepoint API never
> required multiple registrations for a given (tracepoint, probe, data) tuple.
>
> This allowed us to expose an API which can consider that the (tracepoint, probe, data)
> tuple is unique for each registration/unregistration pair, and therefore use that same
> tuple for unregistration. Refusing multiple registrations for a given tuple allows us to
> forgo the complexity of reference counting for duplicate registrations, and provide
> immediate feedback to misbehaving tracers which have duplicate registration or
> unbalanced registration/unregistration pairs.
>
> From the perspective of a ring buffer tracer, the notion of multiple instances of
> a given (tracepoint, probe, data) tuple is rather silly: it would mean that a given
> tracepoint hit would generate many instances of the exact same event into the
> same trace buffer.
>
> AFAIR, having the WARN_ON_ONCE() within the tracepoint code to highlight this kind of misuse
> allowed Steven to find a few unbalanced registration/unregistration issues while developing
> ftrace in the past. I vaguely recall that it triggered for blktrace at some point as well.
>
> Considering that allowing duplicates would add complexity to the tracepoint code,
> what is the use-case justifying allowing many instances of the exact same callback
> and data for a given tracepoint ?

It wasn't clear to me if supporting this would cause any added
complexity, which is why I asked.

>
> One key difference I notice here between eBPF and ring buffer tracers is what eBPF
> considers a "program". AFAIU (please let me know if I'm mistaken), the "callback"
> argument provided by eBPF to the tracepoint API is a limited set of trampoline routines.
> The bulk of the eBPF "program" is provided in the "data" argument. So this means the
> "program" is both the eBPF code and some context.
>
> So I understand that a given eBPF code could be loaded more than once for a given

No, it turns out it can't, I was just surprised to learn that.
Surprised, because AFAIK we don't have such restrictions on uniqueness
of attached BPF programs anywhere else where multiple BPF programs are
allowed.

> tracepoint, but I would expect that each registration on a given tracepoint be
> provided with its own "context", otherwise we end up in a similar situation as the
> ring buffer's duplicated events scenario I explained above.
>
> Also, we should discuss whether kprobes might benefit from being more strict by
> rejecting duplicated (instrumentation site, probe, data) tuples.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com