Re: [PATCH] tracing/user_events: Run BPF program if attached

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 6 Jun 2023 09:57:14 -0700

On Tue, Jun 6, 2023 at 6:57 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, 16 May 2023 17:36:28 -0700
> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> > BPF progs have three ways to access kernel tracepoints:
> > 1. traditional tracepoint
>
> This is the trace_events, which is used by ftrace, right?
>
> > 2. raw tracepoint
> > 3. raw tracepoint with BTF
> >
> > 1 was added first and is now rarely used (only by old tools), since it's slow.
> > 2 was added later to address performance concerns.
> > 3 was added after BTF was introduced to provide accurate types.
> >
> > 3 is the only one that bpf community recommends and is the one that is used most often.
> >
> > As far as I know trace_events were never connected to bpf.
> > Unless somebody sneaked the code in without us seeing it.
>
> With this design, I understand that you may not want to connect BPF
> directly to user_events. It needs a different model.
>
> >
> > I think you're trying to model user_events+bpf as 1.
> > Which means that you'll be repeating the same mistakes.
>
> The user_events is completely different from the tracepoint and
> must have no BTF with it.
> Also, all information must be sent in the user-written data packet.
> (No data structure; even if there is a structure, it must be fully
> contained in the packet.)
>
> For the tracepoint, there is a function call with some values or
> pointers to data structures. So it is meaningful to skip using the
> trace event (which converts all pointers to actual field values of
> the data structure and stores them in the ftrace buffer) because most of
> the values can be ignored in the BPF prog.
>
> However, for the user_events, the data is just passed from the
> user as a data packet, and the BPF prog can access the data packet
> (to avoid accessing malicious data, the data validator cannot be
> skipped). So this seems like 1., but actually you can access
> the validated data in the perf buffer. Maybe we can allow BPF to
> hook the write syscall and access user-space data, but it may
> not be safe. I think this is the safest way to do that.

I'm trying to understand why we need a new kernel concept for all
this. It looks like we are just creating a poor man's
publisher/subscriber solution in the kernel, but mostly intend to use
it from user-space? Why not just use Unix domain sockets for this,
though? Use SOCK_SEQPACKET, put "event data" into a single packet
that's guaranteed to not be broken up. Expose this to other processes
through named pipes, if necessary.
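
To make the suggestion concrete, here is a minimal sketch of the
SOCK_SEQPACKET idea, assuming Linux and using Python only for brevity.
The 4-byte event-id header and the helper names (make_event_channel,
encode_event, decode_event, demo) are hypothetical and not part of
user_events or any existing tool. It only illustrates that each send()
arrives as one whole packet on the receiving side.

```python
import socket
import struct


def make_event_channel():
    """Return a connected (publisher, subscriber) pair of Unix sockets.

    SOCK_SEQPACKET preserves message boundaries, so one send() maps to
    exactly one recv() as long as the receive buffer is large enough.
    """
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)


def encode_event(event_id, payload):
    # Hypothetical wire format: little-endian u32 event id, then raw payload.
    return struct.pack("<I", event_id) + payload


def decode_event(packet):
    # Inverse of encode_event: split the 4-byte header from the payload.
    (event_id,) = struct.unpack_from("<I", packet)
    return event_id, packet[4:]


def demo():
    pub, sub = make_event_channel()
    try:
        # Two back-to-back writes; with SOCK_STREAM these could be merged,
        # with SOCK_SEQPACKET each one stays a separate packet.
        pub.send(encode_event(1, b"open /tmp/a"))
        pub.send(encode_event(2, b"close"))
        first = decode_event(sub.recv(4096))
        second = decode_event(sub.recv(4096))
        return [first, second]
    finally:
        pub.close()
        sub.close()


if __name__ == "__main__":
    for event_id, payload in demo():
        print(event_id, payload)
```

A real setup would bind the socket to a filesystem path so that unrelated
processes can connect; socketpair() is used here only to keep the sketch
self-contained.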

Sorry if these are naive questions, but it's not clear what problem
user_events is solving, or why we need a new thing instead of using
existing kernel primitives.

>
> Thank you,
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>