Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 13 Apr 2020 14:04:58 -0700

On Sat, Apr 11, 2020 at 4:17 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Apr 10, 2020 at 05:23:30PM -0700, Yonghong Song wrote:
> > >
> > > So it seems like few things would be useful:
> > >
> > > 1. end flag for post-aggregation and/or footer printing (seq_num == 0
> > > is providing similar means for start flag).
> >
> > the end flag is a problem. We could say hijack next or stop so we
> > can detect the end, but passing a NULL pointer as the object
> > to the bpf program may be problematic without verifier enforcement
> > as it may cause a lot of exceptions... Although all these exception
> > will be silenced by bpf infra, but still not sure whether this
> > is acceptable or not.
>
> I don't like passing NULL there just to indicate something to a program.
> It's not too horrible to support from verifier side, but NULL is only
> one such flag. What does it suppose to indicate? That dumper prog
> is just starting? or ending? Let's pass (void*)1, and (void *)2 ?
> I'm not a fan of such inband signaling.
> imo it's cleaner and simpler when that object pointer is always valid.

I'm not proposing to pass fake pointers. I proposed to have bpfdump
context instead. E.g., one way to do this would be something like:

struct bpf_dump_context {
  struct seq_file *seq;
  u64 seq_num;
  int flags; /* 0 | BPF_DUMP_START | BPF_DUMP_END */
};

int prog(struct bpf_dump_context *ctx, struct netlink_sock *sk) {
  if (ctx->flags & BPF_DUMP_END) {
    /* emit summary */
    return 0;
  }

  /* sk must be not null here. */
}

This is one way. We can make it simpler by saying that sk == NULL is
always end of aggregation for given seq_file, then we won't need flags
and will require `if (!sk)` check explicitly. Don't know what's the
best way, but what I'm advocating for is to have a way for BPF program
to know that processing is finished and it's time to emit summary. See
my other reply in this thread with example use cases.

>
> > > 2. Some sort of "session id", so that bpfdumper can maintain
> > > per-session intermediate state. Plus with this it would be possible to
> > > detect restarts (if there is some state for the same session and
> > > seq_num == 0, this is restart).
> >
> > I guess we can do this.
>
> beyond seq_num passing session_id is a good idea. Though I don't quite see
> the use case where you'd need bpfdumper prog to be stateful, but doesn't hurt.

State per session seems most useful, so session id + hashmap solves
it. If we do sk_local storage per seq_file, that might be enough as
well, I guess...

Examples are any kind of summary stats across all sockets/tasks/etc.

Another interesting use case: produce map from process ID (tgid) to
bpf_maps, bpf_progs, bpf_links (or sockets, or whatever kind of file
we need). You'd need FD/file -> kernel object map and then kernel
object -> tgid map. I think there are many useful use-cases beyond
"one line per object" output cases that inspired bpfdump in the first
place.