Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers

Yonghong Song <yhs@xxxxxx> · Fri, 10 Apr 2020 17:23:30 -0700

On 4/10/20 4:25 PM, Andrii Nakryiko wrote:
On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@xxxxxx> wrote:

Given a loaded dumper bpf program, which already
knows which target it should bind to, there are
two ways to create a dumper:
   - a file based dumper under the hierarchy of
     /sys/kernel/bpfdump/ which users can
     "cat" to print out the output.
   - an anonymous dumper from which a user application
     can "read" the dumping output.

For file based dumper, BPF_OBJ_PIN syscall interface
is used. For anonymous dumper, BPF_PROG_ATTACH
syscall interface is used.

To make it easy for the target seq_ops->show() to get
the bpf program, dumper creation increases
the target-provided seq_file private data size
so that the bpf program pointer is also stored in
seq_file private data.

Further, a seq_num which represents how many times
bpf_dump_get_prog() has been called is also
available to the target seq_ops->show().
Such information can be used, e.g., to print
a banner before printing out actual data.
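
As a rough illustration of the banner use case, here is a minimal
userspace sketch, not the patch's kernel code. The names demo_obj and
demo_show are hypothetical stand-ins for a kernel object and a
show-style callback; the only idea taken from the text above is that
seq_num == 0 identifies the first call of an iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record standing in for a kernel object (e.g. a task). */
struct demo_obj {
	int id;
	const char *name;
};

/*
 * Hypothetical show-style callback: formats one object into buf and
 * returns the number of bytes written, or -1 if buf is too small.
 * The banner is emitted only when seq_num == 0.
 */
static int demo_show(char *buf, size_t len, unsigned long seq_num,
		     const struct demo_obj *obj)
{
	int n = 0, m;

	assert(buf && obj);

	if (seq_num == 0) {
		n = snprintf(buf, len, "%-6s %s\n", "ID", "NAME");
		if (n < 0 || (size_t)n >= len)
			return -1;
	}

	m = snprintf(buf + n, len - (size_t)n, "%-6d %s\n",
		     obj->id, obj->name);
	if (m < 0 || (size_t)m >= len - (size_t)n)
		return -1;

	return n + m;
}
```

Calling demo_show() with seq_num 0 yields the header line followed by
the first record; later calls yield only the record.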

So I looked up seq_operations struct and did a very cursory read of
fs/seq_file.c and seq_file documentation, so I might be completely off
here.

start() is called before iteration begins, stop() is called after
iteration ends. Would it be a better and more user-friendly interface
to have two extra calls to the BPF program, say with a NULL input
element, but with an extra enum/flag that specifies that this is the
START or END of iteration, in addition to seq_num?

The current design always passes a valid object (task, file, netlink_sock,
fib6_info). That is, access to fields of those data structures won't
cause runtime exceptions.

Therefore, with the existing seq_ops implementation for ipv6_route
and netlink, etc, we don't have END information. We can get START
information though.

Also, right now it's impossible to write stateful dumpers that do any
kind of stats calculation, because it's impossible to determine when
iteration restarted (it starts from the very beginning, not from the
last element). It's impossible to just remember the last processed
seq_num, because the BPF program might be called for a new "session" in
parallel with the old one.

Theoretically, session end can be detected by checking the return
value of the last bpf_seq_printf() or bpf_seq_write(). If it indicates
an overflow, that means session end.

Or the bpfdump infrastructure can do this work and provide
a session id.

So it seems like a few things would be useful:

1. end flag for post-aggregation and/or footer printing (seq_num == 0
is providing similar means for start flag).

The end flag is a problem. We could, say, hijack next or stop so we
can detect the end, but passing a NULL pointer as the object
to the bpf program may be problematic without verifier enforcement,
as it may cause a lot of exceptions... All these exceptions
would be silenced by the bpf infra, but I am still not sure whether
this is acceptable or not.
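
To make the NULL-object concern concrete, here is a small userspace
sketch of what a handler would have to look like if START and END were
delivered as extra calls with no object. Everything here (demo_phase,
demo_item, demo_totals, demo_handle) is hypothetical and only
illustrates the proposal; it is not an interface from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phase marker passed alongside the object pointer. */
enum demo_phase { DEMO_PHASE_START, DEMO_PHASE_ELEM, DEMO_PHASE_END };

/* Hypothetical object being iterated. */
struct demo_item {
	long bytes;
};

/* Aggregation state kept across one iteration. */
struct demo_totals {
	long count;
	long bytes;
	int finished;	/* set once END has been seen */
};

/*
 * item is NULL for START and END, so it may only be dereferenced in
 * the ELEM case. Returns 0 on success, -1 on an unexpected NULL item.
 */
static int demo_handle(struct demo_totals *t, enum demo_phase phase,
		       const struct demo_item *item)
{
	assert(t);

	switch (phase) {
	case DEMO_PHASE_START:
		t->count = 0;
		t->bytes = 0;
		t->finished = 0;
		return 0;
	case DEMO_PHASE_ELEM:
		if (!item)
			return -1;
		t->count++;
		t->bytes += item->bytes;
		return 0;
	case DEMO_PHASE_END:
		t->finished = 1;	/* footer/post-aggregation point */
		return 0;
	}
	return -1;
}
```

The explicit NULL check in the ELEM branch is the part that, in a real
bpf program, would need verifier enforcement to be reliable.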

2. Some sort of "session id", so that bpfdumper can maintain
per-session intermediate state. Plus with this it would be possible to
detect restarts (if there is some state for the same session and
seq_num == 0, this is restart).

I guess we can do this.
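
A rough userspace sketch of the session id idea follows. It assumes
the infrastructure hands the program a session_id and seq_num on every
call; the table, its size, and all names are hypothetical. A real bpf
program would presumably keep this state in a bpf map keyed by session
id rather than in a static array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SESSIONS 8	/* arbitrary limit for the sketch */

struct demo_session {
	uint64_t session_id;
	uint64_t count;		/* elements seen since last (re)start */
	int restarts;		/* times this session started over */
	int in_use;
};

static struct demo_session demo_sessions[DEMO_MAX_SESSIONS];

static struct demo_session *demo_session_find(uint64_t session_id)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_SESSIONS; i++)
		if (demo_sessions[i].in_use &&
		    demo_sessions[i].session_id == session_id)
			return &demo_sessions[i];
	return NULL;
}

/*
 * Called once per element. Existing state plus seq_num == 0 means the
 * same session restarted from the beginning, so its state is reset.
 * Returns the session state, or NULL if the table is full.
 */
static struct demo_session *demo_session_on_elem(uint64_t session_id,
						 uint64_t seq_num)
{
	struct demo_session *s = demo_session_find(session_id);
	size_t i;

	if (s) {
		if (seq_num == 0) {
			s->count = 0;
			s->restarts++;
		}
	} else {
		for (i = 0; i < DEMO_MAX_SESSIONS && !s; i++)
			if (!demo_sessions[i].in_use)
				s = &demo_sessions[i];
		if (!s)
			return NULL;
		s->session_id = session_id;
		s->count = 0;
		s->restarts = 0;
		s->in_use = 1;
	}

	s->count++;
	return s;
}
```

Two sessions running in parallel get separate slots, so a restart of
one does not disturb the running totals of the other.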

It seems like it might be a bit more flexible to, instead of providing
a seq_file * pointer directly, actually provide a bpfdumper_context
struct, which would have seq_file * as one of its fields, the others
being session_id and start/stop flags.

As you mentioned, if we have more seq_file related fields to pass
to the bpf program, yes, grouping them into a structure makes sense.
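
For discussion purposes, such a grouping might look roughly like the
sketch below. The struct name, field names, field widths, and flag
values are all hypothetical; the thread only names seq_file *,
session_id, and start/stop flags as candidate members, with seq_num
added here because it is already passed today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seq_file;	/* opaque here; the real type lives in the kernel */

/* Hypothetical flag bits marking iteration boundaries. */
enum demo_dump_flags {
	DEMO_DUMP_F_START = 1u << 0,
	DEMO_DUMP_F_END   = 1u << 1,
};

/* Hypothetical context grouping everything seq_file related. */
struct demo_dumper_ctx {
	struct seq_file *seq;	/* output target */
	uint64_t session_id;	/* stable across one iteration */
	uint64_t seq_num;	/* calls made so far in this session */
	uint32_t flags;		/* DEMO_DUMP_F_* bits */
};

static int demo_ctx_is_start(const struct demo_dumper_ctx *ctx)
{
	assert(ctx);
	return (ctx->flags & DEMO_DUMP_F_START) != 0;
}

static int demo_ctx_is_end(const struct demo_dumper_ctx *ctx)
{
	assert(ctx);
	return (ctx->flags & DEMO_DUMP_F_END) != 0;
}
```

A single context pointer would let new fields be added later without
changing the program signature for every target.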

These are somewhat unstructured thoughts, but what do you think?

Note that seq_num does not represent the number
of unique kernel objects the bpf program has
seen. But it should be a good approximation.

A target feature, BPF_DUMP_SEQ_NET_PRIVATE,
is implemented; it is specifically useful for
net based dumpers. It sets the net namespace
to the current process's net namespace.
This avoids changing existing net seq_ops
in order to retrieve the net namespace from
the seq_file pointer.

For open dumper files, anonymous or not, the
fdinfo will show the target and prog_id associated
with that file descriptor. For the dumper file itself,
a kernel interface will be provided to retrieve the
prog_id in one of the later patches.

Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
  include/linux/bpf.h            |   5 +
  include/uapi/linux/bpf.h       |   6 +-
  kernel/bpf/dump.c              | 338 ++++++++++++++++++++++++++++++++-
  kernel/bpf/syscall.c           |  11 +-
  tools/include/uapi/linux/bpf.h |   6 +-
  5 files changed, 362 insertions(+), 4 deletions(-)

[...]