Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Wed, 18 Mar 2020 21:58:07 +0100

On 3/18/20 7:33 AM, Song Liu wrote:
On Mar 17, 2020, at 4:08 PM, Song Liu <songliubraving@xxxxxx> wrote:
On Mar 17, 2020, at 2:47 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:

Hm, true as well. Wouldn't extending the "bpftool prog profile" fentry/fexit programs
supersede this old bpf_stats infrastructure in the long term? IOW, can't we implement the
same (or even more elaborate stats aggregation) in BPF via fentry/fexit and then
potentially deprecate the bpf_stats counters?
I think run_time_ns has its own value as a simple monitoring framework. We can
use it in tools like top (and variations). It will be easier for these tools to
adopt run_time_ns than using fentry/fexit.
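To make the "simple monitoring framework" point concrete: a top-like tool that polls the cumulative run_time_ns and run_cnt counters (the fields exposed in struct bpf_prog_info, e.g. via libbpf's bpf_obj_get_info_by_fd()) only needs to diff two snapshots per refresh. Below is a minimal sketch, assuming the tool already holds both snapshots. The struct names and interval_stats() are hypothetical and mirror only those two counters so the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot mirroring the two cumulative counters that
 * struct bpf_prog_info exposes while stats are enabled. */
struct prog_stats_snapshot {
	uint32_t id;          /* BPF program id */
	uint64_t run_time_ns; /* cumulative runtime in ns */
	uint64_t run_cnt;     /* cumulative number of runs */
};

/* Per-interval figures a top-like tool could display (hypothetical). */
struct prog_interval_stats {
	uint64_t runs;    /* runs during the interval */
	uint64_t time_ns; /* runtime during the interval */
	uint64_t avg_ns;  /* average ns per run, 0 if there were no runs */
	double cpu_pct;   /* runtime as % of the interval, summed over all
	                   * CPUs, so it can exceed 100 on SMP */
};

/* Diff two snapshots of the same program taken interval_ns apart.
 * The counters only grow, so prev must not be ahead of cur. */
static struct prog_interval_stats
interval_stats(const struct prog_stats_snapshot *prev,
	       const struct prog_stats_snapshot *cur,
	       uint64_t interval_ns)
{
	struct prog_interval_stats s = {0};

	assert(prev->id == cur->id);
	assert(cur->run_cnt >= prev->run_cnt);
	assert(cur->run_time_ns >= prev->run_time_ns);
	assert(interval_ns > 0);

	s.runs = cur->run_cnt - prev->run_cnt;
	s.time_ns = cur->run_time_ns - prev->run_time_ns;
	s.avg_ns = s.runs ? s.time_ns / s.runs : 0;
	s.cpu_pct = 100.0 * (double)s.time_ns / (double)interval_ns;
	return s;
}
```

With that, a tool like top or atop needs no BPF program of its own, only read access to the counters, which is the ease-of-adoption argument above.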

Agree that this is easier; I presume there is no such official integration today
in tools like top, right, or is there anything planned?

Yes, we do want more support in different tools to increase visibility.
Here is the effort for atop: https://github.com/Atoptool/atop/pull/88 .

I wasn't pushing hard on this one, mostly because the sysctl interface requires
a user-space "owner".

On the other hand, in the long term, we may include a few fentry/fexit-based programs
in the kernel binary (or the rpm), so that these tools can use them easily. At
that point, we can fully deprecate run_time_ns. Maybe this is not too far away?

Did you check how feasible it is to have something like `bpftool prog profile top`,
which then enables fentry/fexit for /all/ existing BPF programs in the system? It
could then sort the results of each sample interval by run_cnt, cycles, cache misses,
aggregated runtime, etc. in a top-like output. Wdyt?
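A sketch of the sorting step such a `bpftool prog profile top` could perform once per sample interval. This is not existing bpftool code: struct prog_sample, the sort keys and sort_samples() are hypothetical, and the cycles and cache-miss values are assumed to be collected by the fentry/fexit programs from perf counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-program sample for one interval. */
struct prog_sample {
	uint32_t id;           /* BPF program id */
	uint64_t run_cnt;      /* runs in this interval */
	uint64_t cycles;       /* assumed: from a perf counter */
	uint64_t cache_misses; /* assumed: from a perf counter */
	uint64_t runtime_ns;   /* aggregated runtime in this interval */
};

enum sort_key { SORT_RUN_CNT, SORT_CYCLES, SORT_CACHE_MISSES, SORT_RUNTIME };

/* qsort() has no context argument, so the active key lives here. */
static enum sort_key current_key;

static uint64_t sample_value(const struct prog_sample *s, enum sort_key key)
{
	switch (key) {
	case SORT_RUN_CNT:      return s->run_cnt;
	case SORT_CYCLES:       return s->cycles;
	case SORT_CACHE_MISSES: return s->cache_misses;
	case SORT_RUNTIME:      return s->runtime_ns;
	}
	return 0;
}

/* Descending by the active key (biggest consumer first, as in top);
 * ties fall back to ascending program id so rows stay put. */
static int cmp_desc(const void *a, const void *b)
{
	const struct prog_sample *sa = a, *sb = b;
	uint64_t va = sample_value(sa, current_key);
	uint64_t vb = sample_value(sb, current_key);

	if (va != vb)
		return va < vb ? 1 : -1;
	return (sa->id > sb->id) - (sa->id < sb->id);
}

static void sort_samples(struct prog_sample *samples, size_t n,
			 enum sort_key key)
{
	assert(samples != NULL || n == 0);
	current_key = key;
	qsort(samples, n, sizeof(*samples), cmp_desc);
}
```

Switching the column the user sorts by then only means calling sort_samples() again with a different key on the same interval's data.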

I wonder whether we can achieve this with one BPF prog (or a trampoline) that covers
all BPF programs, like a trampoline inside __BPF_PROG_RUN()?
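The appeal of hooking inside __BPF_PROG_RUN() is that a single run wrapper sees every program invocation, so nothing has to be attached per program. As far as I know, this is roughly how run_time_ns is accumulated today behind the bpf_stats_enabled_key static key. The following is a user-space model under that assumption, not kernel code: programs are plain function pointers, run_prog(), struct prog and count_ctx() are hypothetical, and a single-threaded counter stands in for the kernel's per-CPU stats.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Model only: a "program" is a plain function taking a context. */
typedef unsigned int (*prog_fn)(void *ctx);

struct prog {
	prog_fn func;
	uint64_t cnt;   /* number of timed runs */
	uint64_t nsecs; /* accumulated runtime of timed runs */
};

/* Global switch; in the kernel a static key plays this role. */
static bool stats_enabled;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* The one wrapper every program runs through: it times the call only
 * while the switch is on. Not thread-safe (no per-CPU counters). */
static unsigned int run_prog(struct prog *p, void *ctx)
{
	unsigned int ret;
	uint64_t start;

	assert(p != NULL && p->func != NULL);
	if (!stats_enabled)
		return p->func(ctx);

	start = now_ns();
	ret = p->func(ctx);
	p->cnt++;
	p->nsecs += now_ns() - start;
	return ret;
}

/* Example "program": bumps an int in its context and returns 1. */
static unsigned int count_ctx(void *ctx)
{
	(*(int *)ctx)++;
	return 1;
}
```

The cost of this shape is that the timing logic is fixed in the kernel. The fentry/fexit route trades that for flexibility in what gets measured.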

For the long-term direction, I think we could compare two different approaches: add new
tools (like bpftool prog profile top) vs. add BPF support to existing tools. The
first approach is easier. The latter approach would show BPF information to users
who are not expecting BPF programs on their systems. For many sysadmins, seeing BPF
programs in top/ps and controlling them via kill is more natural than learning
bpftool. What are your thoughts on this?

More thoughts on this.

If we have a special trampoline that attaches to all BPF programs at once, we really
don't need the run_time_ns stats anymore. Eventually, tools that monitor BPF
programs will depend on libbpf, so using fentry/fexit to monitor BPF programs doesn't
introduce an extra dependency. I guess we also need a way to include BPF programs in
libbpf.

To summarize this plan, we need:

1) A global trampoline that attaches to all BPF programs at once;

Overall sounds good. I think the `at once` part might be tricky: at the very least it
would need to patch one prog after another, and each prog also needs to store its own
metrics somewhere for later collection. The start-to-sample trigger could be a shared
global var (aka a map shared between all the programs) that flips the switch, though.

2) Embed fentry/fexit programs in libbpf, which tools will use for monitoring;
3) BPF helpers to read time, which replace the current run_time_ns.
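Combining Daniel's point (per-program metrics plus a shared start-to-sample switch) with items 1 and 3, the accounting could look roughly like the model below: an entry hook stores a timestamp, the exit hook adds the delta, and both do nothing while the shared switch is off. This is a single-threaded user-space sketch. prog_enter(), prog_exit(), the metrics table and MAX_PROGS are hypothetical, and the now_ns argument stands in for a time-reading helper such as bpf_ktime_get_ns(). Real fentry/fexit programs would need per-CPU state, since the same program can run on several CPUs at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROGS 64 /* hypothetical table size */

/* Per-program slot; in BPF this would live in a map keyed by program. */
struct prog_metrics {
	uint64_t start_ns;    /* timestamp taken at entry */
	bool in_flight;       /* entry seen, exit not yet */
	uint64_t run_cnt;     /* completed runs while sampling */
	uint64_t run_time_ns; /* summed runtime while sampling */
};

static struct prog_metrics metrics[MAX_PROGS];

/* Shared switch that starts/stops sampling for all programs at once. */
static volatile bool sampling;

/* Entry hook: remember when program idx started running. */
static void prog_enter(size_t idx, uint64_t now_ns)
{
	assert(idx < MAX_PROGS);
	if (!sampling)
		return;
	metrics[idx].start_ns = now_ns;
	metrics[idx].in_flight = true;
}

/* Exit hook: account the elapsed time. Exits without a matching
 * sampled entry (e.g. the switch flipped mid-run) are ignored. */
static void prog_exit(size_t idx, uint64_t now_ns)
{
	struct prog_metrics *m;

	assert(idx < MAX_PROGS);
	m = &metrics[idx];
	if (!sampling || !m->in_flight)
		return;
	m->in_flight = false;
	m->run_cnt++;
	m->run_time_ns += now_ns - m->start_ns;
}
```

A monitoring tool would flip `sampling`, wait one interval, and then read the per-program slots, which is the "store its own metrics somewhere for later collection" part.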

Does this look reasonable?

Thanks,
Song