Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Tue, 17 Mar 2020 22:47:00 +0100

On 3/17/20 9:13 PM, Song Liu wrote:
On Mar 17, 2020, at 1:03 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 3/17/20 8:54 PM, Song Liu wrote:
On Mar 17, 2020, at 12:30 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 3/16/20 9:33 PM, Song Liu wrote:
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:
   1. Enable kernel.bpf_stats_enabled;
   2. Check program run_time_ns;
   3. Sleep for the monitoring period;
   4. Check program run_time_ns again, calculate the difference;
   5. Disable kernel.bpf_stats_enabled.
The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.
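
The overlap problem can be modeled with a small sketch. This is not kernel
code: the sysctl is reduced to a single shared flag, and tool_start(),
tool_stop() and stats_on_after_overlap() are hypothetical names used only
for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the sysctl: one on/off flag shared by every tool. */
static bool stats_enabled;

/* Step 1 of the flow above: a tool switches the stats on. */
void tool_start(void)
{
	stats_enabled = true;
}

/* Step 5 of the flow above: a tool switches the stats off again. */
void tool_stop(void)
{
	stats_enabled = false;
}

/*
 * Two tools, A and B, overlap: A starts, B starts, B finishes first.
 * Returns the flag as tool A sees it while A is still in its
 * monitoring period.
 */
bool stats_on_after_overlap(void)
{
	tool_start();	/* tool A, step 1 */
	tool_start();	/* tool B, step 1 */
	tool_stop();	/* tool B, step 5 */
	return stats_enabled;
}
```

In this model stats_on_after_overlap() returns false, i.e. tool B's cleanup
silently stops accounting for the rest of tool A's monitoring period.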
To fix this problem while keeping backward compatibility, introduce a new
bpf command BPF_ENABLE_RUNTIME_STATS. On success, this command enables
run_time_ns stats and returns a valid fd.
With BPF_ENABLE_RUNTIME_STATS, a userspace tool would have the following
flow:
   1. Get an fd with BPF_ENABLE_RUNTIME_STATS, and make sure it is valid;
   2. Check program run_time_ns;
   3. Sleep for the monitoring period;
   4. Check program run_time_ns again, calculate the difference;
   5. Close the fd.
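
Steps 2 to 4 boil down to subtracting two snapshots. A minimal sketch,
assuming the tool has already read the counters for one program twice;
struct prog_sample and both helper names are hypothetical, and run_cnt is
added here only to show a per-invocation average (the flow above mentions
run_time_ns only).

```c
#include <stdint.h>

/* Hypothetical snapshot of one program's counters at a point in time. */
struct prog_sample {
	uint64_t run_time_ns;	/* accumulated run time */
	uint64_t run_cnt;	/* accumulated number of runs */
};

/* Run time accumulated between the two snapshots (step 4). */
uint64_t run_time_delta_ns(const struct prog_sample *before,
			   const struct prog_sample *after)
{
	return after->run_time_ns - before->run_time_ns;
}

/*
 * Average run time per invocation over the monitoring period.
 * Returns 0 if the program did not run between the two snapshots.
 */
uint64_t avg_ns_per_run(const struct prog_sample *before,
			const struct prog_sample *after)
{
	uint64_t runs = after->run_cnt - before->run_cnt;

	if (runs == 0)
		return 0;
	return run_time_delta_ns(before, after) / runs;
}
```

The result is only meaningful if stats stayed enabled for the whole period
between the two snapshots.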
Signed-off-by: Song Liu <songliubraving@xxxxxx>

Hmm, I see no relation to /dev/bpf_stats anymore, yet the subject still talks
about it?
My fault. Will fix.
Also, should this have bpftool integration now that we have `bpftool prog profile`
support? Would be nice to then fetch the related stats via bpf_prog_info, so users
can consume this in an easy way.
We can add "run_time_ns" as a metric to "bpftool prog profile". The
mechanism is not the same, though. Let me think about this.

Hm, true as well. Wouldn't extending the "bpftool prog profile" fentry/fexit
programs supersede this old bpf_stats infrastructure in the long term? Iow, can't
we implement the same (or even more elaborate stats aggregation) in BPF via
fentry/fexit and then potentially deprecate the bpf_stats counters?

I think run_time_ns has its own value as a simple monitoring framework. We can
use it in tools like top (and variations). It will be easier for these tools to
adopt run_time_ns than using fentry/fexit.

Agree that this is easier; I presume there is no such official integration today
in tools like top, right, or is there anything planned?

On the other hand, in the long term, we may include a few fentry/fexit-based programs
in the kernel binary (or the rpm), so that these tools can use them easily. At
that time, we can fully deprecate run_time_ns. Maybe this is not too far away?

Did you check how feasible it is to have something like `bpftool prog profile top`
which then enables fentry/fexit for /all/ existing BPF programs in the system? It
could then sort the sample interval by run_cnt, cycles, cache misses, aggregated
runtime, etc. in a top-like output. Wdyt?

Thanks,
Daniel