Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Tue, 17 Mar 2020 20:30:31 +0100

On 3/16/20 9:33 PM, Song Liu wrote:
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:

   1. Enable kernel.bpf_stats_enabled;
   2. Check program run_time_ns;
   3. Sleep for the monitoring period;
   4. Check program run_time_ns again, calculate the difference;
   5. Disable kernel.bpf_stats_enabled.

The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.

To fix this problem while keep backward compatibility, introduce a new
bpf command BPF_ENABLE_RUNTIME_STATS. On success, this command enables
run_time_ns stats and returns a valid fd.

With BPF_ENABLE_RUNTIME_STATS, user space tool would have the following
flow:

   1. Get a fd with BPF_ENABLE_RUNTIME_STATS, and make sure it is valid;
   2. Check program run_time_ns;
   3. Sleep for the monitoring period;
   4. Check program run_time_ns again, calculate the difference;
   5. Close the fd.

Signed-off-by: Song Liu <songliubraving@xxxxxx>

Hmm, I see no relation to /dev/bpf_stats anymore, yet the subject still talks
about it?

Also, should this have bpftool integration now that we have `bpftool prog profile`
support? Would be nice to then fetch the related stats via bpf_prog_info, so users
can consume this in an easy way.

Changes RFC => v2:
1. Add a new bpf command instead of /dev/bpf_stats;
2. Remove the jump_label patch, which is no longer needed;
3. Add a static variable to save previous value of the sysctl.
---
  include/linux/bpf.h            |  1 +
  include/uapi/linux/bpf.h       |  1 +
  kernel/bpf/syscall.c           | 43 ++++++++++++++++++++++++++++++++++
  kernel/sysctl.c                | 36 +++++++++++++++++++++++++++-
  tools/include/uapi/linux/bpf.h |  1 +
  5 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4fd91b7c95ea..d542349771df 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -970,6 +970,7 @@ _out:							\
  
  #ifdef CONFIG_BPF_SYSCALL
  DECLARE_PER_CPU(int, bpf_prog_active);
+extern struct mutex bpf_stats_enabled_mutex;
  
  /*
   * Block execution of BPF programs attached to instrumentation (perf,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 40b2d9476268..8285ff37210c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
  	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
  	BPF_MAP_UPDATE_BATCH,
  	BPF_MAP_DELETE_BATCH,
+	BPF_ENABLE_RUNTIME_STATS,
  };
  
  enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b2f73ecacced..823dc9de7953 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -24,6 +24,9 @@
  #include <linux/ctype.h>
  #include <linux/nospec.h>
  #include <linux/audit.h>
+#include <linux/miscdevice.h>

Is this still needed?

+#include <linux/fs.h>
+#include <linux/jump_label.h>
  #include <uapi/linux/btf.h>
  
  #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
@@ -3550,6 +3553,43 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
  	return err;
  }
  

Thanks,
Daniel