Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats

Song Liu <songliubraving@xxxxxx> · Tue, 17 Mar 2020 19:54:43 +0000

> On Mar 17, 2020, at 12:30 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> 
> On 3/16/20 9:33 PM, Song Liu wrote:
>> Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
>> Typical userspace tools use kernel.bpf_stats_enabled as follows:
>>   1. Enable kernel.bpf_stats_enabled;
>>   2. Check program run_time_ns;
>>   3. Sleep for the monitoring period;
>>   4. Check program run_time_ns again, calculate the difference;
>>   5. Disable kernel.bpf_stats_enabled.
>> The problem with this approach is that only one userspace tool can toggle
>> this sysctl. If multiple tools toggle the sysctl at the same time, the
>> measurement may be inaccurate.
>> To fix this problem while keep backward compatibility, introduce a new
>> bpf command BPF_ENABLE_RUNTIME_STATS. On success, this command enables
>> run_time_ns stats and returns a valid fd.
>> With BPF_ENABLE_RUNTIME_STATS, user space tool would have the following
>> flow:
>>   1. Get a fd with BPF_ENABLE_RUNTIME_STATS, and make sure it is valid;
>>   2. Check program run_time_ns;
>>   3. Sleep for the monitoring period;
>>   4. Check program run_time_ns again, calculate the difference;
>>   5. Close the fd.
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> 
> Hmm, I see no relation to /dev/bpf_stats anymore, yet the subject still talks
> about it?

My fault. Will fix..

> 
> Also, should this have bpftool integration now that we have `bpftool prog profile`
> support? Would be nice to then fetch the related stats via bpf_prog_info, so users
> can consume this in an easy way.

We can add "run_time_ns" as a metric to "bpftool prog profile". But the 
mechanism is not the same though. Let me think about this. 

> 
>> Changes RFC => v2:
>> 1. Add a new bpf command instead of /dev/bpf_stats;
>> 2. Remove the jump_label patch, which is no longer needed;
>> 3. Add a static variable to save previous value of the sysctl.
>> ---
>>  include/linux/bpf.h            |  1 +
>>  include/uapi/linux/bpf.h       |  1 +
>>  kernel/bpf/syscall.c           | 43 ++++++++++++++++++++++++++++++++++
>>  kernel/sysctl.c                | 36 +++++++++++++++++++++++++++-
>>  tools/include/uapi/linux/bpf.h |  1 +
>>  5 files changed, 81 insertions(+), 1 deletion(-)
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 4fd91b7c95ea..d542349771df 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -970,6 +970,7 @@ _out:							\
>>    #ifdef CONFIG_BPF_SYSCALL
>>  DECLARE_PER_CPU(int, bpf_prog_active);
>> +extern struct mutex bpf_stats_enabled_mutex;
>>    /*
>>   * Block execution of BPF programs attached to instrumentation (perf,
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 40b2d9476268..8285ff37210c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -111,6 +111,7 @@ enum bpf_cmd {
>>  	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>>  	BPF_MAP_UPDATE_BATCH,
>>  	BPF_MAP_DELETE_BATCH,
>> +	BPF_ENABLE_RUNTIME_STATS,
>>  };
>>    enum bpf_map_type {
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index b2f73ecacced..823dc9de7953 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -24,6 +24,9 @@
>>  #include <linux/ctype.h>
>>  #include <linux/nospec.h>
>>  #include <linux/audit.h>
>> +#include <linux/miscdevice.h>
> 
> Is this still needed?

My fault again. Will fix. 

Thanks,
Song