> On Mar 13, 2020, at 7:43 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Mar 13, 2020 at 05:35:16PM -0700, Song Liu wrote: >> Motivation (copied from 2/2): >> >> ======================= 8< ======================= >> Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats. >> Typical userspace tools use kernel.bpf_stats_enabled as follows: >> >> 1. Enable kernel.bpf_stats_enabled; >> 2. Check program run_time_ns; >> 3. Sleep for the monitoring period; >> 4. Check program run_time_ns again, calculate the difference; >> 5. Disable kernel.bpf_stats_enabled. >> >> The problem with this approach is that only one userspace tool can toggle >> this sysctl. If multiple tools toggle the sysctl at the same time, the >> measurement may be inaccurate. >> >> To fix this problem while keep backward compatibility, introduce >> /dev/bpf_stats. sysctl kernel.bpf_stats_enabled will only change the >> lowest bit of the static key. /dev/bpf_stats, on the other hand, adds 2 >> to the static key for each open fd. The runtime stats is enabled when >> kernel.bpf_stats_enabled == 1 or there is open fd to /dev/bpf_stats. >> >> With /dev/bpf_stats, user space tool would have the following flow: >> >> 1. Open a fd to /dev/bpf_stats; >> 2. Check program run_time_ns; >> 3. Sleep for the monitoring period; >> 4. Check program run_time_ns again, calculate the difference; >> 5. Close the fd. >> ======================= 8< ======================= >> >> 1/2 adds a few new API to jump_label. >> 2/2 adds the /dev/bpf_stats and adjust kernel.bpf_stats_enabled handler. >> >> Please share your comments. > > Conceptually makes sense to me. Few comments: > 1. I don't understand why +2 logic is necessary. > Just do +1 for every FD and change proc_do_static_key() from doing > explicit enable/disable to do +1/-1 as well on transition from 0->1 and 1->0. > The handler would need to check that 1->1 and 0->0 is a nop. With the +2/-2 logic, we use the lowest bit of the counter to remember the value of the sysctl. Otherwise, we cannot tell whether we are making 0->1 transition or 1->1 transition. > > 2. /dev is kinda awkward. May be introduce a new bpf command that returns fd? Yeah, I also feel /dev is awkward. fd from bpf command sounds great. > > 3. Instead of 1 and 2 tweak sysctl to do ++/-- unconditionally? > Like repeated sysctl kernel.bpf_stats_enabled=1 will keep incrementing it > and would need equal amount of sysctl kernel.bpf_stats_enabled=0 to get > it back to zero where it will stay zero even if users keep spamming > sysctl kernel.bpf_stats_enabled=0. > This way current services that use sysctl will keep working as-is. > Multiple services that currently collide on sysctl will magically start > working without any changes to them. It is still backwards compatible. I think this is not fully backwards compatible. With current logic, the following sequence disables stats eventually. sysctl kernel.bpf_stats_enabled=1 sysctl kernel.bpf_stats_enabled=1 sysctl kernel.bpf_stats_enabled=0 The same sequence will not disable stats with the ++/-- sysctl. Thanks, Song