> On Mar 18, 2020, at 3:29 PM, Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote: > > On 03/18, Song Liu wrote: >> >> >>> On Mar 18, 2020, at 1:58 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: >>> >>> On 3/18/20 7:33 AM, Song Liu wrote: >>>>> On Mar 17, 2020, at 4:08 PM, Song Liu <songliubraving@xxxxxx> wrote: >>>>>> On Mar 17, 2020, at 2:47 PM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: >>>>>>>> >>>>>>>> Hm, true as well. Wouldn't long-term extending "bpftool prog profile" fentry/fexit >>>>>>>> programs supersede this old bpf_stats infrastructure? Iow, can't we implement the >>>>>>>> same (or even more elaborate stats aggregation) in BPF via fentry/fexit and then >>>>>>>> potentially deprecate bpf_stats counters? >>>>>>> I think run_time_ns has its own value as a simple monitoring framework. We can >>>>>>> use it in tools like top (and variations). It will be easier for these tools to >>>>>>> adopt run_time_ns than using fentry/fexit. >>>>>> >>>>>> Agree that this is easier; I presume there is no such official integration today >>>>>> in tools like top, right, or is there anything planned? >>>>> >>>>> Yes, we do want more supports in different tools to increase the visibility. >>>>> Here is the effort for atop: https://github.com/Atoptool/atop/pull/88 . >>>>> >>>>> I wasn't pushing push hard on this one mostly because the sysctl interface requires >>>>> a user space "owner". >>>>> >>>>>>> On the other hand, in long term, we may include a few fentry/fexit based programs >>>>>>> in the kernel binary (or the rpm), so that these tools can use them easily. At >>>>>>> that time, we can fully deprecate run_time_ns. Maybe this is not too far away? >>>>>> >>>>>> Did you check how feasible it is to have something like `bpftool prog profile top` >>>>>> which then enables fentry/fexit for /all/ existing BPF programs in the system? It >>>>>> could then sort the sample interval by run_cnt, cycles, cache misses, aggregated >>>>>> runtime, etc in a top-like output. Wdyt? >>>>> >>>>> I wonder whether we can achieve this with one bpf prog (or a trampoline) that covers >>>>> all BPF programs, like a trampoline inside __BPF_PROG_RUN()? >>>>> >>>>> For long term direction, I think we could compare two different approaches: add new >>>>> tools (like bpftool prog profile top) vs. add BPF support to existing tools. The >>>>> first approach is easier. The latter approach would show BPF information to users >>>>> who are not expecting BPF programs in the systems. For many sysadmins, seeing BPF >>>>> programs in top/ps, and controlling them via kill is more natural than learning >>>>> bpftool. What's your thought on this? >>>> More thoughts on this. >>>> If we have a special trampoline that attach to all BPF programs at once, we really >>>> don't need the run_time_ns stats anymore. Eventually, tools that monitor BPF >>>> programs will depend on libbpf, so using fentry/fexit to monitor BPF programs doesn't >>>> introduce extra dependency. I guess we also need a way to include BPF program in >>>> libbpf. >>>> To summarize this plan, we need: >>>> 1) A global trampoline that attaches to all BPF programs at once; >>> >>> Overall sounds good, I think the `at once` part might be tricky, at least it would >>> need to patch one prog after another, each prog also needs to store its own metrics >>> somewhere for later collection. The start-to-sample could be a shared global var (aka >>> shared map between all the programs) which would flip the switch though. >> >> I was thinking about adding bpf_global_trampoline and use it in __BPF_PROG_RUN. >> Something like: >> >> diff --git i/include/linux/filter.h w/include/linux/filter.h >> index 9b5aa5c483cc..ac9497d1fa7b 100644 >> --- i/include/linux/filter.h >> +++ w/include/linux/filter.h >> @@ -559,9 +559,14 @@ struct sk_filter { >> >> DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); >> >> +extern struct bpf_trampoline *bpf_global_trampoline; >> +DECLARE_STATIC_KEY_FALSE(bpf_global_tr_active); >> + >> #define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \ >> u32 ret; \ >> cant_migrate(); \ >> + if (static_branch_unlikely(&bpf_global_tr_active)) \ >> + run_the_trampoline(); \ >> if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ >> struct bpf_prog_stats *stats; \ >> u64 start = sched_clock(); \ >> >> >> I am not 100% sure this is OK. >> >> I am also not sure whether this is an overkill. Do we really want more complex >> metric for all BPF programs? Or run_time_ns is enough? > I was thinking about exporting a real distribution of the prog runtimes > instead of doing an average. It would be interesting to see > 50%/95%/99%/max stats. Good point. Distribution logic fits well in fentry/fexit programs. Let me think more about this. Thanks, Song