Re: [bpf-next v2 1/2] bpf: add runtime stats, max cost

Song Liu <song@xxxxxxxxxx> · Mon, 19 Dec 2022 00:04:24 -0800

On Sat, Dec 17, 2022 at 7:38 AM Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx> wrote:
>
> On Sat, Dec 17, 2022 at 12:07 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> >
> > On 12/16/22 10:05 AM, Tonghao Zhang wrote:
> > > On Fri, Dec 16, 2022 at 1:40 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> > >> On 12/14/22 8:32 PM, xiangxia.m.yue@xxxxxxxxx wrote:
> > >>> From: Tonghao Zhang <xiangxia.m.yue@xxxxxxxxx>
> > >>>
> > >>> Now user can enable sysctl kernel.bpf_stats_enabled to fetch
> > >>> run_time_ns and run_cnt. It's easy to calculate the average value.
> > >>>
> > >>> In some case, the max cost for bpf prog invoked, are more useful:
> > >>> is there a burst sysload or high cpu usage. This patch introduce
> > >>> a update stats helper.
> > >>
> > >> I am not 100% sure about how this single max value will be useful
> > >> in general. A particular max_run_time_ns, if much bigger than average,
> > >> could be an outlier due to preemption/softirq etc.
> > >> What you really need might be a trend over time of the run_time
> > >> to capture the burst. You could do this by taking snapshot of
> > > Hi
> > > If the bpf prog is invoked frequently,  the run_time_ns/run_cnt may
> > > not be increased too much while
> > > there is a maxcost in bpf prog. The max cost value means there is at
> > > least one high cost in bpf prog.
> > > we should take care of the most cost of bpf prog. especially, much
> > > more than run_time_ns/run_cnt.
> >
> > But then again, see Yonghong's comment with regards to outliers. I
> > think what you're probably rather asking for is something like tracking
> > p50/p90/p99 run_time_ns numbers over time to get a better picture. Not
> > sure how single max cost would help, really..
> What I am asking for is that is there a high cpu cost in bpf prog ? If
> the bpf prog run frequently,
> the run_time_ns/cnt is not what we want. because if we get bpf runtime
> stats frequently, there will
> be a high syscall cpu load. so we can't use syscall frequently. so why
> I need this max cost value, as
> yonghong say "if much bigger than average, could be an outlier due to
> preemption/softirq etc.". It is right.
> but I think there is another reason, the bpf prog may be too bad to
> cause the issue or bpf prog invoke a bpf helper which
> take a lot cpu. Anyway this can help us debug the bpf prog. and help
> us to know what max cost the prog take. If possible
> we can update the commit message and send v3.

kernel.bpf_stats_enabled is a relatively light weight monitoring interface.
One of the use cases is to enable it for a few seconds periodically, so
we can get an overview of all BPF programs in the system.

While max time cost might be useful in some debugging, I don't think
we should add it with kernel.bpf_stats_enabled. Otherwise, we can
argue p50/p90/p99 are also useful in some cases, and some other
metrics are useful in some other cases.  These metrics together will
make kernel.bpf_stats_enabled too expensive for the use case above.

Since the use case is for debugging, have you considered using
some other BPF programs to profile the target BPF program?
Please refer to "bpftool prog profile" or "perf stat -b " for
examples of similar solutions. We may need to revise the following
check in bpf_check_attach_target() to make this work for some
scenarios:

                if (tgt_prog->type == prog->type) {
                        /* Cannot fentry/fexit another fentry/fexit program.
                         * Cannot attach program extension to another extension.
                         * It's ok to attach fentry/fexit to extension program.
                         */
                        bpf_log(log, "Cannot recursively attach\n");
                        return -EINVAL;
                }

Thanks,
Song