On Wed, Aug 2, 2023 at 1:53 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> On 8/1/23 7:29 AM, Yafang Shao wrote:
> > Some statistic data is stored in percpu pointer but the kernel doesn't
> > aggregate it into a single value, for example, the data in struct
> > psi_group_cpu.
> >
> > Currently, we can traverse percpu data using for_loop and bpf_per_cpu_ptr:
> >
> >   for_loop(nr_cpus, callback_fn, callback_ctx, 0)
> >
> > In the callback_fn, we retrieve the percpu pointer with bpf_per_cpu_ptr().
> > The drawback is that 'nr_cpus' cannot be a variable; otherwise, it will be
> > rejected by the verifier, hindering deployment, as servers may have
> > different 'nr_cpus'. Using CONFIG_NR_CPUS is not ideal.
> >
> > Alternatively, with the bpf_cpumask family, we can obtain a task's cpumask.
> > However, it requires creating a bpf_cpumask, copying the cpumask from the
> > task, and then parsing the CPU IDs from it, resulting in low efficiency.
> > Introducing other kfuncs like bpf_cpumask_next might be necessary.
> >
> > A new bpf helper, bpf_for_each_cpu, is introduced to conveniently traverse
> > percpu data, covering all scenarios. It includes
> > for_each_{possible, present, online}_cpu. The user can also traverse CPUs
> > from a specific task, such as walking the CPUs of a cpuset cgroup when the
> > task is in that cgroup.
>
> The bpf subsystem has adopted the kfunc approach, so there are no new bpf
> helpers any more.

Could you please share some background on why that decision was made?

> You need to have a bpf_for_each_cpu kfunc instead.
>
> But I am wondering whether we should use open-coded iterator loops:
>   06accc8779c1 ("bpf: add support for open-coded iterator loops")

The usage of an open-coded iterator for-loop is almost the same as
bpf_loop, except that it can start from a non-zero value.

> In the kernel, we have a global variable
>   nr_cpu_ids (also in kernel/bpf/helpers.c)
> which is used in numerous places for per-cpu data struct access.
> I am wondering whether we could have bpf code like:
>
>   int nr_cpu_ids __ksym;
>
>   struct bpf_iter_num it;
>   int *v, i = 0;
>
>   // nr_cpu_ids is special, we can give it a range [1, CONFIG_NR_CPUS].
>   bpf_iter_num_new(&it, 1, nr_cpu_ids);
>   while ((v = bpf_iter_num_next(&it))) {
>       /* access cpu i data */
>       i++;
>   }
>   bpf_iter_num_destroy(&it);
>
> From all existing open-coded iterator loops, it looks like the
> upper bound has to be a constant. We might need to extend support
> to a bounded scalar upper bound if it is not there.

Currently, the upper bound is required to be a constant by both the
open-coded for-loop and bpf_loop. I think we can extend it.

It can't handle the cpumask case either:

  for_each_cpu(cpu, mask)

In the 'mask', the CPU IDs might not be contiguous. In our container
environment, we always use the cpuset cgroup for some critical tasks,
but it is not so convenient to traverse the percpu data of this cpuset
cgroup. We have to do it as follows for this case:

That's why we prefer to introduce a bpf_for_each_cpu helper. It is fine
if it can be implemented as a kfunc.

-- 
Regards
Yafang