Re: [PATCH v4 bpf-next 1/3] bpf: Add bpf_iter_cpumask kfuncs

On Wed, Jan 24, 2024 at 1:31 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Wed, Jan 24, 2024 at 2:26 AM David Vernet <void@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jan 23, 2024 at 11:27:14PM +0800, Yafang Shao wrote:
> > > Add three new kfuncs for bpf_iter_cpumask.
> > > - bpf_iter_cpumask_new
> > >   KF_RCU is defined because the cpumask must be a RCU trusted pointer
> > >   such as task->cpus_ptr.
> > > - bpf_iter_cpumask_next
> > > - bpf_iter_cpumask_destroy
> > >
> > > These new kfuncs facilitate the iteration of percpu data, such as
> > > runqueues, psi_cgroup_cpu, and more.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> >
> > Thanks for working on this, this will be nice to have!
> >
> > > ---
> > >  kernel/bpf/cpumask.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 82 insertions(+)
> > >
> > > diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c
> > > index 2e73533a3811..474072a235d6 100644
> > > --- a/kernel/bpf/cpumask.c
> > > +++ b/kernel/bpf/cpumask.c
> > > @@ -422,6 +422,85 @@ __bpf_kfunc u32 bpf_cpumask_weight(const struct cpumask *cpumask)
> > >       return cpumask_weight(cpumask);
> > >  }
> > >
> > > +struct bpf_iter_cpumask {
> > > +     __u64 __opaque[2];
> > > +} __aligned(8);
> > > +
> > > +struct bpf_iter_cpumask_kern {
> > > +     struct cpumask *mask;
> > > +     int cpu;
> > > +} __aligned(8);
> >
> > Why do we need both of these if we're not going to put the opaque
> > iterator in UAPI?
>
> Good point! Will remove it.
> It aligns with the pattern seen in
> bpf_iter_{css,task,task_vma,task_css}_kern, suggesting that we should
> indeed eliminate them.
>

It feels a bit cleaner to have an API-oriented iter struct like
bpf_iter_cpumask (despite it being unstable and coming from vmlinux.h)
with just an "__opaque" field, and then a _kern variant with the actual
memory layout. Technically, the _kern struct could end up smaller than
the opaque one.

I certainly wanted this split for bpf_iter_num, as that one is more of
a general-purpose and stable struct. It's less relevant for the more
unstable iters here.
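
To make the split concrete, here is a minimal userspace sketch of the
opaque/_kern pattern. It is an illustration only, not kernel code: the
model_* names are hypothetical, and C11 _Static_assert stands in for
BUILD_BUG_ON. It shows why the size check is "not larger than" rather
than strict equality, while the alignment check is an exact match.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the API-facing struct: only reserves space. */
struct model_iter_opaque {
	uint64_t __opaque[2];
} __attribute__((aligned(8)));

/* Hypothetical stand-in for the _kern struct: the real memory layout. */
struct model_iter_kern {
	const unsigned long *mask;	/* stand-in for struct cpumask * */
	int cpu;			/* last CPU returned, -1 initially */
} __attribute__((aligned(8)));

/* The real layout must fit in the reserved space; it may be smaller. */
_Static_assert(sizeof(struct model_iter_kern) <=
	       sizeof(struct model_iter_opaque),
	       "kern layout must fit inside the opaque struct");

/* Alignment has to match exactly so the two views are interchangeable. */
_Static_assert(alignof(struct model_iter_kern) ==
	       alignof(struct model_iter_opaque),
	       "kern and opaque structs must share alignment");

/* Bytes reserved by the opaque struct that the real layout leaves unused. */
static size_t model_iter_spare_bytes(void)
{
	return sizeof(struct model_iter_opaque) -
	       sizeof(struct model_iter_kern);
}
```

With this arrangement the _kern layout can change without touching the
struct that callers see, as long as both assertions keep holding.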

> >
> > > +
> > > +/**
> > > + * bpf_iter_cpumask_new() - Create a new bpf_iter_cpumask for a specified cpumask
> > > + * @it: The new bpf_iter_cpumask to be created.
> > > + * @mask: The cpumask to be iterated over.
> > > + *
> > > + * This function initializes a new bpf_iter_cpumask structure for iterating over
> > > + * the specified CPU mask. It assigns the provided cpumask to the newly created
> > > + * bpf_iter_cpumask @it for subsequent iteration operations.
> > > + *
> > > > + * On success, 0 is returned. On failure, ERR is returned.
> > > + */
> > > +__bpf_kfunc int bpf_iter_cpumask_new(struct bpf_iter_cpumask *it, const struct cpumask *mask)
> > > +{
> > > +     struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +
> > > +     BUILD_BUG_ON(sizeof(struct bpf_iter_cpumask_kern) > sizeof(struct bpf_iter_cpumask));
> > > +     BUILD_BUG_ON(__alignof__(struct bpf_iter_cpumask_kern) !=
> > > +                  __alignof__(struct bpf_iter_cpumask));
> >
> > Why are we checking > in the first expression instead of just plain
> > equality?
>
> Similar to the previous case, it aligns with others. Once we eliminate
> the struct bpf_iter_cpumask_kern, we can safely discard these
> BUILD_BUG_ON() statements as well.
>
> >
> > > +
> > > +     kit->mask = bpf_mem_alloc(&bpf_global_ma, sizeof(struct cpumask));
> >
> > Probably better to use cpumask_size() here.
>
> will use it.
>
> >
> > > +     if (!kit->mask)
> > > +             return -ENOMEM;
> > > +
> > > +     cpumask_copy(kit->mask, mask);
> > > +     kit->cpu = -1;
> > > +     return 0;
> > > +}
> > > +
> > > +/**
> > > + * bpf_iter_cpumask_next() - Get the next CPU in a bpf_iter_cpumask
> > > + * @it: The bpf_iter_cpumask
> > > + *
> > > + * This function retrieves a pointer to the number of the next CPU within the
> > > + * specified bpf_iter_cpumask. It allows sequential access to CPUs within the
> > > + * cpumask. If there are no further CPUs available, it returns NULL.
> > > + *
> > > + * Returns a pointer to the number of the next CPU in the cpumask or NULL if no
> > > + * further CPUs.
> > > + */
> > > +__bpf_kfunc int *bpf_iter_cpumask_next(struct bpf_iter_cpumask *it)
> > > +{
> > > +     struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +     const struct cpumask *mask = kit->mask;
> > > +     int cpu;
> > > +
> > > +     if (!mask)
> > > +             return NULL;
> > > +     cpu = cpumask_next(kit->cpu, mask);
> > > +     if (cpu >= nr_cpu_ids)
> > > +             return NULL;
> > > +
> > > +     kit->cpu = cpu;
> > > +     return &kit->cpu;
> > > +}
> > > +
> > > +/**
> > > + * bpf_iter_cpumask_destroy() - Destroy a bpf_iter_cpumask
> > > + * @it: The bpf_iter_cpumask to be destroyed.
> > > + *
> > > > + * Destroy the resource associated with the bpf_iter_cpumask.
> > > + */
> > > +__bpf_kfunc void bpf_iter_cpumask_destroy(struct bpf_iter_cpumask *it)
> > > +{
> > > +     struct bpf_iter_cpumask_kern *kit = (void *)it;
> > > +
> > > +     if (!kit->mask)
> > > +             return;
> > > +     bpf_mem_free(&bpf_global_ma, kit->mask);
> > > +}
> > > +
> > >  __bpf_kfunc_end_defs();
> > >
> > >  BTF_SET8_START(cpumask_kfunc_btf_ids)
> > > @@ -450,6 +529,9 @@ BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU)
> > >  BTF_ID_FLAGS(func, bpf_cpumask_weight, KF_RCU)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_new, KF_ITER_NEW | KF_RCU)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_next, KF_ITER_NEXT | KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_iter_cpumask_destroy, KF_ITER_DESTROY)
> > >  BTF_SET8_END(cpumask_kfunc_btf_ids)
> > >
> > >  static const struct btf_kfunc_id_set cpumask_kfunc_set = {
> > > --
> > > 2.39.1
> > >
> > >
>
> --
> Regards
> Yafang
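
For readers following the new/next/destroy contract in the quoted patch,
the sketch below models it in plain userspace C. Everything here is an
assumption made for illustration: the model_* names, the 64-bit word used
as the "cpumask", and malloc()/free() in place of bpf_mem_alloc() and
bpf_mem_free(). It mirrors the quoted logic (copy the mask, start at -1,
return NULL when exhausted) but is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 64	/* hypothetical fixed CPU count */

struct model_iter {
	unsigned long long *mask;	/* private copy, one bit per CPU */
	int cpu;			/* last CPU returned, -1 before first next() */
};

/* Copy the mask so later changes to the source do not affect iteration. */
static int model_iter_new(struct model_iter *it, unsigned long long mask)
{
	it->mask = malloc(sizeof(*it->mask));
	if (!it->mask)
		return -ENOMEM;
	*it->mask = mask;
	it->cpu = -1;
	return 0;
}

/* Return a pointer to the next set CPU number, or NULL when none remain. */
static int *model_iter_next(struct model_iter *it)
{
	int cpu;

	if (!it->mask)
		return NULL;
	for (cpu = it->cpu + 1; cpu < MODEL_NR_CPUS; cpu++) {
		if (*it->mask & (1ULL << cpu)) {
			it->cpu = cpu;
			return &it->cpu;
		}
	}
	return NULL;
}

/* Safe to call even if model_iter_new() failed: free(NULL) is a no-op. */
static void model_iter_destroy(struct model_iter *it)
{
	free(it->mask);
	it->mask = NULL;
}

/* Usage: add up the CPU numbers set in @mask, or return a negative errno. */
static int model_sum_cpus(unsigned long long mask)
{
	struct model_iter it;
	int *cpu, sum = 0;
	int err = model_iter_new(&it, mask);

	while (!err && (cpu = model_iter_next(&it)))
		sum += *cpu;
	model_iter_destroy(&it);
	return err ? err : sum;
}
```

The caller always pairs new with destroy, including on the failure path,
which is why destroy has to tolerate a NULL mask.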




