Re: [RFC PATCH bpf-next 0/3] bpf: Add new bpf helper bpf_for_each_cpu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/3/23 1:21 AM, Alan Maguire wrote:
On 02/08/2023 17:33, Alexei Starovoitov wrote:
On Tue, Aug 1, 2023 at 8:30 PM David Vernet <void@xxxxxxxxxxxxx> wrote:
I agree that this is the correct way to generalize this. The only thing
that we'll have to figure out is how to generalize treating const struct
cpumask * objects as kptrs. In sched_ext [0] we export
scx_bpf_get_idle_cpumask() and scx_bpf_get_idle_smtmask() kfuncs to
return trusted global cpumask kptrs that can then be "released" in
scx_bpf_put_idle_cpumask(). scx_bpf_put_idle_cpumask() is empty and
exists only to appease the verifier that the trusted cpumask kptrs
aren't being leaked and are having their references "dropped".

why is it KF_ACQUIRE ?
I think it can just return a trusted pointer without acquire.

[0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@xxxxxxxxxx/

I'd imagine that we have 2 ways forward if we want to enable progs to
fetch other global cpumasks with static lifetimes (e.g.
__cpu_possible_mask or nohz.idle_cpus_mask):

1. The most straightforward thing to do would be to add a new kfunc in
    kernel/bpf/cpumask.c that's a drop-in replacment for
    scx_bpf_put_idle_cpumask():

void bpf_global_cpumask_drop(const struct cpumask *cpumask)
{}

2. Another would be to implement something resembling what Yonghong
    suggested in [1], where progs can link against global allocated kptrs
    like:

const struct cpumask *__cpu_possible_mask __ksym;

[1]: https://lore.kernel.org/all/3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@xxxxxxxxx/#t

In my opinion (1) is more straightforward, (2) is a better UX.

1 = adding few kfuncs.
2 = teaching pahole to emit certain global vars.

nm vmlinux|g -w D|g -v __SCK_|g -v __tracepoint_|wc -l
1998

imo BTF increase trade off is acceptable.

Agreed, Stephen's numbers on BTF size increase were pretty modest [1].

What was gating that work in my mind was previous discussion around
splitting aspects of BTF into a "vmlinux-extra". Experiments with this
seemed to show it's hard to support, and worse, tooling would have to
learn about its existence. We have to come up with a CONFIG convention
about specifying what ends up in -extra versus core vmlinux BTF, what do
we do about modules, etc. All feels like over-complication.

I think a better path would be to support BTF in a vmlinux BTF module
(controlled by making CONFIG_DEBUG_INFO_BTF tristate). The module is
separately loadable, but puts vmlinux in the same place for tools -
/sys/kernel/btf/vmlinux. That solves already-existing issues of BTF size
for embedded use cases that have come up a few times, and lessens
concerns about BTF size for other users, while it all works with
existing tooling. I have a basic proof-of-concept but it will take time
to hammer into shape.

Because variable-related size increases are pretty modest, so should we
proceed with the BTF variable support anyway? We can modularize BTF
separately later on for those concerned about BTF size.

Alan, it seems a consensus has reached that we should include
global variables (excluding special kernel made ones like
__SCK_ and __tracepoint_) in vmlinux BTF.
please go ahead and propose a patch. Thanks!


[1]
https://lore.kernel.org/bpf/20221104231103.752040-1-stephen.s.brennan@xxxxxxxxxx/




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux