On 03/08/2023 16:22, Yonghong Song wrote: > > > On 8/3/23 1:21 AM, Alan Maguire wrote: >> On 02/08/2023 17:33, Alexei Starovoitov wrote: >>> On Tue, Aug 1, 2023 at 8:30 PM David Vernet <void@xxxxxxxxxxxxx> wrote: >>>> I agree that this is the correct way to generalize this. The only thing >>>> that we'll have to figure out is how to generalize treating const >>>> struct >>>> cpumask * objects as kptrs. In sched_ext [0] we export >>>> scx_bpf_get_idle_cpumask() and scx_bpf_get_idle_smtmask() kfuncs to >>>> return trusted global cpumask kptrs that can then be "released" in >>>> scx_bpf_put_idle_cpumask(). scx_bpf_put_idle_cpumask() is empty and >>>> exists only to appease the verifier that the trusted cpumask kptrs >>>> aren't being leaked and are having their references "dropped". >>> >>> why is it KF_ACQUIRE ? >>> I think it can just return a trusted pointer without acquire. >>> >>>> [0]: >>>> https://lore.kernel.org/all/20230711011412.100319-13-tj@xxxxxxxxxx/ >>>> >>>> I'd imagine that we have 2 ways forward if we want to enable progs to >>>> fetch other global cpumasks with static lifetimes (e.g. >>>> __cpu_possible_mask or nohz.idle_cpus_mask): >>>> >>>> 1. The most straightforward thing to do would be to add a new kfunc in >>>> kernel/bpf/cpumask.c that's a drop-in replacment for >>>> scx_bpf_put_idle_cpumask(): >>>> >>>> void bpf_global_cpumask_drop(const struct cpumask *cpumask) >>>> {} >>>> >>>> 2. Another would be to implement something resembling what Yonghong >>>> suggested in [1], where progs can link against global allocated >>>> kptrs >>>> like: >>>> >>>> const struct cpumask *__cpu_possible_mask __ksym; >>>> >>>> [1]: >>>> https://lore.kernel.org/all/3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@xxxxxxxxx/#t >>>> >>>> In my opinion (1) is more straightforward, (2) is a better UX. >>> >>> 1 = adding few kfuncs. >>> 2 = teaching pahole to emit certain global vars. >>> >>> nm vmlinux|g -w D|g -v __SCK_|g -v __tracepoint_|wc -l >>> 1998 >>> >>> imo BTF increase trade off is acceptable. >> >> Agreed, Stephen's numbers on BTF size increase were pretty modest [1]. >> >> What was gating that work in my mind was previous discussion around >> splitting aspects of BTF into a "vmlinux-extra". Experiments with this >> seemed to show it's hard to support, and worse, tooling would have to >> learn about its existence. We have to come up with a CONFIG convention >> about specifying what ends up in -extra versus core vmlinux BTF, what do >> we do about modules, etc. All feels like over-complication. >> >> I think a better path would be to support BTF in a vmlinux BTF module >> (controlled by making CONFIG_DEBUG_INFO_BTF tristate). The module is >> separately loadable, but puts vmlinux in the same place for tools - >> /sys/kernel/btf/vmlinux. That solves already-existing issues of BTF size >> for embedded use cases that have come up a few times, and lessens >> concerns about BTF size for other users, while it all works with >> existing tooling. I have a basic proof-of-concept but it will take time >> to hammer into shape. >> >> Because variable-related size increases are pretty modest, so should we >> proceed with the BTF variable support anyway? We can modularize BTF >> separately later on for those concerned about BTF size. > > Alan, it seems a consensus has reached that we should include > global variables (excluding special kernel made ones like > __SCK_ and __tracepoint_) in vmlinux BTF. > please go ahead and propose a patch. Thanks! > Sounds good! Stephen and I will hopefully have something ready soon; the changes are mostly in dwarves plus a kernel selftest. And good idea on the filtering; this will eliminate a lot of unneeded variables and cut down on size overheads. Thanks! Alan >> >> [1] >> https://lore.kernel.org/bpf/20221104231103.752040-1-stephen.s.brennan@xxxxxxxxxx/