On Wed, Aug 02, 2023 at 09:33:18AM -0700, Alexei Starovoitov wrote: > On Tue, Aug 1, 2023 at 8:30 PM David Vernet <void@xxxxxxxxxxxxx> wrote: > > I agree that this is the correct way to generalize this. The only thing > > that we'll have to figure out is how to generalize treating const struct > > cpumask * objects as kptrs. In sched_ext [0] we export > > scx_bpf_get_idle_cpumask() and scx_bpf_get_idle_smtmask() kfuncs to > > return trusted global cpumask kptrs that can then be "released" in > > scx_bpf_put_idle_cpumask(). scx_bpf_put_idle_cpumask() is empty and > > exists only to appease the verifier that the trusted cpumask kptrs > > aren't being leaked and are having their references "dropped". > > why is it KF_ACQUIRE ? > I think it can just return a trusted pointer without acquire. I don't think there's a way to do this yet without hard-coding the kfuncs as special, right? That's why we do stuff like this: 11479 } else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) { 11480 mark_reg_known_zero(env, regs, BPF_REG_0); 11481 regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED; 11482 regs[BPF_REG_0].btf = desc_btf; 11483 regs[BPF_REG_0].btf_id = meta.ret_btf_id; We could continue to do that, but I wonder if it would be useful to add a kfunc flag that allowed a kfunc to specify that? Something like KF_ALWAYS_TRUSTED? In general though, yes, we can teach the verifier to not require KF_ACQUIRE if we want. It's just that what we have now doesn't really scale to the kernel for any global cpumask. > > [0]: https://lore.kernel.org/all/20230711011412.100319-13-tj@xxxxxxxxxx/ > > > > I'd imagine that we have 2 ways forward if we want to enable progs to > > fetch other global cpumasks with static lifetimes (e.g. > > __cpu_possible_mask or nohz.idle_cpus_mask): > > > > 1. The most straightforward thing to do would be to add a new kfunc in > > kernel/bpf/cpumask.c that's a drop-in replacment for > > scx_bpf_put_idle_cpumask(): > > > > void bpf_global_cpumask_drop(const struct cpumask *cpumask) > > {} > > > > 2. Another would be to implement something resembling what Yonghong > > suggested in [1], where progs can link against global allocated kptrs > > like: > > > > const struct cpumask *__cpu_possible_mask __ksym; > > > > [1]: https://lore.kernel.org/all/3f56b3b3-9b71-f0d3-ace1-406a8eeb64c0@xxxxxxxxx/#t > > > > In my opinion (1) is more straightforward, (2) is a better UX. > > 1 = adding few kfuncs. > 2 = teaching pahole to emit certain global vars. > > nm vmlinux|g -w D|g -v __SCK_|g -v __tracepoint_|wc -l > 1998 > > imo BTF increase trade off is acceptable. Agreed