On Fri, May 13, 2022 at 03:12:06PM +0200, Daniel Borkmann wrote: > > Thinking more about it, is there even any value for BPF_FUNC_dynptr_* for > fully unpriv BPF if these are rejected anyway by the spectre mitigations > from verifier? ... > So either for alloc, we always built-in __GFP_ZERO or bpf_dynptr_alloc() > helper usage should go under perfmon_capable() where it's allowed to read > kernel mem. dynptr should probably by cap_bpf and cap_perfmon for now. Otherwise we will start adding cap_perfmon checks in run-time to helpers which is not easy to do. Some sort of prog or user context would need to be passed as hidden arg into helper. That's too much hassle just to enable dynptr for cap_bpf only. Similar problem with gfp_account... remembering memcg and passing all the way to bpf_dynptr_alloc helper is not easy. And it's not clear which memcg to use. The one where task was that loaded that bpf prog? That task could have been gone and cgroup is in dying stage. bpf prog is executing some context and allocating memory for itself. Like kernel allocates memory for its needs. It doesn't feel right to charge prog's memcg in that case. It probably should be an explicit choice by bpf program author. Maybe in the future we can introduce a fake map for such accounting needs and bpf prog could pass a map pointer to bpf_dynptr_alloc. When such fake and empty map is created the memcg would be recorded the same way we do for existing normal maps. Then the helper will look like: bpf_dynptr_alloc(struct bpf_map *map, u32 size, u64 flags, struct bpf_dynptr *ptr) { set_active_memcg(map->memcg); kmalloc into dynptr; } Should we do this change now and allow NULL to be passed as a map ? This way the bpf prog will have a choice whether to account into memcg or not. Maybe it's all overkill and none of this needed? On the other side maybe map should be a mandatory argument and dynptr_alloc can do its own memory accounting for stats ? atomic inc and dec is probably an acceptable overhead? bpftool will print the dynptr allocation stats. All sounds nice and extra visibility is great, but the kernel code that allocates for the kernel doesn't use memcg. bpf progs semantically are part of the kernel whereas memcg is a mechanism to restrict memory that kernel allocated on behalf of user tasks. We abused memcg for bpf progs/maps to have a limit. Not clear whether we should continue doing so for dynpr_alloc and in the future for kptr_alloc. gfp_account adds overhead too. It's not free. Thoughts?