On 5/13/22 6:39 PM, Alexei Starovoitov wrote:
On Fri, May 13, 2022 at 03:12:06PM +0200, Daniel Borkmann wrote:
Thinking more about it, is there even any value for BPF_FUNC_dynptr_* for
fully unpriv BPF if these are rejected anyway by the spectre mitigations
from verifier?
...
So either for alloc, we always built-in __GFP_ZERO or bpf_dynptr_alloc()
helper usage should go under perfmon_capable() where it's allowed to read
kernel mem.
dynptr should probably by cap_bpf and cap_perfmon for now.
Otherwise we will start adding cap_perfmon checks in run-time to helpers
which is not easy to do. Some sort of prog or user context would need
to be passed as hidden arg into helper. That's too much hassle just
to enable dynptr for cap_bpf only.
Similar problem with gfp_account... remembering memcg and passing all
the way to bpf_dynptr_alloc helper is not easy. And it's not clear
which memcg to use. The one where task was that loaded that bpf prog?
That task could have been gone and cgroup is in dying stage.
bpf prog is executing some context and allocating memory for itself.
Like kernel allocates memory for its needs. It doesn't feel right to
charge prog's memcg in that case. It probably should be an explicit choice
by bpf program author. Maybe in the future we can introduce a fake map
for such accounting needs and bpf prog could pass a map pointer to
bpf_dynptr_alloc. When such fake and empty map is created the memcg
would be recorded the same way we do for existing normal maps.
Then the helper will look like:
bpf_dynptr_alloc(struct bpf_map *map, u32 size, u64 flags, struct bpf_dynptr *ptr)
{
set_active_memcg(map->memcg);
kmalloc into dynptr;
}
Should we do this change now and allow NULL to be passed as a map ?
Hm, this looks a bit too much like a hack, I wouldn't do that, fwiw.
This way the bpf prog will have a choice whether to account into memcg or not.
Maybe it's all overkill and none of this needed?
On the other side maybe map should be a mandatory argument and dynptr_alloc
can do its own memory accounting for stats ? atomic inc and dec is probably
an acceptable overhead? bpftool will print the dynptr allocation stats.
All sounds nice and extra visibility is great, but the kernel code that
allocates for the kernel doesn't use memcg. bpf progs semantically are part of
the kernel whereas memcg is a mechanism to restrict memory that kernel
allocated on behalf of user tasks. We abused memcg for bpf progs/maps
to have a limit. Not clear whether we should continue doing so for dynpr_alloc
and in the future for kptr_alloc. gfp_account adds overhead too. It's not free.
Thoughts?
Great question, I think the memcg is useful, just that the ownership for bpf
progs/maps has been relying on current whereas current is not a real 'owner',
just the entity which did the loading.
Maybe we need some sort of memcg object for bpf where we can "bind" the prog
and map to it at load time, which is then different from current and can be
flexibly set, e.g. fd = open(/sys/fs/cgroup/memory/<foo>) and pass that fd to
BPF_PROG_LOAD and BPF_MAP_CREATE via bpf_attr (otherwise, if not set, then
no accounting)?
Thanks,
Daniel