Re: [PATCH bpf-next v4 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 13, 2022 at 3:16 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, May 13, 2022 at 09:28:03PM +0200, Daniel Borkmann wrote:
> > On 5/13/22 6:39 PM, Alexei Starovoitov wrote:
> > > On Fri, May 13, 2022 at 03:12:06PM +0200, Daniel Borkmann wrote:
> > > >
> > > > Thinking more about it, is there even any value for BPF_FUNC_dynptr_* for
> > > > fully unpriv BPF if these are rejected anyway by the spectre mitigations
> > > > from verifier?
> > > ...
> > > > So either for alloc, we always built-in __GFP_ZERO or bpf_dynptr_alloc()
> > > > helper usage should go under perfmon_capable() where it's allowed to read
> > > > kernel mem.
> > >
> > > dynptr should probably by cap_bpf and cap_perfmon for now.
> > > Otherwise we will start adding cap_perfmon checks in run-time to helpers
> > > which is not easy to do. Some sort of prog or user context would need
> > > to be passed as hidden arg into helper. That's too much hassle just
> > > to enable dynptr for cap_bpf only.
> > >
> > > Similar problem with gfp_account... remembering memcg and passing all
> > > the way to bpf_dynptr_alloc helper is not easy. And it's not clear
> > > which memcg to use. The one where task was that loaded that bpf prog?
> > > That task could have been gone and cgroup is in dying stage.
> > > bpf prog is executing some context and allocating memory for itself.
> > > Like kernel allocates memory for its needs. It doesn't feel right to
> > > charge prog's memcg in that case. It probably should be an explicit choice
> > > by bpf program author. Maybe in the future we can introduce a fake map
> > > for such accounting needs and bpf prog could pass a map pointer to
> > > bpf_dynptr_alloc. When such fake and empty map is created the memcg
> > > would be recorded the same way we do for existing normal maps.
> > > Then the helper will look like:
> > > bpf_dynptr_alloc(struct bpf_map *map, u32 size, u64 flags, struct bpf_dynptr *ptr)
> > > {
> > >    set_active_memcg(map->memcg);
> > >    kmalloc into dynptr;
> > > }
> > >
> > > Should we do this change now and allow NULL to be passed as a map ?
> >
> > Hm, this looks a bit too much like a hack, I wouldn't do that, fwiw.
> >
> > > This way the bpf prog will have a choice whether to account into memcg or not.
> > > Maybe it's all overkill and none of this needed?
> > >
> > > On the other side maybe map should be a mandatory argument and dynptr_alloc
> > > can do its own memory accounting for stats ? atomic inc and dec is probably
> > > an acceptable overhead? bpftool will print the dynptr allocation stats.
> > > All sounds nice and extra visibility is great, but the kernel code that
> > > allocates for the kernel doesn't use memcg. bpf progs semantically are part of
> > > the kernel whereas memcg is a mechanism to restrict memory that kernel
> > > allocated on behalf of user tasks. We abused memcg for bpf progs/maps
> > > to have a limit. Not clear whether we should continue doing so for dynpr_alloc
> > > and in the future for kptr_alloc. gfp_account adds overhead too. It's not free.
> > > Thoughts?
> >
> > Great question, I think the memcg is useful, just that the ownership for bpf
> > progs/maps has been relying on current whereas current is not a real 'owner',
> > just the entity which did the loading.
> >
> > Maybe we need some sort of memcg object for bpf where we can "bind" the prog
> > and map to it at load time, which is then different from current and can be
> > flexibly set, e.g. fd = open(/sys/fs/cgroup/memory/<foo>) and pass that fd to
> > BPF_PROG_LOAD and BPF_MAP_CREATE via bpf_attr (otherwise, if not set, then
> > no accounting)?
>
> Agree. Explicitly specifying memcg by FD would be nice.
> It will be useful for normal maps and progs.
> This is a bit orthogonal to having a map argument to bpf_dynptr/kptr_alloc.
>
> Here is the main reason why we probably should have it mandatory:
> kmalloc cannot be called from nmi and in general cannot be called from tracing.
> kprobe/fentry could be inside slab or page alloc path and it might blow up.
> That's the reason why hashmap defaults to pre-alloc.
> In order to do pre-alloc in bpf_dynptr/kptr_alloc() it has to have a map-like
> argument that will keep the info about preallocated memory.
>
> How about the following api:
> mem = bpf_map_create(BPF_MAP_TYPE_MEMORY); // form user space
> bpf_mem_prealloc(mem, size); // preallocate memory. from sleepable or irqwork
> bpf_dynptr_alloc(mem, size, flags, &dynptr); // non-sleepable
> // returns 'size' bytes if they were available in preallocated memory
>
> Right now bpf maps are either full prealloc or full kmalloc.
> This approach will be a hybrid.
> The bpf progs will be using it roughly like this:
>
> // init from user space
> mem = bpf_map_create(BPF_MAP_TYPE_MEMORY);
> sys_bpf(mem_prealloc, 1Mbyte); // prealloc largest possible single dynptr_alloc
>
> // from bpf prog
> bpf_dynptr_alloc(mem, size, flags, &dynptr); // if (size < 1M) all good
> bpf_irq_work_queue(replenish_prealloc, size); // refill mem's prealloc
>
> void replenish_prealloc(sz) { bpf_mem_prealloc(mem, sz); }
>
> bpf_dynptr_alloc would need to implement a memory allocator out of
> reserved memory. We can probably reuse some of sl[oua]b code.
> slob_alloc may fit the best (without dynamic slob_new_pages).
> Song's pack_alloc probably good enough to start.
>
> The gfp_account flag moves into bpf_mem_prealloc() helper.
> It doesn't make sense in bpf_dynptr_alloc.
> While gfp_zero makes sense only in bpf_dynptr_alloc.
>
> Thoughts?

Do you envision this also being used for accounting for kfunc memory
allocations?



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux