Re: [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator, UAPI in particular

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 31 Aug 2022 at 20:57, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Aug 31, 2022 at 05:38:15PM +0000, Delyan Kratunov wrote:
> >
> > Overall, this design (or maybe the way it's presented here) conflates a few ideas.
> >
> > 1) The extensions to expose and customize map's internal element allocator are fine
> > independently of even this patchset.
> >
> > 2) The idea that kptrs in a map need to have a statically identifiable allocator is
> > taken as an axiom, and then expanded to its extreme (single allocator per map as
> > opposed to the smarter verifier schemes). I still contest that this is not the case
> > and the runtime overhead it avoids is paid back in bad developer experience multiple
> > times over.
> >
> > 3) The idea that allocators can be merged between elements and kptrs is independent
> > of the static requirements. If the map's internal allocator is exposed per 1), we can
> > still use it to allocate kptrs but not require that all kptrs in a map are from the
> > same allocator.
> >
> > Going this coarse in the API is easy for us but fundamentally more limiting for
> > developers. It's not hard to imagine situations where the verifier dependency
> > tracking or runtime lifetime tracking would allow for pinned maps to be retained but
> > this scheme would require new maps entirely. (Any situation where you just refactored
> > the implicit allocator out to share it, for example)
> >
> > I also don't think that simplicity for us (a one time implementation cost +
> > continuous maintenance cost) trumps over long term developer experience (a much
> > bigger implementation cost over a much bigger time span).
>
> It feels we're thinking about scope and use cases for the allocator quite
> differently and what you're seeing as 'limiting developer choices' to me looks
> like 'not a limiting issue at all'. To me the allocator is one

I went over the proposal multiple times, just to make sure I
understood it properly, but I still can't see this working ideally for
the inner map case, even if we ignore the rest of the things for a
moment.
But maybe you prefer to just forbid them there? Please correct me if I'm wrong.

You won't be able to know the allocator statically for inner maps (and
hence not be able to enforce the kptr_xchg requirement to be from the
same allocator as map). To know it, we will have to force all
inner_maps to use the same allocator, either the one specified for
inner map fd, or the one in map-in-map definition, or elsewhere. But
to be able to know it statically the information will have to come
from map-in-map somehow.

That seems like a very weird limitation just to use local kptrs, and
doesn't even make sense for map-in-map use cases IMO.
And unless I'm missing something there isn't an easy way to
accommodate it in the 'statically known allocator' proposal, because
such inner_map allocators (and inner_maps) are themselves not static.

> jemalloc/tcmalloc instance. One user space application with multiple threads,
> lots of maps and code is using exactly one such allocator. The allocator
> manages all the memory of user space process. In bpf land we don't have a bpf
> process. We don't have a bpf name space either.  A loose analogy would be a set
> of programs and maps managed by one user space agent. The bpf allocator would
> manage all the memory of these maps and programs and provide a "memory namespace"
> for this set of programs. Another user space agent with its own programs
> would never want to share the same allocator. In user space a chunk of memory
> could be mmap-ed between different process to share the data, but you would never
> put a tcmalloc over such memory to be an allocator for different processes.
>

But just saying "would never" or "should never" doesn't work right?
Hyrum's law and all.

libbpf style "bpf package" deployments may not be the only consumers
of this infra in the future. So designing around this specific idea
that people will never or shouldn't dynamically share their allocator
objects between maps which don't share the same allocator seems
destined to only serve us in the short run IMO.

People may come up with cases where they are passing ownership of
objects between such bpf packages, and after coming up with multiple
examples before it doesn't seem likely static dependencies will be
able to capture such dynamic runtime relationships, e.g. static
dependencies don't even work in the inner_map case without more
restrictions.

> More below.
>
> > So far, my ranked choice vote is:
> >
> > 1) maximum flexibility and runtime live object counts (with exposed allocators, I
> > like the merging)
> > 2) medium flexibility with per-field allocator tracking in the verifier and the
> > ability to lose the association once programs are unloaded and values are gone. This
> > also works better with exposed allocators since they are implicitly pinned and would
> > be usable to store values in another map.
> > 3) minimum flexibility with static whole-map kptr allocators
>
> The option 1 flexibility is necessary when allocator is seen as a private pool
> of objects of given size. Like kernel's kmem_cache instance.
> I don't think we quite there yet.
> There is a need to "preallocate this object from sleepable context,
> so the prog has a guaranteed chunk of memory to use in restricted context",
> but, arguably, it's not a job of bpf allocator. bpf prog can allocate an object,
> stash it into kptr, and use it later.

At least if not adding support for it all now, I think this kind of
flexibility in option 1 needs to be given some more consideration, as
in whether this proposal to encode things statically would be able to
accommodate such cases in the future. To me it seems pretty hard (and
unless I missed something, it already won't work for inner_maps case
without requiring all to use the same allocator).

We might actually be able to do a hybrid of the options by utilizing
the statically known allocator info to acquire references and runtime
object counts, which may help eliminate/delay the actual cost we pay
for it - the atomic upgrade, when initial reference goes away.

So I think I lean towards option 1 as well, and then the same order as
Delyan. It seems to cover all kinds of corner cases (allocator known
vs unknown, normal vs inner maps, etc.) we've been going over in this
thread, and would also be future proof in terms of permitting
unforeseen patterns of usage.

> So option 3 doesn't feel less flexible to me. imo the whole-map-allocator is
> more than we need. Ideally it would be easy to specifiy one single
> allocator for all maps and progs in a set of .c files. Sort-of a bpf package.
> In other words one bpf allocator per bpf "namespace" is more than enough.
> Program authors shouldn't be creating allocators left and right. All these
> free lists will waste memory.
> btw I've added an extra patch to bpf_mem_alloc series:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=memalloc&id=6a586327a270272780bdad7446259bbe62574db1
> that removes kmem_cache usage.
> Turned out (hindsight 20/20) kmem_cache for each bpf map was a bad idea.
> When free_lists are not shared they will similarly waste memory.
> In user space the C code just does malloc() and the memory is isolated per process.
> Ideally in bpf world the programs would just do:
> bpf_mem_alloc(btf_type_id_local(struct foo));
> without specifying an allocator, but that would require one global allocator
> for all bpf programs in the kernel which is probably not a direction we should go ?
> So the programs have to specify an allocator to use in bpf_mem_alloc(),
> but it should be one for all progs, maps in a bpf-package/set/namespace.|

But "all progs" doesn't mean all of them are running in the same
context and having the same kinds of memory allocation requirements,
right? While having too many allocators is also bad, having just one
single one per package would also be limiting. A bpf object/package is
very different from a userspace process. So the analogy doesn't
exactly fit IMO.



> If it's easy for programs to specify a bunch of allocators, like one for each program,
> or one for each btf_type_id the bpf kernel infra would be required to merge
> these allocators from day one. (The profileration of kmem_cache-s in the past
> forced merging of them). By restricting bpf program choices with allocator-per-map
> (this option 3) we're not only making the kernel side to do less work
> (no run-time ref counts, no merging is required today), we're also pushing
> bpf progs to use memory concious choices.
> Having said all that maybe one global allocator is not such a bad idea.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux