Re: [PATCH v3 bpf-next 00/15] bpf: BPF specific memory allocator, UAPI in particular

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 31, 2022 at 05:38:15PM +0000, Delyan Kratunov wrote:
> 
> Overall, this design (or maybe the way it's presented here) conflates a few ideas.
> 
> 1) The extensions to expose and customize map's internal element allocator are fine
> independently of even this patchset.
> 
> 2) The idea that kptrs in a map need to have a statically identifiable allocator is
> taken as an axiom, and then expanded to its extreme (single allocator per map as
> opposed to the smarter verifier schemes). I still contest that this is not the case
> and the runtime overhead it avoids is paid back in bad developer experience multiple
> times over.
> 
> 3) The idea that allocators can be merged between elements and kptrs is independent
> of the static requirements. If the map's internal allocator is exposed per 1), we can
> still use it to allocate kptrs but not require that all kptrs in a map are from the
> same allocator.
> 
> Going this coarse in the API is easy for us but fundamentally more limiting for
> developers. It's not hard to imagine situations where the verifier dependency
> tracking or runtime lifetime tracking would allow for pinned maps to be retained but
> this scheme would require new maps entirely. (Any situation where you just refactored
> the implicit allocator out to share it, for example)
> 
> I also don't think that simplicity for us (a one time implementation cost +
> continuous maintenance cost) trumps over long term developer experience (a much
> bigger implementation cost over a much bigger time span).

It feels we're thinking about scope and use cases for the allocator quite
differently and what you're seeing as 'limiting developer choices' to me looks
like 'not a limiting issue at all'. To me the allocator is one
jemalloc/tcmalloc instance. One user space application with multiple threads,
lots of maps and code is using exactly one such allocator. The allocator
manages all the memory of user space process. In bpf land we don't have a bpf
process. We don't have a bpf name space either.  A loose analogy would be a set
of programs and maps managed by one user space agent. The bpf allocator would
manage all the memory of these maps and programs and provide a "memory namespace"
for this set of programs. Another user space agent with its own programs
would never want to share the same allocator. In user space a chunk of memory
could be mmap-ed between different process to share the data, but you would never
put a tcmalloc over such memory to be an allocator for different processes.

More below.

> So far, my ranked choice vote is:
> 
> 1) maximum flexibility and runtime live object counts (with exposed allocators, I
> like the merging)
> 2) medium flexibility with per-field allocator tracking in the verifier and the
> ability to lose the association once programs are unloaded and values are gone. This
> also works better with exposed allocators since they are implicitly pinned and would
> be usable to store values in another map.
> 3) minimum flexibility with static whole-map kptr allocators

The option 1 flexibility is necessary when allocator is seen as a private pool
of objects of given size. Like kernel's kmem_cache instance.
I don't think we quite there yet.
There is a need to "preallocate this object from sleepable context,
so the prog has a guaranteed chunk of memory to use in restricted context",
but, arguably, it's not a job of bpf allocator. bpf prog can allocate an object,
stash it into kptr, and use it later.
So option 3 doesn't feel less flexible to me. imo the whole-map-allocator is
more than we need. Ideally it would be easy to specifiy one single
allocator for all maps and progs in a set of .c files. Sort-of a bpf package.
In other words one bpf allocator per bpf "namespace" is more than enough.
Program authors shouldn't be creating allocators left and right. All these
free lists will waste memory.
btw I've added an extra patch to bpf_mem_alloc series:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=memalloc&id=6a586327a270272780bdad7446259bbe62574db1
that removes kmem_cache usage.
Turned out (hindsight 20/20) kmem_cache for each bpf map was a bad idea.
When free_lists are not shared they will similarly waste memory.
In user space the C code just does malloc() and the memory is isolated per process.
Ideally in bpf world the programs would just do:
bpf_mem_alloc(btf_type_id_local(struct foo));
without specifying an allocator, but that would require one global allocator
for all bpf programs in the kernel which is probably not a direction we should go ?
So the programs have to specify an allocator to use in bpf_mem_alloc(),
but it should be one for all progs, maps in a bpf-package/set/namespace.
If it's easy for programs to specify a bunch of allocators, like one for each program,
or one for each btf_type_id the bpf kernel infra would be required to merge
these allocators from day one. (The profileration of kmem_cache-s in the past
forced merging of them). By restricting bpf program choices with allocator-per-map
(this option 3) we're not only making the kernel side to do less work
(no run-time ref counts, no merging is required today), we're also pushing
bpf progs to use memory concious choices.
Having said all that maybe one global allocator is not such a bad idea.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux