On Thu, Sep 01, 2022 at 12:32:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > bpf progs are more analogous to kernel modules.
> > The modules just do kmalloc.
> > The more we discuss the more I'm leaning towards the same model as well:
> > Just one global allocator for all bpf progs.
>
> There does seem to be one big benefit in having a global allocator
> (not per program, but actually globally in the kernel, basically a
> percpu freelist cache fronting kmalloc) usable safely in any context.
> We don't have to do any allocator lifetime tracking at all, that case
> reduces to basically how we handle kernel kptrs currently.
>
> I am wondering if we can go with such an approach: by default, the
> global allocator in the kernel serves bpf_mem_alloc requests, which
> allows freelist sharing between all programs, it is basically kmalloc
> but safe in NMI and with reentrancy protection.

Right. That's what I was proposing.

> When determinism is
> needed, use the percpu refcount approach with option 1 from Delyan for
> the custom allocator case.

I wasn't rejecting that part. I was suggesting to table that discussion.
The best way to achieve guaranteed allocation is still an open question.
So far we've only talked about a new map type with "allocator" type...
Is this really the best design?

> Now by default you have conservative global freelist sharing (percpu),
> and when required program can use a custom allocator and prefill to
> keep the cache ready to serve requests (where that kind of control
> will be very useful for progs in NMI/hardirq context, where depletion
> of cache means NULL from unit_alloc), where its own allocator freelist
> will be unaffected by other allocations.

The custom allocator is not necessarily the right answer. It could be.
Maybe it should be an open-coded free list of preallocated items that
the bpf prog takes from the global allocator and pushes onto a list?
We'll have locks and native linked lists in bpf soon.
So why should the "allocator" concept do the double job of allocating
and keeping a linked list for prefill reasons?

> Any kptr from the bpf_mem_alloc allocator can go to any map, no problem at all.
> The only extra cost is maintaining the percpu live counts for
> non-global allocators, it is basically free for the global case.
> And it would also be allowed to probably choose and share allocators
> between maps, as proposed by Alexei before. That has no effect on
> kptrs being stored in them, as most commonly they would be from the
> global allocator.

It still feels to me that doing the global allocator only for now will
be good enough. The prefill use case for one element can already be
solved without any extra work (just kptr_xchg in and out).
Prefill of multiple objects might get nicely solved with native linked
lists too.
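
To make the single-element prefill idea concrete, here is a rough userspace
analogy in plain C11. It is not BPF code and does not use any kernel API:
`struct item`, `struct slot`, `slot_xchg`, `slot_prefill`, `slot_take` and
`slot_demo` are hypothetical names invented for this sketch, and `malloc`
merely stands in for the global allocator. The only point it illustrates is
the exchange pattern: one context swaps a preallocated object into a slot,
and a later context swaps it out again without allocating.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type; stands in for whatever the prog allocates. */
struct item {
	int payload;
};

/* One reserved slot, loosely analogous to a kptr field in a map value. */
struct slot {
	_Atomic(struct item *) reserve;
};

/* Swap a pointer into the slot and return the previous occupant.
 * This mimics only the exchange semantics, nothing else. */
static struct item *slot_xchg(struct slot *s, struct item *new_item)
{
	return atomic_exchange(&s->reserve, new_item);
}

/* Prefill from a context where allocating is fine (malloc is an
 * assumption standing in for the global allocator). */
static int slot_prefill(struct slot *s)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	it->payload = 0;
	free(slot_xchg(s, it));	/* drop any older reserve; free(NULL) is a no-op */
	return 0;
}

/* Take the reserved item without allocating; NULL if the slot is empty. */
static struct item *slot_take(struct slot *s)
{
	return slot_xchg(s, NULL);
}

/* Usage: prefill once, take once, and see that the slot is then empty. */
static int slot_demo(void)
{
	struct slot s = { NULL };
	struct item *it;

	assert(slot_prefill(&s) == 0);
	it = slot_take(&s);
	assert(it != NULL);
	assert(slot_take(&s) == NULL);
	free(it);
	return 0;
}
```

Because the slot holds at most one object, this covers only the one-element
case; reserving several objects would need a list, which is where the native
linked lists would come in.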