On Fri, Feb 23, 2024 at 9:14 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 21, 2024 at 11:05:09AM -0800, Alexei Starovoitov wrote:
> > +#define VM_BPF 0x00000800 /* bpf_arena pages */
> >
> > +static inline struct vm_struct *get_bpf_vm_area(unsigned long size)
> > +{
> > +	return get_vm_area(size, VM_BPF);
> > +}
> >
> > and enforce that flag in vm_area_[un]map_pages() ?
> >
> > vmallocinfo can display it or skip it.
> > Things like find_vm_area() can do something different with such an area
> > (if that was the concern).
>
> Well, a growing allocation is a generally useful feature.  I'd
> rather not limit it to bpf if we can.

Sure. See the VM_SPARSE proposal in the other email.

> > > For the dynamically growing part do you need a special allocator or
> > > can we just go straight to the page allocator and implement this
> > > in common code?
> >
> > It's a bit special allocator that is using maple tree to manage
> > range within 4G region and
> > alloc_pages_node(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT)
> > to grab pages.
> > With extra dance:
> >         memcg = bpf_map_get_memcg(map);
> >         old_memcg = set_active_memcg(memcg);
> > to make sure memcg accounting is done the common way for all bpf maps.
>
> Ok, so it's not just a growing allocation but actually sparse and
> all over the place?  That doesn't really make it easier to come
> up with a good enough interface.

Yep.

> How do you decide what gets placed
> where?

See the proposal in the other email in this thread.
tl;dr: it's a user-space mmap()-like interface: either give me N pages
at any address, or give me N pages at this address if that range is
still free.

> > struct vm_struct *area = get_sparse_vm_area(size);
> > vm_area_alloc_pages(struct vm_struct *area, ulong addr, int page_cnt,
> >                     int numa_id);
> >
> > and vm_area_alloc_pages() will allocate pages and vmap_pages_range()
> > them while all code in mm/vmalloc.c ?
>
> My vague hope was that we could just start out with an area and
> grow it.
> But it sounds like you need something much more complex
> than that.

Yes. With BPF-specific tricks due to the lower 32-bit wraparound.

> But yes, a more specific API is probably a better idea.  And maybe
> the cookie should not be a VM area either but a structure dedicated to
> this.

Right. See the 'struct sparse_vm_area' proposal in the other email.
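For illustration only, the mmap()-like semantics described above (N pages
at any address, or N pages at a given address only if that whole range is
still free) can be modeled in userspace C. This is a hypothetical sketch,
not the proposed kernel API: sparse_area, sparse_alloc(), sparse_free(),
SPARSE_ANY and the 64-page region are invented names, and a flat array of
per-page flags stands in for the maple tree mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy region of 64 pages of 4K; the real arena spans 4G. */
#define SPARSE_NR_PAGES   64u
#define SPARSE_PAGE_SHIFT 12
#define SPARSE_ANY        (-1)	/* "any address", like mmap(NULL, ...) */

/* Hypothetical stand-in for a sparse area: one "used" flag per page
 * (the actual allocator was described as using a maple tree). */
struct sparse_area {
	bool used[SPARSE_NR_PAGES];
};

static void sparse_area_init(struct sparse_area *a)
{
	memset(a->used, 0, sizeof(a->used));
}

/* True if pages [first, first + cnt) are all unused. */
static bool range_is_free(const struct sparse_area *a, uint32_t first,
			  uint32_t cnt)
{
	assert(first + cnt <= SPARSE_NR_PAGES);
	for (uint32_t i = 0; i < cnt; i++)
		if (a->used[first + i])
			return false;
	return true;
}

/*
 * Allocate page_cnt pages; return the byte offset of the first page,
 * or -1 on failure.
 *   addr == SPARSE_ANY: first-fit search anywhere in the region.
 *   otherwise:          exactly at addr, failing if any page is taken.
 */
static int64_t sparse_alloc(struct sparse_area *a, int64_t addr,
			    uint32_t page_cnt)
{
	uint32_t first;

	if (page_cnt == 0 || page_cnt > SPARSE_NR_PAGES)
		return -1;

	if (addr != SPARSE_ANY) {
		/* Reject negative, misaligned or out-of-range addresses. */
		if (addr < 0 || (addr & ((1 << SPARSE_PAGE_SHIFT) - 1)) ||
		    addr >= ((int64_t)SPARSE_NR_PAGES << SPARSE_PAGE_SHIFT))
			return -1;
		first = (uint32_t)(addr >> SPARSE_PAGE_SHIFT);
		if (first > SPARSE_NR_PAGES - page_cnt ||
		    !range_is_free(a, first, page_cnt))
			return -1;
	} else {
		for (first = 0; first <= SPARSE_NR_PAGES - page_cnt; first++)
			if (range_is_free(a, first, page_cnt))
				break;
		if (first > SPARSE_NR_PAGES - page_cnt)
			return -1;
	}

	for (uint32_t i = 0; i < page_cnt; i++)
		a->used[first + i] = true;
	return (int64_t)first << SPARSE_PAGE_SHIFT;
}

/* Release pages previously returned by sparse_alloc(), leaving a hole. */
static void sparse_free(struct sparse_area *a, int64_t addr, uint32_t page_cnt)
{
	uint32_t first = (uint32_t)(addr >> SPARSE_PAGE_SHIFT);

	assert(first + page_cnt <= SPARSE_NR_PAGES);
	for (uint32_t i = 0; i < page_cnt; i++)
		a->used[first + i] = false;
}
```

The fixed-address path behaves roughly like mmap() with MAP_FIXED_NOREPLACE
(fail rather than clobber), while the SPARSE_ANY path is a plain first-fit
scan; freeing leaves holes, which is what makes the area sparse rather than
merely growing.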