Re: [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages().

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 23 Feb 2024 09:27:14 -0800

On Fri, Feb 23, 2024 at 9:14 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 21, 2024 at 11:05:09AM -0800, Alexei Starovoitov wrote:
> > +#define VM_BPF                 0x00000800      /* bpf_arena pages */
> >
> > +static inline struct vm_struct *get_bpf_vm_area(unsigned long size)
> > +{
> > +       return get_vm_area(size, VM_BPF);
> > +}
> >
> > and enforce that flag in vm_area_[un]map_pages() ?
> >
> > vmallocinfo can display it or skip it.
> > Things like find_vm_area() can do something different with such an area
> > (if that was the concern).
>
> Well, a growing allocation is a generally useful feature.  I'd
> rather not limit it to bpf if we can.

sure. See VM_SPARSE proposal in the other email.

> > > For the dynamically growing part do you need a special allocator or
> > > can we just go straight to the page allocator and implement this
> > > in common code?
> >
> > It's a bit special allocator that is using maple tree to manage
> > range within 4G region and
> > alloc_pages_node(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT)
> > to grab pages.
> > With extra dance:
> >         memcg = bpf_map_get_memcg(map);
> >         old_memcg = set_active_memcg(memcg);
> > to make sure memcg accounting is done the common way for all bpf maps.
>
> Ok, so it's not just a growing allocation but actually sparse and
> all over the place?  That doesn't really make it easier to come
> up with a good enough interface.

yep.

> How do you decide what gets placed
> where?

See proposal in the other email in this thread.
tldr: it's a user space mmap() like interface.
either give me N pages at any addr or
give me N pages at this addr if this range is still free.

> > struct vm_struct *area = get_sparse_vm_area(size);
> > vm_area_alloc_pages(struct vm_struct *area, ulong addr, int page_cnt,
> > int numa_id);
> >
> > and vm_area_alloc_pages() will allocate pages and vmap_pages_range()
> > them while all code in mm/vmalloc.c ?
>
> My vague hope was that we could just start out with an area and
> grow it.  But it sounds like you need something much more complex
> that that.

yes. With bpf specific tricks due to lower 32-bit wrap around.

> But yes, a more specific API is probably a better idea.  And maybe
> the cookie should be a VM area either but a structure dedicated to
> this.

Right. see 'struct sparse_vm_area' proposal in the other email.