Re: [PATCH v2 bpf-next 04/20] mm: Expose vmap_pages_range() to the rest of the kernel.

On Wed, Feb 14, 2024 at 10:58 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 14, 2024 at 12:53:42PM -0800, Alexei Starovoitov wrote:
> > On Wed, Feb 14, 2024 at 12:36 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > >
> > > NAK.  Please
> >
> > What is the alternative?
> > Remember, maintainers cannot tell developers "go away".
> > They must suggest a different path.
>
> That criterion is something you've made up.

I didn't invent it. I internalized it based on the feedback received.

> Telling that something
> is not ok is the most important job of not just maintainers but all
> developers.

I'm not saying that maintainers should never say "no".
I'm saying that when they do say "no", they should understand the problem
being solved and suggest an alternative.

> Maybe start with a description of the problem you're
> solving and why you think it matters and needs different APIs.

bpf_arena doesn't need a different API. The five APIs quoted below are enough.
I'm saying that vmap_pages_range() is equivalent to apply_to_page_range()
for all practical purposes.
Since apply_to_page_range() is already available to the rest of the kernel
(xen, gpu, kasan, etc.), I see no reason why vmap_pages_range() shouldn't
be available as well, given that it can be open-coded on top of it:

struct vmap_ctx {
    struct page **pages;
    int idx;
};

/* Callback invoked by apply_to_page_range() for every pte in the range:
 * take the next page from the array and install it.
 */
static int __for_each_pte(pte_t *ptep, unsigned long addr, void *data)
{
    struct vmap_ctx *ctx = data;
    struct page *page = ctx->pages[ctx->idx++];

    /* TODO: sanity checks here */
    set_pte_at(&init_mm, addr, ptep, mk_pte(page, PAGE_KERNEL));
    return 0;
}

/* Open-coded equivalent of vmap_pages_range(), built on the exported
 * apply_to_page_range().
 */
static int vmap_pages_range_hack(unsigned long addr, unsigned long end,
                                 struct page **pages)
{
    struct vmap_ctx ctx = { .pages = pages };

    return apply_to_page_range(&init_mm, addr, end - addr,
                               __for_each_pte, &ctx);
}
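
Purely as an illustration of how the wrapper above could be used
(a hypothetical demo function, not part of the series):

#include <linux/vmalloc.h>
#include <linux/gfp.h>

/* Hypothetical demo: reserve one page of vmalloc address space and map a
 * freshly allocated page into it via the apply_to_page_range() wrapper.
 */
static void *map_one_page_demo(void)
{
    struct vm_struct *area = get_vm_area(PAGE_SIZE, VM_MAP);
    struct page *page;

    if (!area)
        return NULL;

    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
    if (!page) {
        free_vm_area(area);
        return NULL;
    }

    if (vmap_pages_range_hack((unsigned long)area->addr,
                              (unsigned long)area->addr + PAGE_SIZE,
                              &page)) {
        __free_page(page);
        free_vm_area(area);
        return NULL;
    }
    return area->addr;
}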

Am I missing anything?

> > . get_vm_area - external
> > . free_vm_area - EXPORT_SYMBOL_GPL
> > . vunmap_range - external
> > . vmalloc_to_page - EXPORT_SYMBOL
> > . apply_to_page_range - EXPORT_SYMBOL_GPL
> >
> > and the last one is pretty much equivalent to vmap_pages_range,
> > hence I'm surprised by the pushback against making vmap_pages_range available to bpf.
>
> And the last we've been trying to get rid of by ages because we don't
> want random modules to

Get rid of EXPORT_SYMBOL from it? Fine by me.
Or are you saying that you have a plan to replace apply_to_page_range()
with something else? With what?

> > > > For example, there is the public ioremap_page_range(), which is used
> > > > to map device memory into addressable kernel space.
> > >
> > > It's not really public.  It's a helper for the ioremap implementation
> > > which really should not be arch-specific to start with and is in
> > > the process of being consolidated into common code.
> >
> > Any link to such consolidation of ioremap? I couldn't find one.
>
> Second hit on google:
>
> https://lore.kernel.org/lkml/20230609075528.9390-1-bhe@xxxxxxxxxx/T/

Thanks.
It sounded like you were referring to some future work.
The series that landed was a good cleanup.
No questions about it.

> > I surely don't want bpf_arena to cause headaches to mm folks.
> >
> > Anyway, ioremap_page_range() was just an example.
> > I could have used vmap() as an equivalent example.
> > vmap is EXPORT_SYMBOL, btw.
>
> vmap is a good well defined API.  vmap_pages_range is not.

Since vmap() is nothing but get_vm_area() + vmap_pages_range()
plus a few checks, I'm missing the point.
Please elaborate.
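
To spell out what I mean, here is a rough paraphrase from my reading of
mm/vmalloc.c (not the verbatim upstream code; the exact checks and helpers
differ across kernel versions, and it assumes the vmap_pages_range()
declaration from this series):

#include <linux/vmalloc.h>
#include <linux/mm.h>

/* Sketch of vmap(): reserve a vmalloc-space region, then map the
 * caller's pages into it with vmap_pages_range().
 */
static void *vmap_sketch(struct page **pages, unsigned int count,
                         unsigned long flags, pgprot_t prot)
{
    unsigned long size = (unsigned long)count << PAGE_SHIFT;
    struct vm_struct *area;

    area = get_vm_area(size, flags | VM_MAP);
    if (!area)
        return NULL;

    if (vmap_pages_range((unsigned long)area->addr,
                         (unsigned long)area->addr + size,
                         prot, pages, PAGE_SHIFT) < 0) {
        /* the real code also tears down any partially installed ptes */
        free_vm_area(area);
        return NULL;
    }
    return area->addr;
}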

> > What bpf_arena needs is pretty much vmap(), but instead of
> > allocating all pages in advance, allocate them and insert on demand.
>
> So propose an API that does that instead of exposing random low-level
> details.

The generic_ioremap_prot() and vmap() APIs make sense for cases where
physical memory of a known size already exists; it needs to be vmap-ed
once and not touched afterwards.
The bpf_arena use case is similar to KASAN, which
reserves a giant virtual memory region, then
uses apply_to_page_range() to populate certain PTEs with pages in that region,
and later apply_to_existing_page_range() to free the pages in that region.

bpf_arena is very similar, except that it currently calls get_vm_area()
to get a 4GB+guard_pages region, then vmap_pages_range() to
populate a page in it, and vunmap_range() to remove a page.
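
Roughly like this (hypothetical helper names and an illustrative guard size,
error handling trimmed, and again assuming vmap_pages_range() is exposed as
in this series):

#include <linux/vmalloc.h>
#include <linux/gfp.h>
#include <linux/sizes.h>

#define ARENA_GUARD_SZ SZ_2G  /* illustrative guard size, not the real value */

/* Reserve the 4GB + guard region once at arena creation time. */
static struct vm_struct *arena_reserve(void)
{
    return get_vm_area(SZ_4G + ARENA_GUARD_SZ, VM_MAP);
}

/* Populate a single page at kaddr on demand. */
static int arena_populate_one(unsigned long kaddr)
{
    struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

    if (!page)
        return -ENOMEM;
    return vmap_pages_range(kaddr, kaddr + PAGE_SIZE, PAGE_KERNEL,
                            &page, PAGE_SHIFT);
}

/* Remove the mapping of a single page; the page itself can be looked up
 * with vmalloc_to_page() beforehand and freed separately.
 */
static void arena_unmap_one(unsigned long kaddr)
{
    vunmap_range(kaddr, kaddr + PAGE_SIZE);
}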

These existing APIs work, so I'm not sure what you're requesting.
I can guess many different things, but please clarify to reduce
this back and forth.
Are you worried about range checking, i.e. that vmap_pages_range()
could accidentally hit an unintended range?

Btw, the cover letter and patch 5 explain the higher-level motivation
from the bpf point of view in detail.
There was a bunch of feedback on that patch, which was addressed,
and the latest version is here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=arena&id=a752b4122071adb5307d7ab3ae6736a9a0e45317




