On Wed, Feb 14, 2024 at 10:58 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > On Wed, Feb 14, 2024 at 12:53:42PM -0800, Alexei Starovoitov wrote: > > On Wed, Feb 14, 2024 at 12:36 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > > > > > NAK. Please > > > > What is the alternative? > > Remember, maintainers cannot tell developers "go away". > > They must suggest a different path. > > That criteria is something you've made up. I didn't invent it. I internalized it based on the feedback received. > Telling that something > is not ok is the most important job of not just maintainers but all > developers. I'm not saying that maintainers should not say "no", I'm saying that maintainers should say "no", understand the problem being solved, and suggest an alternative. > Maybe start with a description of the problem you're > solving and why you think it matters and needs different APIs. bpf_arena doesn't need a different api. These 5 api-s below are enough. I'm saying that vmap_pages_range() is equivalent to apply_to_page_range() for all practical purposes. So, since apply_to_page_range() is available to the kernel (xen, gpu, kasan, etc) then I see no reason why vmap_pages_range() shouldn't be available as well, since: struct vmap_ctx { struct page **pages; int idx; }; static int __for_each_pte(pte_t *ptep, unsigned long addr, void *data) { struct vmap_ctx *ctx = data; struct page *page = ctx->pages[ctx->idx++]; /* TODO: sanity checks here */ set_pte_at(&init_mm, addr, ptep, mk_pte(page, PAGE_KERNEL)); return 0; } static int vmap_pages_range_hack(unsigned long addr, unsigned long end, struct page **pages) { struct vmap_ctx ctx = { .pages = pages }; return apply_to_page_range(&init_mm, addr, end - addr, __for_each_pte, &ctx); } Anything I miss? > > . get_vm_area - external > > . free_vm_area - EXPORT_SYMBOL_GPL > > . vunmap_range - external > > . vmalloc_to_page - EXPORT_SYMBOL > > . apply_to_page_range - EXPORT_SYMBOL_GPL > > > > and the last one is pretty much equivalent to vmap_pages_range, > > hence I'm surprised by push back to make vmap_pages_range available to bpf. > > And the last we've been trying to get rid of by ages because we don't > want random modules to Get rid of EXPORT_SYMBOL from it? Fine by me. Or you're saying that you have a plan to replace apply_to_page_range() with something else ? With what ? > > > > For example, there is the public ioremap_page_range(), which is used > > > > to map device memory into addressable kernel space. > > > > > > It's not really public. It's a helper for the ioremap implementation > > > which really should not be arch specific to start with and are in > > > the process of beeing consolidatd into common code. > > > > Any link to such consolidation of ioremap ? I couldn't find one. > > Second hit on google: > > https://lore.kernel.org/lkml/20230609075528.9390-1-bhe@xxxxxxxxxx/T/ Thanks. It sounded like you were referring to some future work. The series that landed was a good cleanup. No questions about it. > > I surely don't want bpf_arena to cause headaches to mm folks. > > > > Anyway, ioremap_page_range() was just an example. > > I could have used vmap() as an equivalent example. > > vmap is EXPORT_SYMBOL, btw. > > vmap is a good well defined API. vmap_pages_range is not. since vmap() is nothing but get_vm_area() + vmap_pages_range() and few checks... I'm missing the point. Pls elaborate. > > What bpf_arena needs is pretty much vmap(), but instead of > > allocating all pages in advance, allocate them and insert on demand. > > So propose an API that does that instead of exposing random low-level > details. The generic_ioremap_prot() and vmap() APIs make sense for the cases when phys memory exists with known size. It needs to vmap-ed and not touched after. bpf_arena use case is similar to kasan which reserves a giant virtual memory region, and then does apply_to_page_range() to populate certain pte-s with pages in that region, and later apply_to_existing_page_range() to free pages in kasan's region. bpf_arena is very similar, except it currently calls get_vm_area() to get a 4Gb+guard_pages region, and then vmap_pages_range() to populate a page in it, and vunmap_range() to remove a page. These existing api-s work, so not sure what you're requesting. I can guess many different things, but pls clarify to reduce this back and forth. Are you worried about range checking? That vmap_pages_range() can accidently hit an unintended range? btw the cover letter and patch 5 explain the higher level motivation from bpf pov in detail. There was a bunch of feedback on that patch, which was addressed, and the latest version is here: https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=arena&id=a752b4122071adb5307d7ab3ae6736a9a0e45317