On 3/28/24 8:08 AM, Johannes Weiner wrote:
> On Wed, Mar 27, 2024 at 01:13:37PM -0600, Jens Axboe wrote:
>> Rather than use remap_pfn_range() for this and manually free later,
>> switch to using vm_insert_pages() and have it Just Work.
>>
>> If possible, allocate a single compound page that covers the range that
>> is needed. If that works, then we can just use page_address() on that
>> page. If we fail to get a compound page, allocate single pages and use
>> vmap() to map them into the kernel virtual address space.
>>
>> This just covers the rings/sqes; the other remaining user of the mmap
>> remap_pfn_range() will be converted separately. Once that is done,
>> we can kill the old alloc/free code.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
> Overall this looks good to me.
>
> Two comments below:
>
>> @@ -2601,6 +2601,27 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>>  	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
>>  }
>>  
>> +static void io_pages_unmap(void *ptr, struct page ***pages,
>> +			   unsigned short *npages)
>> +{
>> +	bool do_vunmap = false;
>> +
>> +	if (*npages) {
>> +		struct page **to_free = *pages;
>> +		int i;
>> +
>> +		/* only did vmap for non-compound and multiple pages */
>> +		do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
>> +		for (i = 0; i < *npages; i++)
>> +			put_page(to_free[i]);
>> +	}
>> +	if (do_vunmap)
>> +		vunmap(ptr);
>> +	kvfree(*pages);
>> +	*pages = NULL;
>> +	*npages = 0;
>> +}
>> +
>>  void io_mem_free(void *ptr)
>>  {
>>  	if (!ptr)
>> @@ -2701,8 +2722,8 @@ static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
>>  static void io_rings_free(struct io_ring_ctx *ctx)
>>  {
>>  	if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
>> -		io_mem_free(ctx->rings);
>> -		io_mem_free(ctx->sq_sqes);
>> +		io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages);
>> +		io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages);
>>  	} else {
>>  		io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
>>  		ctx->n_ring_pages = 0;
>> @@ -2714,6 +2735,84 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>>  	ctx->sq_sqes = NULL;
>>  }
>>  
>> +static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
>> +				   size_t size, gfp_t gfp)
>> +{
>> +	struct page *page;
>> +	int i, order;
>> +
>> +	order = get_order(size);
>> +	if (order > MAX_PAGE_ORDER)
>> +		return NULL;
>> +	else if (order)
>> +		gfp |= __GFP_COMP;
>> +
>> +	page = alloc_pages(gfp, order);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	/* add pages, grab a ref to tail pages */
>> +	for (i = 0; i < nr_pages; i++) {
>> +		pages[i] = page + i;
>> +		if (i)
>> +			get_page(pages[i]);
>> +	}
>
> You don't need those extra refs.
>
> __GFP_COMP makes a super page that acts like a single entity. The ref
> returned by alloc_pages() keeps the whole thing alive; you can then do
> a single put in io_pages_unmap() for the compound case as well.
>
> [ vm_insert_pages() and munmap() still do gets and puts on the tail
>   pages as they are individually mapped and unmapped, but those calls
>   get implicitly redirected to the compound refcount maintained in the
>   head page. IOW, an munmap() of an individual tail page won't free
>   that tail as long as you hold the base ref from the alloc_pages(). ]
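Ah right, that makes sense -- roughly this behavior, if I sketch it out
(illustration only, not from the patch, assuming a current kernel where
tail page refs resolve to the compound head):

	/*
	 * With __GFP_COMP, gets and puts on a tail page are redirected
	 * to the head's compound refcount, so the base ref returned by
	 * alloc_pages() controls the lifetime of the whole allocation.
	 */
	static void compound_ref_demo(void)
	{
		struct page *head = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);
		struct page *tail;

		if (!head)
			return;
		tail = head + 3;	/* last tail of the order-2 (4 page) unit */

		get_page(tail);		/* bumps the compound refcount on head */
		put_page(tail);		/* drops that same refcount */
		put_page(head);		/* base ref: frees the whole order-2 page,
					   assuming no mappings remain */
	}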
OK then, so I can just do something along these lines:

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index bf1527055679..d168752c206f 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -29,12 +29,8 @@ static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
 	if (!page)
 		return NULL;
 
-	/* add pages, grab a ref to tail pages */
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < nr_pages; i++)
 		pages[i] = page + i;
-		if (i)
-			get_page(pages[i]);
-	}
 
 	return page_address(page);
 }
@@ -100,8 +96,14 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
 		struct page **to_free = *pages;
 		int i;
 
-		/* only did vmap for non-compound and multiple pages */
-		do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
+		/*
+		 * Only did vmap for the non-compound multiple page case.
+		 * For the compound page, we just need to put the head.
+		 */
+		if (PageCompound(to_free[0]))
+			*npages = 1;
+		else if (*npages > 1)
+			do_vunmap = true;
 		for (i = 0; i < *npages; i++)
 			put_page(to_free[i]);
 	}

and not need any extra refs. I wish the compound page were a bit more
integrated, e.g. that I could just do vm_insert_page() on a single
compound page and have it Just Work. But I have to treat it as separate
pages there. Thanks!

>> +static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
>> +				 gfp_t gfp)
>> +{
>> +	void *ret;
>> +	int i;
>> +
>> +	for (i = 0; i < nr_pages; i++) {
>> +		pages[i] = alloc_page(gfp);
>> +		if (!pages[i])
>> +			goto err;
>> +	}
>> +
>> +	ret = vmap(pages, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
>
> You can kill the VM_ALLOW_HUGE_VMAP.
>
> It's a no-op in vmap(), since you're passing an array of order-0
> pages, which cannot be mapped by anything larger than PTEs.

Noted, will kill the VM_ALLOW_HUGE_VMAP.

-- 
Jens Axboe
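[ For reference, the three helpers with both review comments folded in
  would look roughly like the below -- a sketch assembled from the
  quoted patch and the diff above, using the signatures from the quoted
  patch. The error path in io_mem_alloc_single() is assumed, since the
  quoted hunk is truncated there; this is not necessarily the committed
  code. ]

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
				   size_t size, gfp_t gfp)
{
	struct page *page;
	int i, order;

	order = get_order(size);
	if (order > MAX_PAGE_ORDER)
		return NULL;
	else if (order)
		gfp |= __GFP_COMP;

	page = alloc_pages(gfp, order);
	if (!page)
		return NULL;

	/* no refs on the tails: the alloc_pages() ref pins the whole unit */
	for (i = 0; i < nr_pages; i++)
		pages[i] = page + i;

	return page_address(page);
}

static void *io_mem_alloc_single(struct page **pages, int nr_pages,
				 size_t size, gfp_t gfp)
{
	void *ret;
	int i;

	for (i = 0; i < nr_pages; i++) {
		pages[i] = alloc_page(gfp);
		if (!pages[i])
			goto err;
	}

	/* order-0 pages can only be PTE-mapped, so plain VM_MAP suffices */
	ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
	if (ret)
		return ret;
err:
	while (i--)
		put_page(pages[i]);
	return NULL;
}

static void io_pages_unmap(void *ptr, struct page ***pages,
			   unsigned short *npages)
{
	bool do_vunmap = false;

	if (*npages) {
		struct page **to_free = *pages;
		int i;

		/*
		 * Only did vmap for the non-compound multiple page case.
		 * For the compound page, putting the head drops the one
		 * base ref taken at allocation time.
		 */
		if (PageCompound(to_free[0]))
			*npages = 1;
		else if (*npages > 1)
			do_vunmap = true;
		for (i = 0; i < *npages; i++)
			put_page(to_free[i]);
	}
	if (do_vunmap)
		vunmap(ptr);
	kvfree(*pages);
	*pages = NULL;
	*npages = 0;
}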