Re: [PATCH 02/10] io_uring: get rid of remap_pfn_range() for mapping rings/sqes

On 3/28/24 8:08 AM, Johannes Weiner wrote:
> On Wed, Mar 27, 2024 at 01:13:37PM -0600, Jens Axboe wrote:
>> Rather than use remap_pfn_range() for this and manually free later,
>> switch to using vm_insert_pages() and have it Just Work.
>>
>> If possible, allocate a single compound page that covers the range that
>> is needed. If that works, then we can just use page_address() on that
>> page. If we fail to get a compound page, allocate single pages and use
>> vmap() to map them into the kernel virtual address space.
>>
>> This just covers the rings/sqes; the other remaining user of
>> remap_pfn_range() in the mmap path will be converted separately.
>> Once that is done, we can kill the old alloc/free code.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> 
> Overall this looks good to me.
> 
> Two comments below:
> 
>> @@ -2601,6 +2601,27 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>>  	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
>>  }
>>  
>> +static void io_pages_unmap(void *ptr, struct page ***pages,
>> +			   unsigned short *npages)
>> +{
>> +	bool do_vunmap = false;
>> +
>> +	if (*npages) {
>> +		struct page **to_free = *pages;
>> +		int i;
>> +
>> +		/* only did vmap for non-compound and multiple pages */
>> +		do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
>> +		for (i = 0; i < *npages; i++)
>> +			put_page(to_free[i]);
>> +	}
>> +	if (do_vunmap)
>> +		vunmap(ptr);
>> +	kvfree(*pages);
>> +	*pages = NULL;
>> +	*npages = 0;
>> +}
>> +
>>  void io_mem_free(void *ptr)
>>  {
>>  	if (!ptr)
>> @@ -2701,8 +2722,8 @@ static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
>>  static void io_rings_free(struct io_ring_ctx *ctx)
>>  {
>>  	if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
>> -		io_mem_free(ctx->rings);
>> -		io_mem_free(ctx->sq_sqes);
>> +		io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages);
>> +		io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages);
>>  	} else {
>>  		io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
>>  		ctx->n_ring_pages = 0;
>> @@ -2714,6 +2735,84 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>>  	ctx->sq_sqes = NULL;
>>  }
>>  
>> +static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
>> +				   size_t size, gfp_t gfp)
>> +{
>> +	struct page *page;
>> +	int i, order;
>> +
>> +	order = get_order(size);
>> +	if (order > MAX_PAGE_ORDER)
>> +		return NULL;
>> +	else if (order)
>> +		gfp |= __GFP_COMP;
>> +
>> +	page = alloc_pages(gfp, order);
>> +	if (!page)
>> +		return NULL;
>> +
>> +	/* add pages, grab a ref to tail pages */
>> +	for (i = 0; i < nr_pages; i++) {
>> +		pages[i] = page + i;
>> +		if (i)
>> +			get_page(pages[i]);
>> +	}
> 
> You don't need those extra refs.
> 
> __GFP_COMP makes a super page that acts like a single entity. The ref
> returned by alloc_pages() keeps the whole thing alive; you can then do
> a single put in io_pages_unmap() for the compound case as well.
> 
> [ vm_insert_pages() and munmap() still do gets and puts on the tail
>   pages as they are individually mapped and unmapped, but those calls
>   get implicitly redirected to the compound refcount maintained in the
>   head page. IOW, an munmap() of an individual tail page won't free
>   that tail as long as you hold the base ref from the alloc_pages(). ]

OK then, so I can just do something a la:

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index bf1527055679..d168752c206f 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -29,12 +29,8 @@ static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
 	if (!page)
 		return NULL;
 
-	/* add pages, grab a ref to tail pages */
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < nr_pages; i++)
 		pages[i] = page + i;
-		if (i)
-			get_page(pages[i]);
-	}
 
 	return page_address(page);
 }
@@ -100,8 +96,14 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
 		struct page **to_free = *pages;
 		int i;
 
-		/* only did vmap for non-compound and multiple pages */
-		do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
+		/*
+		 * Only did vmap for the non-compound multiple page case.
+		 * For the compound page, we just need to put the head.
+		 */
+		if (PageCompound(to_free[0]))
+			*npages = 1;
+		else if (*npages > 1)
+			do_vunmap = true;
 		for (i = 0; i < *npages; i++)
 			put_page(to_free[i]);
 	}

and not need any extra refs. I wish the compound page were a bit more
integrated, e.g. I could just do vm_insert_page() on a single compound
page and have it Just Work. But I have to treat it as separate pages
there.

Thanks!


>> +static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
>> +				 gfp_t gfp)
>> +{
>> +	void *ret;
>> +	int i;
>> +
>> +	for (i = 0; i < nr_pages; i++) {
>> +		pages[i] = alloc_page(gfp);
>> +		if (!pages[i])
>> +			goto err;
>> +	}
>> +
>> +	ret = vmap(pages, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
> 
> You can kill the VM_ALLOW_HUGE_VMAP.
> 
> It's a no-op in vmap(), since you're passing an array of order-0
> pages, which cannot be mapped by anything larger than PTEs.

Noted, will kill the VM_ALLOW_HUGE_VMAP.
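IOW, the mapping call just becomes:

	ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);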

-- 
Jens Axboe




