> On Feb 22, 2021, at 4:35 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 15, 2021 at 11:06:07AM -0500, Chuck Lever wrote:
>> Reduce the rate at which nfsd threads hammer on the page allocator.
>> This improves throughput scalability by enabling the nfsd threads to
>> run more independently of each other.
>>
>
> Sorry this is taking so long, there is a lot going on.
>
> This patch has pre-requisites that are not in mainline, which makes it
> harder to evaluate what the semantics of the API should be.
>
>> @@ -659,19 +659,33 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>>  		/* use as many pages as possible */
>>  		pages = RPCSVC_MAXPAGES;
>>  	}
>> -	for (i = 0; i < pages ; i++)
>> -		while (rqstp->rq_pages[i] == NULL) {
>> -			struct page *p = alloc_page(GFP_KERNEL);
>> -			if (!p) {
>> -				set_current_state(TASK_INTERRUPTIBLE);
>> -				if (signalled() || kthread_should_stop()) {
>> -					set_current_state(TASK_RUNNING);
>> -					return -EINTR;
>> -				}
>> -				schedule_timeout(msecs_to_jiffies(500));
>> +
>> +	for (needed = 0, i = 0; i < pages ; i++)
>> +		if (!rqstp->rq_pages[i])
>> +			needed++;
>> +	if (needed) {
>> +		LIST_HEAD(list);
>> +
>> +retry:
>> +		alloc_pages_bulk(GFP_KERNEL, 0,
>> +				 /* to test the retry logic: */
>> +				 min_t(unsigned long, needed, 13),
>> +				 &list);
>> +		for (i = 0; i < pages; i++) {
>> +			if (!rqstp->rq_pages[i]) {
>> +				struct page *page;
>> +
>> +				page = list_first_entry_or_null(&list,
>> +								struct page,
>> +								lru);
>> +				if (unlikely(!page))
>> +					goto empty_list;
>> +				list_del(&page->lru);
>> +				rqstp->rq_pages[i] = page;
>> +				needed--;
>>  			}
>> -			rqstp->rq_pages[i] = p;
>>  		}
>> +	}
>>  	rqstp->rq_page_end = &rqstp->rq_pages[pages];
>>  	rqstp->rq_pages[pages] = NULL; /* this might be seen in nfsd_splice_actor() */
>>
>
> There is a conflict at the end where rq_page_end gets updated. The 5.11
> code assumes that the loop around the allocator definitely gets all the
> required pages. What tree is this patch based on, and is it going in
> during this merge window? While the conflict is "trivial" to resolve,
> it would be buggy because on retry, "i" will be pointing to the wrong
> index and pages potentially leak. Rather than guessing, I'd prefer to
> base a series on code you've tested.

I posted this patch as a proof of concept. There is a clean-up patch
that goes before it to deal properly with rq_page_end. I can post both
if you really want to apply this and play with it.

> The slowpath for the bulk allocator also sucks a bit for the semantics
> required by this caller. As the bulk allocator does not walk the
> zonelist, it can return failures prematurely -- fine for an optimistic
> bulk allocator that can return a subset of pages, but not for this
> caller, which really wants those pages. The allocator may need
> NOFAIL-like semantics to walk the zonelist if the caller really
> requires success, or at least walk the zonelist if the preferred zone
> is low on pages. This patch would also need to preserve the
> schedule_timeout behaviour so it does not use a lot of CPU time
> retrying allocations in the presence of memory pressure.

Waiting half a second before trying again seems like overkill, though.
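For the record, here is a rough, untested sketch of the backoff I have
in mind. The helper name svc_alloc_arg_pages() and the 20ms interval
are made up for illustration; it reuses the alloc_pages_bulk()
signature quoted above and sleeps only when the bulk allocator returns
no pages at all. The caller would fill rq_pages from the list,
recompute "needed", and call it again, much as the retry loop in the
patch above does:

/*
 * Hypothetical helper, for discussion only. svc_alloc_arg() lives in
 * net/sunrpc/svc_xprt.c, so the required headers are already pulled
 * in there. Keep calling the bulk allocator until at least one page
 * arrives, napping briefly only when it returns nothing.
 */
static int svc_alloc_arg_pages(unsigned long needed,
			       struct list_head *list)
{
	for (;;) {
		alloc_pages_bulk(GFP_KERNEL, 0, needed, list);
		if (!list_empty(list))
			return 0;

		/* Got nothing at all; back off, but not for 500ms. */
		set_current_state(TASK_INTERRUPTIBLE);
		if (signalled() || kthread_should_stop()) {
			set_current_state(TASK_RUNNING);
			return -EINTR;
		}
		schedule_timeout(msecs_to_jiffies(20));
	}
}

--
Chuck Lever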