On 2025-01-03 09:27, Mel Gorman wrote:
On Fri, Jan 03, 2025 at 07:29:30PM +0800, Yunsheng Lin wrote:
On 2025/1/3 4:00, Mel Gorman wrote:
On Wed, Dec 25, 2024 at 08:36:04PM +0800, Yunsheng Lin wrote:
On 2024/12/24 6:00, Luiz Capitulino wrote:
/*
- * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
+ * __alloc_pages_bulk - Allocate a number of order-0 pages to an array
* @gfp: GFP flags for the allocation
* @preferred_nid: The preferred NUMA node ID to allocate from
* @nodemask: Set of nodes to allocate from, may be NULL
- * @nr_pages: The number of pages desired on the list or array
- * @page_list: Optional list to store the allocated pages
- * @page_array: Optional array to store the pages
+ * @nr_pages: The number of pages desired in the array
+ * @page_array: Array to store the pages
*
* This is a batched version of the page allocator that attempts to
- * allocate nr_pages quickly. Pages are added to page_list if page_list
- * is not NULL, otherwise it is assumed that the page_array is valid.
+ * allocate nr_pages quickly. Pages are added to the page_array.
*
- * For lists, nr_pages is the number of pages that should be allocated.
- *
- * For arrays, only NULL elements are populated with pages and nr_pages
+ * Note that only NULL elements are populated with pages and nr_pages
It is not really related to this patch, but while we are at this, the above
seems like an odd behavior. By roughly looking at all the callers of that
API, it seems like only the below callers rely on that?
fs/erofs/zutil.c: z_erofs_gbuf_growsize()
fs/xfs/xfs_buf.c: xfs_buf_alloc_pages()
It seems it is quite straight forward to change the above callers to not
rely on the above behavior, and we might be able to avoid more checking
by removing the above behavior?
It was implemented that way for an early user, net/sunrpc/svc_xprt.c.
The behaviour removes a burden from the caller to track the number of
populated elements and then pass the exact number of pages that must be
allocated. If the API does not handle that detail, each caller needs
similar state tracking implementations. As the overhead is going to be
the same whether the API implements it once or each caller implements
there own, it is simplier if there is just one implementation.
It seems it is quite straight forward to change the above use case to
not rely on that by something like below?
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 43c57124de52..52800bfddc86 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -670,19 +670,21 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
pages = RPCSVC_MAXPAGES;
}
- for (filled = 0; filled < pages; filled = ret) {
- ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
- rqstp->rq_pages);
- if (ret > filled)
+ for (filled = 0; filled < pages;) {
+ ret = alloc_pages_bulk_array(GFP_KERNEL, pages - filled,
+ rqstp->rq_pages + filled);
+ if (ret) {
+ filled += ret;
/* Made progress, don't sleep yet */
continue;
+ }
set_current_state(TASK_IDLE);
if (svc_thread_should_stop(rqstp)) {
set_current_state(TASK_RUNNING);
return false;
}
- trace_svc_alloc_arg_err(pages, ret);
+ trace_svc_alloc_arg_err(pages, filled);
memalloc_retry_wait(GFP_KERNEL);
}
rqstp->rq_page_end = &rqstp->rq_pages[pages];
The API implementation would also need to change to make this work as the
return value is a number of pages that are on the array, not the number of
new pages allocated. Even if fixed, it still moves cost and complexity to
the caller and the API is harder to use and easier to make mistakes. That
shift in responsibility and the maintenance burden would need to be
justified. While it is possible to use wrappers to allow callers to decide
whether to manage the tracking or let the API handle it, the requirement
then is to show that there is a performance gain for a common use case.
I agree with all your points, but I think there's value in simplifying
alloc_pages_bulk_noprof() if the wrappers turn out not be complex or
difficult to use.
If all users of this case always fill the array sequentially, then
maybe we could just have a wrapper that scans the array first (this
is the first loop in alloc_pages_bulk_noprof()) or maybe callers
could keep track of 'filled' (via a pointer or macro usage). But
this is just brainstorming, we'd need to see the resulting code to
know if it works and makes sense.
This is outside the scope of a serise that removes the page_list
argument. Even if a series was proposed to shift responsibility to the
caller, I would not expect it to be submitted with a page_list removal.
Yes, absolutely.