On 2025/1/3 4:00, Mel Gorman wrote: > On Wed, Dec 25, 2024 at 08:36:04PM +0800, Yunsheng Lin wrote: >> On 2024/12/24 6:00, Luiz Capitulino wrote: >> >>> /* >>> - * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array >>> + * __alloc_pages_bulk - Allocate a number of order-0 pages to an array >>> * @gfp: GFP flags for the allocation >>> * @preferred_nid: The preferred NUMA node ID to allocate from >>> * @nodemask: Set of nodes to allocate from, may be NULL >>> - * @nr_pages: The number of pages desired on the list or array >>> - * @page_list: Optional list to store the allocated pages >>> - * @page_array: Optional array to store the pages >>> + * @nr_pages: The number of pages desired in the array >>> + * @page_array: Array to store the pages >>> * >>> * This is a batched version of the page allocator that attempts to >>> - * allocate nr_pages quickly. Pages are added to page_list if page_list >>> - * is not NULL, otherwise it is assumed that the page_array is valid. >>> + * allocate nr_pages quickly. Pages are added to the page_array. >>> * >>> - * For lists, nr_pages is the number of pages that should be allocated. >>> - * >>> - * For arrays, only NULL elements are populated with pages and nr_pages >>> + * Note that only NULL elements are populated with pages and nr_pages >> >> It is not really related to this patch, but while we are at this, the above >> seems like an odd behavior. By roughly looking at all the callers of that >> API, it seems like only the below callers rely on that? >> fs/erofs/zutil.c: z_erofs_gbuf_growsize() >> fs/xfs/xfs_buf.c: xfs_buf_alloc_pages() >> >> It seems it is quite straight forward to change the above callers to not >> rely on the above behavior, and we might be able to avoid more checking >> by removing the above behavior? >> > > It was implemented that way for an early user, net/sunrpc/svc_xprt.c. > The behaviour removes a burden from the caller to track the number of > populated elements and then pass the exact number of pages that must be > allocated. If the API does not handle that detail, each caller needs > similar state tracking implementations. As the overhead is going to be > the same whether the API implements it once or each caller implements > there own, it is simplier if there is just one implementation. It seems it is quite straight forward to change the above use case to not rely on that by something like below? diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 43c57124de52..52800bfddc86 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -670,19 +670,21 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp) pages = RPCSVC_MAXPAGES; } - for (filled = 0; filled < pages; filled = ret) { - ret = alloc_pages_bulk_array(GFP_KERNEL, pages, - rqstp->rq_pages); - if (ret > filled) + for (filled = 0; filled < pages;) { + ret = alloc_pages_bulk_array(GFP_KERNEL, pages - filled, + rqstp->rq_pages + filled); + if (ret) { + filled += ret; /* Made progress, don't sleep yet */ continue; + } set_current_state(TASK_IDLE); if (svc_thread_should_stop(rqstp)) { set_current_state(TASK_RUNNING); return false; } - trace_svc_alloc_arg_err(pages, ret); + trace_svc_alloc_arg_err(pages, filled); memalloc_retry_wait(GFP_KERNEL); } rqstp->rq_page_end = &rqstp->rq_pages[pages]; >