On 2025/3/4 6:13, Chuck Lever wrote: > On 2/28/25 4:44 AM, Yunsheng Lin wrote: >> As mentioned in [1], it seems odd to check NULL elements in >> the middle of page bulk allocating, and it seems caller can >> do a better job of bulk allocating pages into a whole array >> sequentially without checking NULL elements first before >> doing the page bulk allocation for most of existing users. > > Sorry, but this still makes a claim without providing any data > to back it up. Why can callers "do a better job"? What I meant "do a better job" is that callers are already keeping track of how many pages have been allocated, and it seems convenient to just pass 'page_array + nr_allocated' and 'nr_pages - nr_allocated' when calling subsequent page bulk alloc API so that NULL checking can be avoided, which seems to be the pattern I see in alloc_pages_bulk_interleave(). > > >> Through analyzing of bulk allocation API used in fs, it >> seems that the callers are depending on the assumption of >> populating only NULL elements in fs/btrfs/extent_io.c and >> net/sunrpc/svc_xprt.c while erofs and btrfs don't, see: >> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator") >> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()") >> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers") >> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator") >> >> Change SUNRPC and btrfs to not depend on the assumption. >> Other existing callers seems to be passing all NULL elements >> via memset, kzalloc, etc. >> >> Remove assumption of populating only NULL elements and treat >> page_array as output parameter like kmem_cache_alloc_bulk(). >> Remove the above assumption also enable the caller to not >> zero the array before calling the page bulk allocating API, >> which has about 1~2 ns performance improvement for the test >> case of time_bench_page_pool03_slow() for page_pool in a >> x86 vm system, this reduces some performance impact of >> fixing the DMA API misuse problem in [2], performance >> improves from 87.886 ns to 86.429 ns. >> >> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@xxxxxxxxxx/ >> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@xxxxxxxxxx/ >> CC: Jesper Dangaard Brouer <hawk@xxxxxxxxxx> >> CC: Luiz Capitulino <luizcap@xxxxxxxxxx> >> CC: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> >> CC: Dave Chinner <david@xxxxxxxxxxxxx> >> CC: Chuck Lever <chuck.lever@xxxxxxxxxx> >> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx> >> Acked-by: Jeff Layton <jlayton@xxxxxxxxxx> >> --- >> V2: >> 1. Drop RFC tag and rebased on latest linux-next. >> 2. Fix a compile error for xfs. >> 3. Defragmemt the page_array for SUNRPC and btrfs. >> --- >> drivers/vfio/pci/virtio/migrate.c | 2 -- >> fs/btrfs/extent_io.c | 23 +++++++++++++++++----- >> fs/erofs/zutil.c | 12 ++++++------ >> fs/xfs/xfs_buf.c | 9 +++++---- >> mm/page_alloc.c | 32 +++++-------------------------- >> net/core/page_pool.c | 3 --- >> net/sunrpc/svc_xprt.c | 22 +++++++++++++++++---- >> 7 files changed, 52 insertions(+), 51 deletions(-) > > 52:51 is not an improvement. 1-2 ns is barely worth mentioning. The > sunrpc and btrfs callers are more complex and carry duplicated code. Yes, the hard part is to find common file to place the common function as something as below: void defragment_pointer_array(void** array, int size) { int slow = 0; for (int fast = 0; fast < size; fast++) { if (array[fast] != NULL) { swap(&array[fast], &array[slow]); slow++; } } } Or introduce a function as something like alloc_pages_refill_array() for the usecase of sunrpc and xfs and do the array defragment in alloc_pages_refill_array() before calling alloc_pages_bulk()? Any suggestion? > > Not an outright objection from me, but it's hard to justify this change. The above should reduce the number to something like 40:51. Also, I looked more closely at other callers calling alloc_pages_bulk(), it seems vm_module_tags_populate() doesn't clear next_page[i] to NULL after __free_page() when doing 'Clean up and error out', I am not sure if vm_module_tags_populate() will be called multi-times as vm_module_tags->pages seems to be only set to all NULL once, see alloc_tag_init() -> alloc_mod_tags_mem(). Cc Suren to see if there is some clarifying for the above.