Re: [PATCH v2] mm: alloc_pages_bulk: remove assumption of populating only NULL elements

On 2025/3/4 6:13, Chuck Lever wrote:
> On 2/28/25 4:44 AM, Yunsheng Lin wrote:
>> As mentioned in [1], it seems odd to check for NULL elements in
>> the middle of a bulk page allocation, and for most of the existing
>> users it seems the caller can do a better job by bulk-allocating
>> pages into a whole array sequentially, without checking for NULL
>> elements first.
> 
> Sorry, but this still makes a claim without providing any data
> to back it up. Why can callers "do a better job"?

What I meant by "do a better job" is that callers already keep
track of how many pages have been allocated, so it is convenient
to just pass 'page_array + nr_allocated' and 'nr_pages - nr_allocated'
to the subsequent bulk alloc call, avoiding the NULL checking
entirely; this is the pattern I see in alloc_pages_bulk_interleave().
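
In rough pseudo-kernel-C, the pattern I have in mind is something
like the below (just a sketch; 'gfp', 'nr_pages', 'nr_allocated' and
'page_array' stand in for whatever state the caller already has, and
it assumes the post-patch alloc_pages_bulk() treats the array as
output-only and returns the number of pages it placed):

	/* Refill only the unpopulated tail of the array, so the bulk
	 * allocator never has to skip over already-allocated slots.
	 */
	while (nr_allocated < nr_pages) {
		unsigned long got;

		got = alloc_pages_bulk(gfp, nr_pages - nr_allocated,
				       page_array + nr_allocated);
		if (!got)
			break;	/* caller decides: retry, reclaim or fail */

		nr_allocated += got;
	}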

> 
> 
>> From an analysis of the bulk allocation API users in fs, it
>> seems that the callers in fs/btrfs/extent_io.c and
>> net/sunrpc/svc_xprt.c depend on the assumption of populating
>> only NULL elements, while erofs and xfs do not, see:
>> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
>> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
>> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
>> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>>
>> Change SUNRPC and btrfs to not depend on the assumption.
>> Other existing callers seem to be passing arrays of all-NULL
>> elements via memset(), kzalloc(), etc.
>>
>> Remove the assumption of populating only NULL elements and treat
>> page_array as an output parameter, like kmem_cache_alloc_bulk().
>> Removing the above assumption also enables the caller to skip
>> zeroing the array before calling the page bulk allocation API,
>> which gives about a 1~2 ns improvement in the
>> time_bench_page_pool03_slow() test case for page_pool on an x86
>> VM system; this recovers some of the performance lost by fixing
>> the DMA API misuse problem in [2], improving from 87.886 ns to
>> 86.429 ns.
>>
>> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@xxxxxxxxxx/
>> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@xxxxxxxxxx/
>> CC: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
>> CC: Luiz Capitulino <luizcap@xxxxxxxxxx>
>> CC: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>> CC: Dave Chinner <david@xxxxxxxxxxxxx>
>> CC: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> Acked-by: Jeff Layton <jlayton@xxxxxxxxxx>
>> ---
>> V2:
>> 1. Drop RFC tag and rebased on latest linux-next.
>> 2. Fix a compile error for xfs.
>> 3. Defragment the page_array for SUNRPC and btrfs.
>> ---
>>  drivers/vfio/pci/virtio/migrate.c |  2 --
>>  fs/btrfs/extent_io.c              | 23 +++++++++++++++++-----
>>  fs/erofs/zutil.c                  | 12 ++++++------
>>  fs/xfs/xfs_buf.c                  |  9 +++++----
>>  mm/page_alloc.c                   | 32 +++++--------------------------
>>  net/core/page_pool.c              |  3 ---
>>  net/sunrpc/svc_xprt.c             | 22 +++++++++++++++++----
>>  7 files changed, 52 insertions(+), 51 deletions(-)
> 
> 52:51 is not an improvement. 1-2 ns is barely worth mentioning. The
> sunrpc and btrfs callers are more complex and carry duplicated code.

Yes, the hard part is finding a common file in which to place a
common helper, something like the below:

static void defragment_pointer_array(void **array, int size)
{
	int slow = 0;

	for (int fast = 0; fast < size; fast++) {
		if (array[fast]) {
			/* swap() takes lvalues, not their addresses */
			swap(array[fast], array[slow]);
			slow++;
		}
	}
}

Or introduce a function like alloc_pages_refill_array() for the
sunrpc and xfs use cases, and do the array defragmentation inside
alloc_pages_refill_array() before calling alloc_pages_bulk()?
Any suggestions?
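
For illustration, a rough sketch of what that helper could look like
(the name, signature and return convention here are all just
assumptions, and it relies on alloc_pages_bulk() treating the array
as output-only after this patch):

static unsigned long alloc_pages_refill_array(gfp_t gfp,
					      unsigned long nr_pages,
					      struct page **page_array)
{
	unsigned long nr = 0;

	/* Compact the non-NULL pages to the front of the array ... */
	for (unsigned long i = 0; i < nr_pages; i++) {
		if (page_array[i]) {
			swap(page_array[i], page_array[nr]);
			nr++;
		}
	}

	/* ... then bulk-allocate into the now-contiguous tail and
	 * return the total number of populated entries.
	 */
	return nr + alloc_pages_bulk(gfp, nr_pages - nr, page_array + nr);
}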

> 
> Not an outright objection from me, but it's hard to justify this change.

The above should reduce the diffstat to something like 40:51.

Also, looking more closely at the other callers of alloc_pages_bulk(),
it seems vm_module_tags_populate() doesn't clear next_page[i] to NULL
after __free_page() in its 'Clean up and error out' path. I am not
sure whether vm_module_tags_populate() can be called multiple times,
as vm_module_tags->pages seems to be set to all NULL only once, see
alloc_tag_init() -> alloc_mod_tags_mem().
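
If it can be called multiple times over the same range, the cleanup
path would presumably need something like the below (a hypothetical
sketch reusing the 'next_page' name from that function; the loop
bounds are illustrative), so that a later call does not mistake
stale pointers for populated entries:

	/* Clean up and error out: also clear the slots, so that
	 * vm_module_tags->pages does not keep dangling pointers.
	 */
	for (i = 0; i < nr; i++) {
		__free_page(next_page[i]);
		next_page[i] = NULL;
	}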

Cc'ing Suren to see if he can clarify the above.



