On 2024/8/27 1:00, Alexander Duyck wrote:
> On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>>
>> It seems there is about 24Bytes binary size increase for
>> __page_frag_cache_refill() after refactoring in arm64 system
>> with 64K PAGE_SIZE. By doing the gdb disassembling, It seems
>> we can have more than 100Bytes decrease for the binary size
>> by using __alloc_pages() to replace alloc_pages_node(), as
>> there seems to be some unnecessary checking for nid being
>> NUMA_NO_NODE, especially when page_frag is part of the mm
>> system.
>>
>> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> ---
>>  mm/page_frag_cache.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>> index bba59c87d478..e0ad3de11249 100644
>> --- a/mm/page_frag_cache.c
>> +++ b/mm/page_frag_cache.c
>> @@ -28,11 +28,11 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>> -       page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>> -                               PAGE_FRAG_CACHE_MAX_ORDER);
>> +       page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
>> +                            numa_mem_id(), NULL);
>>  #endif
>>         if (unlikely(!page)) {
>> -               page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>> +               page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
>>                 if (unlikely(!page)) {
>>                         nc->encoded_page = 0;
>>                         return NULL;
>
> I still think this would be better served by fixing alloc_pages_node
> to drop the superfluous checks rather than changing the function. We
> would get more gain by just addressing the builtin constant and
> NUMA_NO_NODE case there.

I am assuming that by 'just addressing the builtin constant and
NUMA_NO_NODE case' you mean the below change from the previous
discussion:

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 01a49be7c98d..009ffb50d8cd 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -290,6 +290,9 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
 static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
                                                    unsigned int order)
 {
+       if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
+               return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);
+
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();

Actually, it does not seem to yield more gain judging from the binary
size change below, where vmlinux.org is the image after this patchset,
and vmlinux is the image after this patchset with this patch reverted
and the above change applied.
[linyunsheng@localhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
add/remove: 0/2 grow/shrink: 16/12 up/down: 432/-340 (92)
Function                                     old     new   delta
new_slab                                     808    1124    +316
its_probe_one                               2860    2908     +48
dpaa2_eth_set_dist_key                      1096    1112     +16
rx_default_dqrr                             2776    2780      +4
pcpu_unmap_pages                             356     360      +4
iommu_dma_map_sg                            1328    1332      +4
hns3_nic_net_timeout                         704     708      +4
hns3_init_all_ring                          1168    1172      +4
hns3_clear_all_ring                          372     376      +4
enetc_refill_rx_ring                         448     452      +4
enetc_free_rxtx_rings                        276     280      +4
dpaa2_eth_xdp_xmit                           676     680      +4
dpaa2_eth_rx                                1716    1720      +4
__iommu_dma_alloc_noncontiguous             1176    1180      +4
__arm_lpae_alloc_pages                       896     900      +4
___slab_alloc                               2120    2124      +4
pcpu_free_pages.constprop                    236     232      -4
its_vpe_irq_domain_alloc                    1496    1492      -4
its_alloc_table_entry                        456     452      -4
hns3_reset_notify_init_enet                  628     624      -4
dpaa_cleanup_tx_fd                           556     552      -4
dpaa_bp_seed                                 232     228      -4
blk_update_request                           944     940      -4
blk_execute_rq                               540     536      -4
arm_64_lpae_alloc_pgtable_s1                 680     676      -4
__arm_lpae_unmap                            1596    1592      -4
__arm_lpae_free_pgtable                      256     252      -4
___kmalloc_large_node                        340     336      -4
e843419@0f86_000124d9_a4                       8       -      -8
alloc_slab_page                              284       -    -284
Total: Before=30534822, After=30534914, chg +0.00%
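
As a side note, below is a minimal userspace sketch of the
__builtin_constant_p() pattern being discussed. The stub names
(alloc_pages_node_stub, __alloc_pages_stub, numa_mem_id_stub) are
placeholders I made up for illustration, not the real mm helpers; the
point is only to show why a constant NUMA_NO_NODE argument lets the
compiler fold away the runtime nid check at that call site.

/*
 * Illustrative sketch only; stub names are hypothetical stand-ins for
 * the mm helpers referenced in the diffs above.
 */
#include <stdio.h>

#define NUMA_NO_NODE	(-1)

/* Stand-in for numa_mem_id(): pretend the local node is node 0. */
static int numa_mem_id_stub(void)
{
	return 0;
}

/* Stand-in for __alloc_pages(): just report which nid it was given. */
static void *__alloc_pages_stub(unsigned int order, int nid)
{
	printf("alloc: order=%u nid=%d\n", order, nid);
	return (void *)1;
}

static inline void *alloc_pages_node_stub(int nid, unsigned int order)
{
	/*
	 * When nid is a compile-time constant equal to NUMA_NO_NODE, this
	 * branch is resolved at build time, so the call site reduces to a
	 * direct call using the local node with no runtime nid check.
	 */
	if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
		return __alloc_pages_stub(order, numa_mem_id_stub());

	/* Runtime fallback for callers passing a variable nid. */
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id_stub();

	return __alloc_pages_stub(order, nid);
}

int main(void)
{
	/* Constant NUMA_NO_NODE: the compiler can take the folded path. */
	alloc_pages_node_stub(NUMA_NO_NODE, 0);

	/* Variable nid: the runtime check is still emitted. */
	int nid = 1;

	alloc_pages_node_stub(nid, 0);
	return 0;
}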