On Tue, Aug 27, 2024 at 5:07 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2024/8/27 1:00, Alexander Duyck wrote:
> > On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> >>
> >> It seems there is about a 24-byte binary size increase for
> >> __page_frag_cache_refill() after refactoring on an arm64 system
> >> with 64K PAGE_SIZE. By doing the gdb disassembling, it seems
> >> we can have more than a 100-byte decrease in the binary size
> >> by using __alloc_pages() to replace alloc_pages_node(), as
> >> there seems to be some unnecessary checking for nid being
> >> NUMA_NO_NODE, especially when page_frag is part of the mm
> >> system.
> >>
> >> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> >> ---
> >>  mm/page_frag_cache.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> >> index bba59c87d478..e0ad3de11249 100644
> >> --- a/mm/page_frag_cache.c
> >> +++ b/mm/page_frag_cache.c
> >> @@ -28,11 +28,11 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
> >>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> >> -       page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> >> -                               PAGE_FRAG_CACHE_MAX_ORDER);
> >> +       page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
> >> +                            numa_mem_id(), NULL);
> >>  #endif
> >>         if (unlikely(!page)) {
> >> -               page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> >> +               page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
> >>                 if (unlikely(!page)) {
> >>                         nc->encoded_page = 0;
> >>                         return NULL;
> >
> > I still think this would be better served by fixing alloc_pages_node
> > to drop the superfluous checks rather than changing the function. We
> > would get more gain by just addressing the builtin constant and
> > NUMA_NO_NODE case there.
>
> I am supposing that by 'just addressing the builtin constant and
> NUMA_NO_NODE case', you meant the below change from the previous
> discussion:
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01a49be7c98d..009ffb50d8cd 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -290,6 +290,9 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
>                                                    unsigned int order)
>  {
> +       if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
> +               return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);
> +
>         if (nid == NUMA_NO_NODE)
>                 nid = numa_mem_id();
>
>
> Actually it does not seem to give more gain, judging from the binary
> size change below. vmlinux.org is the image after this patchset, and
> vmlinux is the image after this patchset with this patch reverted and
> with the above change applied.
>
> [linyunsheng@localhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
> add/remove: 0/2 grow/shrink: 16/12 up/down: 432/-340 (92)
> Function                                     old     new   delta
> new_slab                                     808    1124    +316
> its_probe_one                               2860    2908     +48
...
> alloc_slab_page                              284       -    -284
> Total: Before=30534822, After=30534914, chg +0.00%

Well, considering that alloc_slab_page carried the "inline" qualifier, I would say the shrinking did have an effect: it was just enough for the "inline" qualifier to kick in. It could be argued that the change exposed another issue, in that the alloc_slab_page function is probably large enough that it should just be "static" and not "static inline". If you can provide your config I could probably look into this further, but I suspect just dropping the inline for that one function should result in net savings.

The only other big change I see is in its_probe_one, and I am not sure why that would be impacted since it is not passing a constant in the first place; it is passing its->numa_node.
I'd be curious what the disassembly shows in terms of the change that caused it to increase in size. Otherwise the rest of the size changes look more like code shifts than anything else, likely because the functions moved around slightly as a few of them dropped in size.