Re: [PATCH net-next v15 08/13] mm: page_frag: use __alloc_pages() to replace alloc_pages_node()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 27, 2024 at 5:07 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2024/8/27 1:00, Alexander Duyck wrote:
> > On Mon, Aug 26, 2024 at 5:46 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> >>
> >> It seems there is about 24Bytes binary size increase for
> >> __page_frag_cache_refill() after refactoring in arm64 system
> >> with 64K PAGE_SIZE. By doing the gdb disassembling, It seems
> >> we can have more than 100Bytes decrease for the binary size
> >> by using __alloc_pages() to replace alloc_pages_node(), as
> >> there seems to be some unnecessary checking for nid being
> >> NUMA_NO_NODE, especially when page_frag is part of the mm
> >> system.
> >>
> >> CC: Alexander Duyck <alexander.duyck@xxxxxxxxx>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> >> ---
> >>  mm/page_frag_cache.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> >> index bba59c87d478..e0ad3de11249 100644
> >> --- a/mm/page_frag_cache.c
> >> +++ b/mm/page_frag_cache.c
> >> @@ -28,11 +28,11 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >>         gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
> >>                    __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> >> -       page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> >> -                               PAGE_FRAG_CACHE_MAX_ORDER);
> >> +       page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
> >> +                            numa_mem_id(), NULL);
> >>  #endif
> >>         if (unlikely(!page)) {
> >> -               page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> >> +               page = __alloc_pages(gfp, 0, numa_mem_id(), NULL);
> >>                 if (unlikely(!page)) {
> >>                         nc->encoded_page = 0;
> >>                         return NULL;
> >
> > I still think this would be better served by fixing alloc_pages_node
> > to drop the superfluous checks rather than changing the function. We
> > would get more gain by just addressing the builtin constant and
> > NUMA_NO_NODE case there.
>
> I am supposing by 'just addressing the builtin constant and NUMA_NO_NODE
> case', it meant the below change from the previous discussion:
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 01a49be7c98d..009ffb50d8cd 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -290,6 +290,9 @@ struct folio *__folio_alloc_node_noprof(gfp_t gfp, unsigned int order, int nid)
>  static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
>                                                    unsigned int order)
>  {
> +       if (__builtin_constant_p(nid) && nid == NUMA_NO_NODE)
> +               return __alloc_pages_noprof(gfp_mask, order, numa_mem_id(), NULL);
> +
>         if (nid == NUMA_NO_NODE)
>                 nid = numa_mem_id();
>
>
> Actually it does not seem to get more gain by judging from binary size
> changing as below, vmlinux.org is the image after this patchset, and
> vmlinux is the image after this patchset with this patch reverted and
> with above change applied.
>
> [linyunsheng@localhost net-next]$ ./scripts/bloat-o-meter vmlinux.org vmlinux
> add/remove: 0/2 grow/shrink: 16/12 up/down: 432/-340 (92)
> Function                                     old     new   delta
> new_slab                                     808    1124    +316
> its_probe_one                               2860    2908     +48

...

> alloc_slab_page                              284       -    -284
> Total: Before=30534822, After=30534914, chg +0.00%

Well considering that alloc_slab_page was marked to be "inline" as per
the qualifier applied to it I would say the shrinking had an effect,
but it was just enough to enable the "inline" qualifier to kick in. It
could be argued that the change exposed another issue in that the
alloc_slab_page function is probably large enough that it should just
be "static" and not "static inline". If you can provide you config I
could probably look into this further but I suspect just dropping the
inline for that one function should result in net savings.

The only other big change I see is in its_probe_one which I am not
sure why it would be impacted since it is not passing a constant in
the first place, it is passing its->numa_node. I'd be curious what the
disassembly shows in terms of the change that caused it to increase in
size.

Otherwise the rest of the size changes seem more like code shifts than
anything else likely due to the functions shifting around slightly due
to a few dropping in size.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux