On Sat, Jun 11, 2022 at 10:13:52AM +0800, Miaohe Lin wrote:
> Since commit 5232c63f46fd ("mm: Make compound_pincount always available"),
> compound_pincount_ptr is stored at first tail page now. So we should call
> prep_compound_head() after the first tail page is initialized to take
> advantage of the likelihood of that tail struct page being cached given
> that we will read them right after in prep_compound_head().
>
> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
> ---
> v2:
>   Don't move prep_compound_head() outside loop per Joao.
> ---
>  mm/page_alloc.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4c7d99ee58b4..048df5d78add 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6771,13 +6771,18 @@ static void __ref memmap_init_compound(struct page *head,
>  		set_page_count(page, 0);
>  
>  		/*
> -		 * The first tail page stores compound_mapcount_ptr() and
> -		 * compound_order() and the second tail page stores
> -		 * compound_pincount_ptr(). Call prep_compound_head() after
> -		 * the first and second tail pages have been initialized to
> -		 * not have the data overwritten.
> +		 * The first tail page stores compound_mapcount_ptr(),
> +		 * compound_order() and compound_pincount_ptr(). Call
> +		 * prep_compound_head() after the first tail page have
> +		 * been initialized to not have the data overwritten.
> +		 *
> +		 * Note the idea to make this right after we initialize
> +		 * the offending tail pages is trying to take advantage
> +		 * of the likelihood of those tail struct pages being
> +		 * cached given that we will read them right after in
> +		 * prep_compound_head().
>  		 */
> -		if (pfn == head_pfn + 2)
> +		if (unlikely(pfn == head_pfn + 1))
>  			prep_compound_head(head, order);

To me, it is odd not to move this out of the loop. I see the reason is the caching benefit suggested by Joao.
But I think this is not a hot path, and moving the call out of the loop would be more intuitive, at least to me. Maybe this optimization is unnecessary (I may be wrong). Dropping it would also make this consistent with prep_compound_page(), which does not do a similar optimization.

Hi Joao, I am wondering whether this is a significant optimization for ZONE_DEVICE memory. I found this code has existed since the first version you introduced, so I suspect you may have some numbers. Would you like to share them with us?

Thanks.
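For reference, here is a small userspace model of the placements being discussed. It is not kernel code and is untested against the kernel: fake_page, init_one(), prep_head(), head_ok() and the init_compound_*() helpers are hypothetical stand-ins for struct page, __init_zone_device_page(), prep_compound_head() and memmap_init_compound(), and the field layout is invented. It only illustrates that correctness depends on ordering. Preparing the head after the first tail page is initialized gives the same result whether that happens inside the loop (as in the patch) or after it, while preparing it before that page is initialized loses the data. It says nothing about the cache effect, which is exactly what numbers would settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userspace model only: fake_page, init_one(), prep_head(), head_ok()
 * and the init_compound_*() helpers are hypothetical stand-ins for
 * struct page, __init_zone_device_page(), prep_compound_head() and
 * memmap_init_compound(). The field layout is NOT the kernel's.
 */
struct fake_page {
	unsigned long flags;		/* bit 0 models PG_head */
	unsigned long compound_head;	/* tail pages: head address | 1 */
	int order;			/* meaningful in the first tail page */
	int mapcount;			/* first tail page, -1 once prepped */
	int pincount;			/* first tail page, 0 once prepped */
};

/* Per-page init: clears every field of the page it touches. */
static void init_one(struct fake_page *head, struct fake_page *p)
{
	memset(p, 0, sizeof(*p));
	p->compound_head = (unsigned long)head | 1UL;
}

/* Head metadata is written into the first tail page (head[1]). */
static void prep_head(struct fake_page *head, int order)
{
	head[1].order = order;
	head[1].mapcount = -1;
	head[1].pincount = 0;
}

/* Placement in the patch: right after the first tail page is set up. */
static void init_compound_in_loop(struct fake_page *head, int nr, int order)
{
	assert(nr >= 2);	/* a compound page has at least one tail */
	head->flags |= 1UL;
	for (int i = 1; i < nr; i++) {
		init_one(head, &head[i]);
		if (i == 1)
			prep_head(head, order);
	}
}

/* Placement suggested above: once, after the loop. */
static void init_compound_after_loop(struct fake_page *head, int nr, int order)
{
	assert(nr >= 2);
	head->flags |= 1UL;
	for (int i = 1; i < nr; i++)
		init_one(head, &head[i]);
	prep_head(head, order);
}

/* Broken placement: before the first tail page is initialized. */
static void init_compound_too_early(struct fake_page *head, int nr, int order)
{
	assert(nr >= 2);
	head->flags |= 1UL;
	prep_head(head, order);
	for (int i = 1; i < nr; i++)
		init_one(head, &head[i]);	/* wipes head[1]'s metadata */
}

/* True if the head flag and the first-tail metadata survived. */
static bool head_ok(const struct fake_page *head, int order)
{
	return (head->flags & 1UL) && head[1].order == order &&
	       head[1].mapcount == -1 && head[1].pincount == 0;
}
```

To use it, zero-initialize an array of 1 << order fake pages, call one of the init_compound_*() helpers on it, and check head_ok(). The in-loop and after-loop variants pass; the too-early variant does not.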