Zi Yan <ziy@xxxxxxxxxx> writes:

> On 8 Oct 2024, at 9:06, David Hildenbrand wrote:
>
>> On 08.10.24 14:57, Vlastimil Babka wrote:
>>> On 10/8/24 13:52, Zi Yan wrote:
>>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>>
>>>>> I remember we discussed that in the past and that we do *not* want to
>>>>> sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>>
>>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is
>>>>> the slight chance that we zero-out when we're not going to use the allocated
>>>>> folio, but ... that can happen either way even with the current code?
>>>>
>>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>>
>>> Create some nice inline wrapper for the test and it will look less ugly? :)
>
> Something like this?
>
> static inline bool alloc_zeroed(void)
> {
> 	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> 				   &init_on_alloc);
> }
>
> I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP,
> so both PMD THPs and mTHPs are zeroed twice on all architectures.
>
> Adding Ryan for mTHP.
>
>>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>>> the subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>>>> optimization. To keep it, vmf->address would need to be passed to the
>>>> allocation code. Maybe that is acceptable?
>>>
>>> I'd rather not change the page allocation code for this...
>>
>> Although I'm curious if that optimization from 2017 is still valuable :)
>
> Maybe Ying can give some insight on this.

I guess the optimization still applies now. Although the per-core (per-thread)
last-level cache has grown, it is still quite commonly smaller than a THP. And
since L1/L2 are significantly smaller still, the optimization increases the
likelihood that the accessed cache lines are still in L1/L2/LLC.
--
Best Regards,
Huang, Ying