On 19/09/2023 09:41, Muchun Song wrote:
>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>
>> On 19/09/2023 07:42, Muchun Song wrote:
>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>
>>>> In an effort to minimize the number of TLB flushes, batch all PMD splits
>>>> belonging to a range of pages in order to perform only 1 (global) TLB
>>>> flush.
>>>>
>>>> Add a flags field to the walker and pass whether it's a bulk allocation
>>>> or just a single page to decide to remap. First value
>>>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>>>> flush when we split the PMD.
>>>>
>>>> Rebased and updated by Mike Kravetz
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>> ---
>>>>  mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 75 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>> index 147ed15bcae4..e8bc2f7567db 100644
>>>> --- a/mm/hugetlb_vmemmap.c
>>>> +++ b/mm/hugetlb_vmemmap.c
>>>> @@ -27,6 +27,7 @@
>>>>   * @reuse_addr:	the virtual address of the @reuse_page page.
>>>>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
>>>>   *			or is mapped from.
>>>> + * @flags:		used to modify behavior in bulk operations
>>>
>>> Better to describe it as "used to modify behavior in vmemmap page table
>>> walking operations".
>>>
>> OK
>>
>>>>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>>>  {
>>>>  	struct folio *folio;
>>>>  	LIST_HEAD(vmemmap_pages);
>>>>
>>>> +	list_for_each_entry(folio, folio_list, lru)
>>>> +		hugetlb_vmemmap_split(h, &folio->page);
>>>> +
>>>> +	flush_tlb_all();
>>>> +
>>>>  	list_for_each_entry(folio, folio_list, lru) {
>>>>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>  						     &vmemmap_pages);
>>>
>>> This is unlikely to fail since the page table allocation has been moved
>>> to the splitting step above (note that the head vmemmap page allocation
>>> is not mandatory).
>>
>> Good point that I almost forgot.
>>
>>> So we should handle the error case in the above splitting operation.
>>
>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>> got split, which could allow some PTE remapping to occur and free some pages
>> back (each freed page allows 6 more splits, worst case). Then the next
>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>> hugepages that failed the batch PMD split (as we only defer the PTE remap TLB
>> flush in this stage).
>
> Oh, yes. Maybe we could break the above traversal as early as possible
> once we hit an ENOMEM?
>

Sounds good -- no point in continuing to split if we are failing with OOM.
Perhaps a comment in both of these clauses (the early break on split and the
OOM handling in batch optimize) could help make this clear; a rough sketch of
what I mean is at the end of this mail.

>>
>> Unless this isn't something worth handling.
>>
>> Joao
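
For reference, here is a rough sketch of the early break being discussed.
This is an illustration only, not the patch as posted: it assumes
hugetlb_vmemmap_split() is changed to return 0 or -ENOMEM instead of void,
and it elides the OOM handling of the remap stage.

void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
{
	struct folio *folio;
	LIST_HEAD(vmemmap_pages);

	list_for_each_entry(folio, folio_list, lru) {
		/*
		 * Assumed change: hugetlb_vmemmap_split() reports -ENOMEM
		 * when it cannot allocate a page table page for the split.
		 */
		int ret = hugetlb_vmemmap_split(h, &folio->page);

		/*
		 * Once we hit ENOMEM there is little point in trying to
		 * split the remaining folios, so stop early.  Any PMD left
		 * unsplit is simply split (and flushed) again by the
		 * per-hugepage path in __hugetlb_vmemmap_optimize() below.
		 */
		if (ret == -ENOMEM)
			break;
	}

	/* One global flush for all the PMD splits done above. */
	flush_tlb_all();

	list_for_each_entry(folio, folio_list, lru) {
		/*
		 * OOM handling for this remap stage (freeing the vmemmap
		 * pages accumulated so far and retrying) is left out of
		 * this sketch.
		 */
		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
	}
}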