> On Sep 19, 2023, at 16:55, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> 
> On 19/09/2023 09:41, Muchun Song wrote:
>>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>>
>>>>> In an effort to minimize amount of TLB flushes, batch all PMD splits
>>>>> belonging to a range of pages in order to perform only 1 (global) TLB
>>>>> flush.
>>>>>
>>>>> Add a flags field to the walker and pass whether it's a bulk allocation
>>>>> or just a single page to decide to remap. First value
>>>>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>>>>> flush when we split the PMD.
>>>>>
>>>>> Rebased and updated by Mike Kravetz
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>>>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>>> ---
>>>>>  mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>>>>>  1 file changed, 75 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>>> index 147ed15bcae4..e8bc2f7567db 100644
>>>>> --- a/mm/hugetlb_vmemmap.c
>>>>> +++ b/mm/hugetlb_vmemmap.c
>>>>> @@ -27,6 +27,7 @@
>>>>>   * @reuse_addr: the virtual address of the @reuse_page page.
>>>>>   * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>>>>>   *                 or is mapped from.
>>>>> + * @flags: used to modify behavior in bulk operations
>>>>
>>>> Better to describe it as "used to modify behavior in vmemmap page table walking
>>>> operations"
>>>>
>>> OK
>>>
>>>>>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head
>>>>> *folio_list)
>>>>>  {
>>>>>  	struct folio *folio;
>>>>>  	LIST_HEAD(vmemmap_pages);
>>>>> +	list_for_each_entry(folio, folio_list, lru)
>>>>> +		hugetlb_vmemmap_split(h, &folio->page);
>>>>> +
>>>>> +	flush_tlb_all();
>>>>> +
>>>>>  	list_for_each_entry(folio, folio_list, lru) {
>>>>>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>  						     &vmemmap_pages);
>>>>
>>>> This is unlikely to be failed since the page table allocation
>>>> is moved to the above
>>>
>>>> (Note that the head vmemmap page allocation
>>>> is not mandatory).
>>>
>>> Good point that I almost forgot
>>>
>>>> So we should handle the error case in the above
>>>> splitting operation.
>>>
>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>> got split, and say could allow some PTE remapping to occur and free some pages
>>> back (each page allows 6 more splits worst case). Then the next
>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>> hugepages that failed the batch PMD split (as we only defer the PTE remap TLB
>>> flush in this stage).
>>
>> Oh, yes. Maybe we could break the above traversal as early as possible
>> once we hit an ENOMEM?
>>
> 
> Sounds good -- no point in keeping trying to split if we are failing with OOM.
> 
> Perhaps a comment in both of these clauses (the early break on split and the OOM
> handling in batch optimize) could help make this clear.

Makes sense. Thanks.

> 
>>>
>>> Unless this isn't something worth handling
>>>
>>> Joao
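
For illustration only, here is a rough, untested sketch of the early break being
discussed above. It is not the posted patch; it assumes hugetlb_vmemmap_split()
is changed to return 0 or -ENOMEM instead of void so the caller can notice the
allocation failure:

/*
 * Sketch of "stop splitting on the first ENOMEM"; assumes
 * hugetlb_vmemmap_split() reports -ENOMEM via its return value.
 */
void hugetlb_vmemmap_optimize_folios(struct hstate *h,
				     struct list_head *folio_list)
{
	struct folio *folio;

	list_for_each_entry(folio, folio_list, lru) {
		/*
		 * Splitting a PMD requires allocating a page table page;
		 * once that allocation fails there is little point in
		 * attempting the remaining folios.  Any folio skipped here
		 * simply has its PMD split (and flushed) later, when
		 * __hugetlb_vmemmap_optimize() processes it.
		 */
		if (hugetlb_vmemmap_split(h, &folio->page) == -ENOMEM)
			break;
	}

	/* A single global flush covers every PMD split performed above. */
	flush_tlb_all();

	/*
	 * The second pass (PTE remap via __hugetlb_vmemmap_optimize(),
	 * its ENOMEM handling, and freeing of the collected vmemmap pages)
	 * stays as in the patch.
	 */
}

The cost of breaking out early is bounded: a folio whose PMD was not split in
the batch pass just falls back to the existing per-folio split plus flush in
the optimize pass, so nothing is left in an inconsistent state.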