On 20/09/2023 03:47, Muchun Song wrote:
>> On Sep 19, 2023, at 23:09, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> On 19/09/2023 09:57, Muchun Song wrote:
>>>> On Sep 19, 2023, at 16:55, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>> On 19/09/2023 09:41, Muchun Song wrote:
>>>>>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>>>>>         list_for_each_entry(folio, folio_list, lru) {
>>>>>>>>                 int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>>>>                                                      &vmemmap_pages);
>>>>>>>
>>>>>>> This is unlikely to be failed since the page table allocation
>>>>>>> is moved to the above
>>>>>>
>>>>>>> (Note that the head vmemmap page allocation
>>>>>>> is not mandatory).
>>>>>>
>>>>>> Good point that I almost forgot
>>>>>>
>>>>>>> So we should handle the error case in the above
>>>>>>> splitting operation.
>>>>>>
>>>>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>>>>> got split, and say could allow some PTE remapping to occur and free some pages
>>>>>> back (each page allows 6 more splits worst case). Then the next
>>>>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>>>>> hugepages that failed the batch PMD split (as we only defer the PTE remap tlb
>>>>>> flush in this stage).
>>>>>
>>>>> Oh, yes. Maybe we could break the above traversal as early as possible
>>>>> once we enter an ENOMEM?
>>>>>
>>>>
>>>> Sounds good -- no point in keep trying to split if we are failing with OOM.
>>>>
>>>> Perhaps a comment in both of these clauses (the early break on split and the OOM
>>>> handling in batch optimize) could help make this clear.
>>>
>>> Make sense.
>>
>> These are the changes I have so far for this patch based on the discussion so
>> far. For next one it's at the end:
>
> Code looks good to me. One nit below.
>

Thanks

>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index e8bc2f7567db..d9c6f2cf698c 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -27,7 +27,8 @@
>>   * @reuse_addr:      the virtual address of the @reuse_page page.
>>   * @vmemmap_pages:   the list head of the vmemmap pages that can be freed
>>   *                   or is mapped from.
>> - * @flags:           used to modify behavior in bulk operations
>> + * @flags:           used to modify behavior in vmemmap page table walking
>> + *                   operations.
>>   */
>>  struct vmemmap_remap_walk {
>>          void                    (*remap_pte)(pte_t *pte, unsigned long addr,
>> @@ -36,6 +37,8 @@ struct vmemmap_remap_walk {
>>          struct page             *reuse_page;
>>          unsigned long           reuse_addr;
>>          struct list_head        *vmemmap_pages;
>> +
>> +/* Skip the TLB flush when we split the PMD */
>>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH      BIT(0)
>>          unsigned long           flags;
>>  };
>>
>> @@ -132,7 +135,7 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>>          int ret;
>>
>>          ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
>> -                                     walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
>> +                                     !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
>>          if (ret)
>>                  return ret;
>>
>> @@ -677,13 +680,13 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>>          free_vmemmap_page_list(&vmemmap_pages);
>>  }
>>
>> -static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>> +static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>  {
>>          unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>>          unsigned long vmemmap_reuse;
>>
>>          if (!vmemmap_should_optimize(h, head))
>> -                return;
>> +                return 0;
>>
>>          vmemmap_end     = vmemmap_start + hugetlb_vmemmap_size(h);
>>          vmemmap_reuse   = vmemmap_start;
>> @@ -693,7 +696,7 @@ static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>           * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
>>           * @vmemmap_end]
>>           */
>> -        vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>> +        return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>>  }
>>
>>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>> @@ -701,8 +704,18 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>          struct folio *folio;
>>          LIST_HEAD(vmemmap_pages);
>>
>> -        list_for_each_entry(folio, folio_list, lru)
>> -                hugetlb_vmemmap_split(h, &folio->page);
>> +        list_for_each_entry(folio, folio_list, lru) {
>> +                int ret = hugetlb_vmemmap_split(h, &folio->page);
>> +
>> +                /*
>> +                 * Spliting the PMD requires allocating a page, thus lets fail
>                      ^^^^                                 ^^^
>                      Splitting                            page table page
>
> I'd like to specify the functionality of the allocated page.
>

OK

>> +                 * early once we encounter the first OOM. No point in retrying
>> +                 * as it can be dynamically done on remap with the memory
>> +                 * we get back from the vmemmap deduplication.
>> +                 */
>> +                if (ret == -ENOMEM)
>> +                        break;
>> +        }
>>
>>          flush_tlb_all();
>>
>> For patch 7, I only have commentary added derived from this earlier discussion
>> above:
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index d9c6f2cf698c..f6a1020a4b6a 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -40,6 +40,8 @@ struct vmemmap_remap_walk {
>>
>>  /* Skip the TLB flush when we split the PMD */
>>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH      BIT(0)
>> +/* Skip the TLB flush when we remap the PTE */
>>  #define VMEMMAP_REMAP_NO_TLB_FLUSH      BIT(1)
>>          unsigned long           flags;
>>  };
>>
>> @@ -721,19 +739,28 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>
>>          list_for_each_entry(folio, folio_list, lru) {
>>                  int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>                                                       &vmemmap_pages,
>>                                                       VMEMMAP_REMAP_NO_TLB_FLUSH);
>>
>>                  /*
>>                   * Pages to be freed may have been accumulated. If we
>>                   * encounter an ENOMEM, free what we have and try again.
>> +                 * This can occur in the case that both spliting fails
>                                                           ^^^
>                                                           splitting
>

ok

>> +                 * halfway and head page allocation also failed. In this
>                                    ^^^^^^^
>                                    head vmemmap page
>

ok

> Otherwise:
>
> Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
>

Thanks, I assume that's for both patches?

> Thanks.
>
>> +                 * case __hugetlb_vmemmap_optimize() would free memory
>> +                 * allowing more vmemmap remaps to occur.
>>                   */
>>                  if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
>>
>
>
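Since the patch 7 hunk above is cut off right at the ENOMEM branch, here is a rough, user-space-only sketch of the flow the thread is converging on: split every vmemmap PMD first and stop at the first ENOMEM, do a single TLB flush, then remap the PTEs and, if a remap hits ENOMEM while freed pages have already accumulated, flush, hand those pages back and retry that folio. Everything below (the toy_* helpers, the folio count, the "7 pages per folio" number, the retry detail) is a made-up stand-in for illustration based on the quoted discussion, not the kernel code or the actual patch.

#include <errno.h>
#include <stdio.h>

#define NR_FOLIOS                   8
#define VMEMMAP_REMAP_NO_TLB_FLUSH  (1UL << 1)

/* Stand-in for the PMD split pass; pretend a page table allocation fails. */
static int toy_vmemmap_split(int folio)
{
        return folio == 5 ? -ENOMEM : 0;
}

/* Stand-in for the PTE remap pass; counts "freed" vmemmap pages. */
static int toy_vmemmap_optimize(int folio, int *nr_vmemmap_pages,
                                unsigned long flags)
{
        static int oom_once = 1;

        (void)flags;
        /* Simulate one transient allocation failure to exercise the retry. */
        if (folio == 5 && oom_once) {
                oom_once = 0;
                return -ENOMEM;
        }
        *nr_vmemmap_pages += 7;         /* e.g. 7 tail vmemmap pages per 2MB folio */
        return 0;
}

static void toy_flush_tlb_all(void)
{
        puts("flush_tlb_all()");
}

static void toy_free_vmemmap_pages(int *nr_vmemmap_pages)
{
        printf("freeing %d vmemmap pages\n", *nr_vmemmap_pages);
        *nr_vmemmap_pages = 0;
}

int main(void)
{
        int nr_vmemmap_pages = 0;

        /*
         * Pass 1: split the vmemmap PMDs for every folio, deferring the TLB
         * flush.  Splitting needs a page table page, so stop at the first
         * ENOMEM instead of retrying -- the remap pass below can split the
         * remaining PMDs once it has handed memory back.
         */
        for (int folio = 0; folio < NR_FOLIOS; folio++) {
                if (toy_vmemmap_split(folio) == -ENOMEM)
                        break;
        }
        toy_flush_tlb_all();            /* one flush for all the splits */

        /*
         * Pass 2: remap the PTEs with the per-folio flush skipped.  On an
         * ENOMEM with pages already accumulated, flush, free what we have
         * and try that folio once more with memory back in the system.
         */
        for (int folio = 0; folio < NR_FOLIOS; folio++) {
                int ret = toy_vmemmap_optimize(folio, &nr_vmemmap_pages,
                                               VMEMMAP_REMAP_NO_TLB_FLUSH);

                if (ret == -ENOMEM && nr_vmemmap_pages) {
                        toy_flush_tlb_all();
                        toy_free_vmemmap_pages(&nr_vmemmap_pages);
                        toy_vmemmap_optimize(folio, &nr_vmemmap_pages,
                                             VMEMMAP_REMAP_NO_TLB_FLUSH);
                }
        }
        toy_flush_tlb_all();            /* one flush for the whole remap pass */
        toy_free_vmemmap_pages(&nr_vmemmap_pages);
        return 0;
}

The reason for structuring it as two passes shows up in the sketch: the expensive flush_tlb_all() runs once per pass instead of once per hugetlb page, which is what the VMEMMAP_SPLIT_NO_TLB_FLUSH / VMEMMAP_REMAP_NO_TLB_FLUSH flags in the quoted diffs exist to allow.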