> On Sep 19, 2023, at 23:09, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> 
> On 19/09/2023 09:57, Muchun Song wrote:
>>> On Sep 19, 2023, at 16:55, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>> On 19/09/2023 09:41, Muchun Song wrote:
>>>>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>>>>         list_for_each_entry(folio, folio_list, lru) {
>>>>>>>                 int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>>>                                         &vmemmap_pages);
>>>>>> 
>>>>>> This is unlikely to fail since the page table allocation
>>>>>> is moved to the above
>>>>> 
>>>>>> (Note that the head vmemmap page allocation
>>>>>> is not mandatory).
>>>>> 
>>>>> Good point that I almost forgot
>>>>> 
>>>>>> So we should handle the error case in the above
>>>>>> splitting operation.
>>>>> 
>>>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>>>> got split, and, say, could allow some PTE remapping to occur and free some pages
>>>>> back (each page allows 6 more splits, worst case). Then the next
>>>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>>>> hugepages that failed the batch PMD split (as we only defer the PTE remap TLB
>>>>> flush in this stage).
>>>> 
>>>> Oh, yes. Maybe we could break the above traversal as early as possible
>>>> once we encounter an ENOMEM?
>>>> 
>>> 
>>> Sounds good -- no point in continuing to split if we are failing with OOM.
>>> 
>>> Perhaps a comment in both of these clauses (the early break on split and the OOM
>>> handling in batch optimize) could help make this clear.
>> 
>> Makes sense.
> 
> These are the changes I have so far for this patch based on the discussion so
> far. For the next one, it's at the end:

Code looks good to me. One nit below.

> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index e8bc2f7567db..d9c6f2cf698c 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -27,7 +27,8 @@
>  * @reuse_addr: the virtual address of the @reuse_page page.
>  * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>  * or is mapped from.
> - * @flags: used to modify behavior in bulk operations
> + * @flags: used to modify behavior in vmemmap page table walking
> + * operations.
>  */
> struct vmemmap_remap_walk {
>         void (*remap_pte)(pte_t *pte, unsigned long addr,
> @@ -36,6 +37,8 @@ struct vmemmap_remap_walk {
>         struct page *reuse_page;
>         unsigned long reuse_addr;
>         struct list_head *vmemmap_pages;
> +
> +/* Skip the TLB flush when we split the PMD */
> #define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
>         unsigned long flags;
> };
> 
> @@ -132,7 +135,7 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>         int ret;
> 
>         ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
> -                                    walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
> +                                    !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
>         if (ret)
>                 return ret;
> 
> @@ -677,13 +680,13 @@ void hugetlb_vmemmap_optimize(const struct hstate *h,
> struct page *head)
>         free_vmemmap_page_list(&vmemmap_pages);
> }
> 
> -static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> +static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> {
>         unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>         unsigned long vmemmap_reuse;
> 
>         if (!vmemmap_should_optimize(h, head))
> -               return;
> +               return 0;
> 
>         vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
>         vmemmap_reuse = vmemmap_start;
> @@ -693,7 +696,7 @@ static void hugetlb_vmemmap_split(const struct hstate *h,
> struct page *head)
>          * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
>          * @vmemmap_end]
>          */
> -       vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> +       return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> }
> 
> void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head
> *folio_list)
> @@ -701,8 +704,18 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h,
> struct list_head *folio_l
>         struct folio *folio;
>         LIST_HEAD(vmemmap_pages);
> 
> -       list_for_each_entry(folio, folio_list, lru)
> -               hugetlb_vmemmap_split(h, &folio->page);
> +       list_for_each_entry(folio, folio_list, lru) {
> +               int ret = hugetlb_vmemmap_split(h, &folio->page);
> +
> +               /*
> +                * Spliting the PMD requires allocating a page, thus lets fail
                     ^^^^                                   ^^^^
                     Splitting                              page table page

I'd like to specify the functionality of the allocated page.

> +                * early once we encounter the first OOM. No point in retrying
> +                * as it can be dynamically done on remap with the memory
> +                * we get back from the vmemmap deduplication.
> +                */
> +               if (ret == -ENOMEM)
> +                       break;
> +       }
> 
>         flush_tlb_all();
> 
> For patch 7, I only have commentary added derived from this earlier discussion
> above:
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index d9c6f2cf698c..f6a1020a4b6a 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -40,6 +40,8 @@ struct vmemmap_remap_walk {
> 
> /* Skip the TLB flush when we split the PMD */
> #define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
> +/* Skip the TLB flush when we remap the PTE */
> #define VMEMMAP_REMAP_NO_TLB_FLUSH BIT(1)
>         unsigned long flags;
> };
> 
> @@ -721,19 +739,28 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h,
> struct list_head *folio_l
> 
>         list_for_each_entry(folio, folio_list, lru) {
>                 int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>                                                 &vmemmap_pages,
>                                                 VMEMMAP_REMAP_NO_TLB_FLUSH);
> 
>                 /*
>                  * Pages to be freed may have been accumulated. If we
>                  * encounter an ENOMEM, free what we have and try again.
> +                * This can occur in the case that both spliting fails
                                                          ^^^
                                                          splitting

> +                * halfway and head page allocation also failed. In this
                                  ^^^^^^^
                                  head vmemmap page

Otherwise:

Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>

Thanks.

> +                * case __hugetlb_vmemmap_optimize() would free memory
> +                * allowing more vmemmap remaps to occur.
>                  */
>                 if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
> 
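For readers following the thread without the kernel tree at hand, the retry flow being reviewed can be illustrated with a small, self-contained userspace sketch. This is not the actual patch: fake_optimize(), free_queued() and the budget counters are invented stand-ins for __hugetlb_vmemmap_optimize(), free_vmemmap_page_list() and the page allocator, and since the quoted hunk is truncated, the "retry once after freeing" policy below is only an assumption made for illustration.

#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct folio; only used to label iterations. */
struct folio { int id; };

/*
 * Invented stand-in for __hugetlb_vmemmap_optimize(): it "fails" with
 * -ENOMEM whenever the fake allocator budget is exhausted, and otherwise
 * queues one vmemmap page for later freeing.
 */
static int fake_optimize(struct folio *f, int *queued, int *free_budget)
{
	(void)f;
	if (*free_budget == 0)
		return -ENOMEM;
	(*free_budget)--;
	(*queued)++;
	return 0;
}

/* Invented stand-in for free_vmemmap_page_list(): freeing the queued
 * pages returns memory to the fake allocator. */
static void free_queued(int *queued, int *free_budget)
{
	*free_budget += *queued;
	*queued = 0;
}

int main(void)
{
	struct folio folios[4] = { {0}, {1}, {2}, {3} };
	int queued = 0, free_budget = 2;

	for (int i = 0; i < 4; i++) {
		int ret = fake_optimize(&folios[i], &queued, &free_budget);

		/*
		 * Mirrors the pattern described in the quoted comment: on
		 * ENOMEM, free what has been accumulated so far and retry
		 * the same folio once, since the freed vmemmap pages may
		 * satisfy the allocation that just failed.
		 */
		if (ret == -ENOMEM && queued) {
			free_queued(&queued, &free_budget);
			ret = fake_optimize(&folios[i], &queued, &free_budget);
		}
		printf("folio %d: %s\n", folios[i].id,
		       ret ? "left unoptimized" : "optimized");
	}
	return 0;
}

Built with a plain C compiler, the sketch prints one line per folio; the third folio only succeeds because the queued pages are released and the call is retried, which is also why the batch split loop earlier in the thread can afford to simply break on the first ENOMEM.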