> On Sep 20, 2023, at 18:39, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>
> On 20/09/2023 03:47, Muchun Song wrote:
>>> On Sep 19, 2023, at 23:09, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>> On 19/09/2023 09:57, Muchun Song wrote:
>>>>> On Sep 19, 2023, at 16:55, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>> On 19/09/2023 09:41, Muchun Song wrote:
>>>>>>> On Sep 19, 2023, at 16:26, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>>>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>>>>>> 	list_for_each_entry(folio, folio_list, lru) {
>>>>>>>>> 		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>>>>> 						     &vmemmap_pages);
>>>>>>>>
>>>>>>>> This is unlikely to fail since the page table allocation
>>>>>>>> is moved to the above
>>>>>>>
>>>>>>>> (Note that the head vmemmap page allocation
>>>>>>>> is not mandatory).
>>>>>>>
>>>>>>> Good point that I almost forgot.
>>>>>>>
>>>>>>>> So we should handle the error case in the above
>>>>>>>> splitting operation.
>>>>>>>
>>>>>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>>>>>> got split, and that could allow some PTE remapping to occur and free some
>>>>>>> pages back (each page allows 6 more splits, worst case). Then the next
>>>>>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>>>>>> hugepages that failed the batch PMD split (as we only defer the PTE remap
>>>>>>> TLB flush in this stage).
>>>>>>
>>>>>> Oh, yes. Maybe we could break out of the above traversal as early as possible
>>>>>> once we hit an ENOMEM?
>>>>>>
>>>>>
>>>>> Sounds good -- no point in continuing to split if we are failing with OOM.
>>>>>
>>>>> Perhaps a comment in both of these clauses (the early break on split and the
>>>>> OOM handling in batch optimize) could help make this clear.
>>>>
>>>> Makes sense.
>>>
>>> These are the changes I have so far for this patch based on the discussion so
>>> far. For the next one, it's at the end:
>>
>> Code looks good to me. One nit below.
>
> Thanks
>
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index e8bc2f7567db..d9c6f2cf698c 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -27,7 +27,8 @@
>>>   * @reuse_addr:	the virtual address of the @reuse_page page.
>>>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
>>>   *			or is mapped from.
>>> - * @flags:		used to modify behavior in bulk operations
>>> + * @flags:		used to modify behavior in vmemmap page table walking
>>> + *			operations.
>>>   */
>>>  struct vmemmap_remap_walk {
>>>  	void			(*remap_pte)(pte_t *pte, unsigned long addr,
>>> @@ -36,6 +37,8 @@ struct vmemmap_remap_walk {
>>>  	struct page		*reuse_page;
>>>  	unsigned long		reuse_addr;
>>>  	struct list_head	*vmemmap_pages;
>>> +
>>> +/* Skip the TLB flush when we split the PMD */
>>>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
>>>  	unsigned long		flags;
>>>  };
>>> @@ -132,7 +135,7 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>>>  	int ret;
>>>
>>>  	ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
>>> -				     walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
>>> +				     !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
>>>  	if (ret)
>>>  		return ret;
>>>
>>> @@ -677,13 +680,13 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>>>  	free_vmemmap_page_list(&vmemmap_pages);
>>>  }
>>>
>>> -static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>> +static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>>  {
>>>  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>>>  	unsigned long vmemmap_reuse;
>>>
>>>  	if (!vmemmap_should_optimize(h, head))
>>> -		return;
>>> +		return 0;
>>>
>>>  	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
>>>  	vmemmap_reuse	= vmemmap_start;
>>> @@ -693,7 +696,7 @@ static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>>>  	 * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
>>>  	 * @vmemmap_end]
>>>  	 */
>>> -	vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>>> +	return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>>>  }
>>>
>>>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>> @@ -701,8 +704,18 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>>  	struct folio *folio;
>>>  	LIST_HEAD(vmemmap_pages);
>>>
>>> -	list_for_each_entry(folio, folio_list, lru)
>>> -		hugetlb_vmemmap_split(h, &folio->page);
>>> +	list_for_each_entry(folio, folio_list, lru) {
>>> +		int ret = hugetlb_vmemmap_split(h, &folio->page);
>>> +
>>> +		/*
>>> +		 * Spliting the PMD requires allocating a page, thus lets fail
>>                 ^^^^                                  ^^^
>>                 Splitting                             page table page
>>
>> I'd like to specify the functionality of the allocated page.
>
> OK
>
>>> +		 * early once we encounter the first OOM. No point in retrying
>>> +		 * as it can be dynamically done on remap with the memory
>>> +		 * we get back from the vmemmap deduplication.
>>> +		 */
>>> +		if (ret == -ENOMEM)
>>> +			break;
>>> +	}
>>>
>>>  	flush_tlb_all();
>>>
>>> For patch 7, I only have commentary added, derived from this earlier discussion
>>> above:
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index d9c6f2cf698c..f6a1020a4b6a 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -40,6 +40,8 @@ struct vmemmap_remap_walk {
>>>
>>>  /* Skip the TLB flush when we split the PMD */
>>>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
>>> +/* Skip the TLB flush when we remap the PTE */
>>>  #define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
>>>  	unsigned long		flags;
>>>  };
>>>
>>> @@ -721,19 +739,28 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>>>
>>>  	list_for_each_entry(folio, folio_list, lru) {
>>>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>  						     &vmemmap_pages,
>>>  						     VMEMMAP_REMAP_NO_TLB_FLUSH);
>>>
>>>  		/*
>>>  		 * Pages to be freed may have been accumulated. If we
>>>  		 * encounter an ENOMEM, free what we have and try again.
>>> +		 * This can occur in the case that both spliting fails
>>                                                      ^^^
>>                                                      splitting
>
> ok
>
>>> +		 * halfway and head page allocation also failed. In this
>>                             ^^^^^^^
>>                             head vmemmap page
>
> ok
>
>> Otherwise:
>>
>> Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
>>
>
> Thanks, I assume that's for both patches?

Yes. Thanks.

>
>> Thanks.
>>
>>> +		 * case __hugetlb_vmemmap_optimize() would free memory
>>> +		 * allowing more vmemmap remaps to occur.
>>>  		 */
>>>  		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
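
For reference, putting the two hunks together, the batch path discussed in this
thread would look roughly like the sketch below: two passes over the folio list,
an early break on ENOMEM during the split pass, and a free-and-retry on ENOMEM
during the remap pass. This is only an illustration of the logic agreed above,
not necessarily the exact code that lands upstream; the helper names are the
ones already in mm/hugetlb_vmemmap.c:

void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
{
	struct folio *folio;
	LIST_HEAD(vmemmap_pages);

	/* Pass 1: split vmemmap PMDs, deferring the TLB flush to a single flush. */
	list_for_each_entry(folio, folio_list, lru) {
		/*
		 * Each split consumes one page table page; stop at the first
		 * ENOMEM, since the remap pass below can split on demand with
		 * the memory it frees back.
		 */
		if (hugetlb_vmemmap_split(h, &folio->page) == -ENOMEM)
			break;
	}

	flush_tlb_all();

	/* Pass 2: remap the PTEs, again with one deferred TLB flush. */
	list_for_each_entry(folio, folio_list, lru) {
		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
						     &vmemmap_pages,
						     VMEMMAP_REMAP_NO_TLB_FLUSH);

		/*
		 * On ENOMEM, flush and release the pages accumulated so far,
		 * so the retry has memory to work with.
		 */
		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
			flush_tlb_all();
			free_vmemmap_page_list(&vmemmap_pages);
			INIT_LIST_HEAD(&vmemmap_pages);
			__hugetlb_vmemmap_optimize(h, &folio->page,
						   &vmemmap_pages,
						   VMEMMAP_REMAP_NO_TLB_FLUSH);
		}
	}

	flush_tlb_all();
	free_vmemmap_page_list(&vmemmap_pages);
}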
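
As a back-of-the-envelope check of the "each page allows 6 more splits worst
case" accounting earlier in the thread -- assuming x86-64 with 4 KiB base
pages, 2 MiB hugepages and a 64-byte struct page; other configurations change
the numbers, so treat this as one possible reading:

	struct pages per 2 MiB hugepage:    2 MiB / 4 KiB = 512
	vmemmap backing those struct pages: 512 * 64 B = 32 KiB = 8 pages
	freed by remapping tails to head:   8 - 1 = 7 pages
	worst case, one of those pays for
	the optional head vmemmap copy:     7 - 1 = 6 pages

Since a PMD split needs exactly one page table page, every successfully
remapped hugepage would return enough memory for up to 6 further splits.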