On 09/01/25 5:01 am, Nico Pache wrote:
> We should only "enter"/allocate the khugepaged mm_slot if we succeed at
> allocating the PMD sized folio. Move the khugepaged_enter_vma call until
> after we know the vma_alloc_folio was successful.
Why? We have the appropriate checks from thp_vma_allowable_orders() and
friends, so the VMA should be registered with khugepaged irrespective of
whether we are able to allocate a PMD-THP at fault time. If we fail at
fault time, it is the job of khugepaged to try to collapse it later.
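
For context, here is a simplified paraphrase of the registration path in
mm/khugepaged.c (helper names, flags and signatures differ between kernel
versions, so treat this as a sketch rather than the exact code):
registration is already gated on the THP eligibility checks, not on
whether a PMD folio can be allocated right now.

/*
 * Simplified paraphrase of khugepaged_enter_vma() in mm/khugepaged.c;
 * exact helpers and flags vary across kernel versions.
 */
void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags)
{
	/* The eligibility checks alone decide whether the mm is registered. */
	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
	    hugepage_pmd_enabled() &&
	    thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS, PMD_ORDER))
		__khugepaged_enter(vma->vm_mm);
}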
> Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
> ---
>  mm/huge_memory.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index e53d83b3e5cf..635c65e7ef63 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1323,7 +1323,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	ret = vmf_anon_prepare(vmf);
>  	if (ret)
>  		return ret;
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  
>  	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>  			!mm_forbids_zeropage(vma->vm_mm) &&
> @@ -1365,7 +1364,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  		}
>  		return ret;
>  	}
> -
> +	khugepaged_enter_vma(vma, vma->vm_flags);
>  	return __do_huge_pmd_anonymous_page(vmf);
>  }
In any case, you are not achieving what you described in the patch
description: you have moved khugepaged_enter_vma() after the read fault
logic, whereas what you want to do is move it after
vma_alloc_anon_folio_pmd() in __do_huge_pmd_anonymous_page().
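
If the goal really is "register with khugepaged only once the PMD folio
allocation has succeeded", the call would have to sit after the
allocation in __do_huge_pmd_anonymous_page(), roughly like the sketch
below (illustration only, not a proposed diff; based on the current
shape of that function):

	folio = vma_alloc_anon_folio_pmd(vma, vmf->address);
	if (unlikely(!folio))
		return VM_FAULT_FALLBACK;
	/* hypothetical placement: register only after a successful allocation */
	khugepaged_enter_vma(vma, vma->vm_flags);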