THP:
enable=madvise
defrag=defer
max_ptes_none=511
scan_sleep_millisecs=1000
alloc_sleep_millisecs=1000

Test code:

// allocate a 2MB chunk using mmap and later
// release it by MADV_FREE
Link: https://github.com/ioworker0/mmapvsmprotect/blob/main/test5.c

root@x:/tmp# ./a.out | grep -B 23 hg
7f762a200000-7f762a400000 rw-p 00000000 00:00 0
Size:               2048 kB
Anonymous:          2048 kB
LazyFree:           2048 kB
AnonHugePages:         0 kB
THPeligible:    1
VmFlags: rd wr mr mw me ac sd hg

// allocate a 2MB chunk using mmap and later
// mark some of its pages as lazyfree with MADV_FREE
Link: https://github.com/ioworker0/mmapvsmprotect/blob/main/test4.c

root@x:/tmp# ./a.out | grep -B 23 hg
7f762a200000-7f762a400000 rw-p 00000000 00:00 0
Size:               2048 kB
Anonymous:          2048 kB
LazyFree:              0 kB
AnonHugePages:      2048 kB
THPeligible:    1
VmFlags: rd wr mr mw me ac sd hg

root@x:/tmp# ./a.out
[...]
root@x:/tmp# echo $?
2

On Thu, Feb 1, 2024 at 8:53 PM Lance Yang <ioworker0@xxxxxxxxx> wrote:
>
> The collapsing behavior of khugepaged with pages
> marked using MADV_FREE might cause confusion
> among users.
>
> For instance, allocate a 2MB chunk using mmap and
> later release it by MADV_FREE. Khugepaged will not
> collapse this chunk. From the user's perspective,
> it treats lazyfree pages as pte_none. However,
> for some pages marked as lazyfree with MADV_FREE,
> khugepaged might collapse this chunk and copy
> these pages to a new huge page. This inconsistency
> in behavior could be confusing for users.
>
> After a successful MADV_FREE operation, if there is
> no subsequent write, the kernel can free the pages
> at any time. Therefore, in my opinion, counting
> lazyfree pages in max_pte_none seems reasonable.
>
> Perhaps treating MADV_FREE like MADV_DONTNEED, not
> copying lazyfree pages when khugepaged collapses
> huge pages in the background better aligns with
> user expectations.
>
> Signed-off-by: Lance Yang <ioworker0@xxxxxxxxx>
> ---
>  mm/khugepaged.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2b219acb528e..6cbf46d42c6a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -777,6 +777,7 @@ static int __collapse_huge_page_copy(pte_t *pte,
>                                       pmd_t orig_pmd,
>                                       struct vm_area_struct *vma,
>                                       unsigned long address,
> +                                     struct collapse_control *cc,
>                                       spinlock_t *ptl,
>                                       struct list_head *compound_pagelist)
>  {
> @@ -797,6 +798,13 @@ static int __collapse_huge_page_copy(pte_t *pte,
>                         continue;
>                 }
>                 src_page = pte_page(pteval);
> +
> +               if (cc->is_khugepaged
> +                       && !folio_test_swapbacked(page_folio(src_page))) {
> +                       clear_user_highpage(page, _address);
> +                       continue;
> +               }
> +
>                 if (copy_mc_user_highpage(page, src_page, _address, vma) > 0) {
>                         result = SCAN_COPY_MC;
>                         break;
> @@ -1205,7 +1213,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>         anon_vma_unlock_write(vma->anon_vma);
>
>         result = __collapse_huge_page_copy(pte, hpage, pmd, _pmd,
> -                                          vma, address, pte_ptl,
> +                                          vma, address, cc, pte_ptl,
>                                            &compound_pagelist);
>         pte_unmap(pte);
>         if (unlikely(result != SCAN_SUCCEED))
> --
> 2.33.1
>
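
For anyone who wants to look at the smaps side of this without fetching the linked tests, here is a minimal userspace sketch. It is not the linked test4.c/test5.c; the helper names (smaps_field_kb, vma_field_kb, lazyfree_kb_after_madv_free) and CHUNK_SIZE are made up for illustration. It maps a 2MB-aligned anonymous chunk, faults it in, marks the first free_bytes with MADV_FREE and reads LazyFree for that VMA from /proc/self/smaps. It does not wait for khugepaged, so it only shows the state right after MADV_FREE, and the value it reports depends on the kernel and on memory pressure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK_SIZE (2UL << 20)  /* 2MB; assumed PMD size, true on x86-64 with 4K pages */

/* Return the number following `key` (e.g. "LazyFree:") on the first line of
 * `text` that starts with `key`, or -1 if no line starts with it. */
long smaps_field_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0)
            return strtol(p + klen, NULL, 10);
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* Look up `key` for the VMA that contains `addr` in /proc/self/smaps.
 * Returns -1 if the file cannot be read or the field is not found. */
long vma_field_kb(const void *addr, const char *key)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    unsigned long start, end, a = (unsigned long)addr;
    char line[512];
    int inside = 0;
    long val = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* Only VMA header lines parse as "start-end". */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2)
            inside = (a >= start && a < end);
        else if (inside && (val = smaps_field_kb(line, key)) >= 0)
            break;
    }
    fclose(f);
    return val;
}

/* Map a 2MB-aligned chunk, touch every page, MADV_FREE the first
 * `free_bytes` of it, and return LazyFree (kB) for the chunk's VMA.
 * Returns -1 on any failure or if free_bytes exceeds the chunk. */
long lazyfree_kb_after_madv_free(size_t free_bytes)
{
    size_t len = 2 * CHUNK_SIZE;  /* over-allocate so the chunk can be aligned */
    char *raw, *chunk;
    long kb = -1;

    if (free_bytes > CHUNK_SIZE)
        return -1;
    raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;
    chunk = (char *)(((uintptr_t)raw + CHUNK_SIZE - 1) &
                     ~(uintptr_t)(CHUNK_SIZE - 1));
    assert(((uintptr_t)chunk & (CHUNK_SIZE - 1)) == 0);

    /* Best effort: sets the "hg" VmFlag when THP is available. */
    madvise(chunk, CHUNK_SIZE, MADV_HUGEPAGE);
    memset(chunk, 0x5a, CHUNK_SIZE);  /* fault in all pages */

    if (madvise(chunk, free_bytes, MADV_FREE) == 0)
        kb = vma_field_kb(chunk, "LazyFree:");

    munmap(raw, len);
    return kb;
}
```

Calling lazyfree_kb_after_madv_free(CHUNK_SIZE) roughly corresponds to the first case above (the whole chunk released), and lazyfree_kb_after_madv_free(CHUNK_SIZE / 2) to the second (only some pages lazyfree). To see what khugepaged does afterwards, the mapping would have to be kept alive across at least one scan interval instead of being unmapped right away.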