On Tue, Oct 17, 2023 at 1:41 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > On Mon, Oct 16, 2023 at 1:06 PM Vishal Moola (Oracle) > <vishal.moola@xxxxxxxxx> wrote: > > > > Replaces 5 calls to compound_head(), and removes 1466 bytes of kernel > > text. > > > > Previously, to determine if any pte was shared, the page mapcount > > corresponding exactly to the pte was checked. This gave us a precise > > number of shared ptes. Using folio_estimated_sharers() instead uses > > the mapcount of the head page, giving us an estimate for tail page ptes. > > > > This means if a tail page's mapcount is greater than its head page's > > mapcount, folio_estimated_sharers() would be underestimating the number of > > shared ptes, and vice versa. > > > > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@xxxxxxxxx> > > --- > > mm/khugepaged.c | 26 ++++++++++++-------------- > > 1 file changed, 12 insertions(+), 14 deletions(-) > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 7a552fe16c92..67aac53b31c8 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -1245,7 +1245,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > pte_t *pte, *_pte; > > int result = SCAN_FAIL, referenced = 0; > > int none_or_zero = 0, shared = 0; > > - struct page *page = NULL; > > + struct folio *folio = NULL; > > unsigned long _address; > > spinlock_t *ptl; > > int node = NUMA_NO_NODE, unmapped = 0; > > @@ -1316,13 +1316,13 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > if (pte_write(pteval)) > > writable = true; > > > > - page = vm_normal_page(vma, _address, pteval); > > - if (unlikely(!page) || unlikely(is_zone_device_page(page))) { > > + folio = vm_normal_folio(vma, _address, pteval); > > + if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) { > > result = SCAN_PAGE_NULL; > > goto out_unmap; > > } > > > > - if (page_mapcount(page) > 1) { > > + if (folio_estimated_sharers(folio) > 1) { > > This doesn't look correct. The max_ptes_shared is used to control the > cap of shared PTEs. IIRC, folio_estimated_sharers() just reads the > mapcount of the head page. If we set max_ptes_shared to 256, and just > the head page is shared, but "shared" will return 512 and prevent from > collapsing the area even though just one PTE is shared. This breaks > the semantics of max_ptes_shared. In my testing this replacement appears to do the opposite (underestimating instead of overestimating), which admittedly still breaks the semantics of max_ptes_shared. It appears that this happens quite frequently in many regular use cases, so I'll go back to checking each individual subpage's mapcount in v2. > > ++shared; > > if (cc->is_khugepaged && > > shared > khugepaged_max_ptes_shared) { > > @@ -1332,29 +1332,27 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > } > > } > > > > - page = compound_head(page); > > - > > /* > > * Record which node the original page is from and save this > > * information to cc->node_load[]. > > * Khugepaged will allocate hugepage from the node has the max > > * hit record. > > */ > > - node = page_to_nid(page); > > + node = folio_nid(folio); > > if (hpage_collapse_scan_abort(node, cc)) { > > result = SCAN_SCAN_ABORT; > > goto out_unmap; > > } > > cc->node_load[node]++; > > - if (!PageLRU(page)) { > > + if (!folio_test_lru(folio)) { > > result = SCAN_PAGE_LRU; > > goto out_unmap; > > } > > - if (PageLocked(page)) { > > + if (folio_test_locked(folio)) { > > result = SCAN_PAGE_LOCK; > > goto out_unmap; > > } > > - if (!PageAnon(page)) { > > + if (!folio_test_anon(folio)) { > > result = SCAN_PAGE_ANON; > > goto out_unmap; > > } > > @@ -1369,7 +1367,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > * has excessive GUP pins (i.e. 512). Anyway the same check > > * will be done again later the risk seems low. > > */ > > - if (!is_refcount_suitable(page)) { > > + if (!is_refcount_suitable(&folio->page)) { > > result = SCAN_PAGE_COUNT; > > goto out_unmap; > > } > > @@ -1379,8 +1377,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > * enough young pte to justify collapsing the page > > */ > > if (cc->is_khugepaged && > > - (pte_young(pteval) || page_is_young(page) || > > - PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, > > + (pte_young(pteval) || folio_test_young(folio) || > > + folio_test_referenced(folio) || mmu_notifier_test_young(vma->vm_mm, > > address))) > > referenced++; > > } > > @@ -1402,7 +1400,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm, > > *mmap_locked = false; > > } > > out: > > - trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, > > + trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced, > > none_or_zero, result, unmapped); > > return result; > > } > > -- > > 2.40.1 > >