
The patch titled
     Subject: mm/numa: no task_numa_fault() call if page table is changed
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch

This patch will later appear in the mm-hotfixes-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Zi Yan <ziy@xxxxxxxxxx>
Subject: mm/numa: no task_numa_fault() call if page table is changed
Date: Wed, 7 Aug 2024 14:47:29 -0400

When handling a NUMA page fault, task_numa_fault() should be called only
by the process that restores the page table of the faulted folio, to
avoid duplicated stats counting.  Commit b99a342d4f11 ("NUMA balancing:
reduce TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and do_huge_pmd_numa_page() and did not skip the
task_numa_fault() call when the second page table check, made after a
NUMA migration failure, finds that the page table has changed.  Fix it by
making all !pte_same()/!pmd_same() paths return immediately.

This issue can cause task_numa_fault() to be called more often than
necessary and lead to unexpected NUMA balancing results.  (It is hard to
tell whether the duplicated NUMA fault counting has a positive or a
negative performance impact.)

Link: https://lkml.kernel.org/r/20240807184730.1266736-1-ziy@xxxxxxxxxx
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
Reported-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/huge_memory.c |    5 +++--
 mm/memory.c      |    5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

--- a/mm/huge_memory.c~mm-numa-no-task_numa_fault-call-if-page-table-is-changed
+++ a/mm/huge_memory.c
@@ -1738,10 +1738,11 @@ vm_fault_t do_huge_pmd_numa_page(struct
 		goto out_map;
 	}
 
-out:
+count_fault:
 	if (nid != NUMA_NO_NODE)
 		task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
 
+out:
 	return 0;
 
 out_map:
@@ -1753,7 +1754,7 @@ out_map:
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	spin_unlock(vmf->ptl);
-	goto out;
+	goto count_fault;
 }
 
 /*
--- a/mm/memory.c~mm-numa-no-task_numa_fault-call-if-page-table-is-changed
+++ a/mm/memory.c
@@ -5371,9 +5371,10 @@ static vm_fault_t do_numa_page(struct vm
 		goto out_map;
 	}
 
-out:
+count_fault:
 	if (nid != NUMA_NO_NODE)
 		task_numa_fault(last_cpupid, nid, nr_pages, flags);
+out:
 	return 0;
 out_map:
 	/*
@@ -5387,7 +5388,7 @@ out_map:
 		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
 					    writable);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
-	goto out;
+	goto count_fault;
 }
 
 static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
_

Patches currently in -mm which might be from ziy@xxxxxxxxxx are

mm-numa-no-task_numa_fault-call-if-page-table-is-changed.patch
memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch
memory-tiering-introduce-folio_use_access_time-check.patch
memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch
mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch
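
For readers who want to see the label change in isolation, here is a
minimal userspace sketch of the control flow described in the changelog.
It is not kernel code: model_numa_fault(), model_task_numa_fault(),
fault_stat_calls and the three boolean inputs are hypothetical stand-ins,
and the sketch only assumes that a task reaching "out" directly has seen
a changed page table and must not count the fault.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical counter standing in for the stats task_numa_fault() updates. */
int fault_stat_calls;

/* Hypothetical stand-in for task_numa_fault(); it only counts calls. */
void model_task_numa_fault(int nid)
{
	(void)nid;
	fault_stat_calls++;
}

/*
 * Simplified model of the patched flow (an assumption, not the real handler):
 *   changed_before - page table entry changed before the first check
 *   migrated       - NUMA migration succeeded
 *   changed_after  - entry changed at the recheck after a failed migration
 */
int model_numa_fault(bool changed_before, bool migrated, bool changed_after,
		     int target_nid)
{
	int nid = NUMA_NO_NODE;

	if (changed_before)
		goto out;		/* nothing of ours to count */

	nid = target_nid;
	if (migrated)
		goto count_fault;

	if (changed_after)
		goto out;		/* another task restores the mapping and counts */

	/* out_map equivalent: this task restores the mapping itself */
count_fault:
	if (nid != NUMA_NO_NODE)
		model_task_numa_fault(nid);
out:
	return 0;
}

/* Walks the four paths and checks how often the fault was counted. */
void model_self_check(void)
{
	fault_stat_calls = 0;

	model_numa_fault(true, false, false, 1);	/* changed early */
	assert(fault_stat_calls == 0);

	model_numa_fault(false, false, true, 1);	/* changed after failed migration */
	assert(fault_stat_calls == 0);

	model_numa_fault(false, false, false, 1);	/* mapping restored by this task */
	assert(fault_stat_calls == 1);

	model_numa_fault(false, true, false, 1);	/* migration succeeded */
	assert(fault_stat_calls == 2);
}
```

Calling model_self_check() from a small main() exercises all four paths.
With the pre-patch layout, where a single "out" label sat above the
counting code, the second case would also have bumped the counter; that
is the duplicated counting the changelog refers to.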