"zhangpeng (AS)" <zhangpeng362@xxxxxxxxxx> writes: > On 2023/11/23 16:36, Huang, Ying wrote: > >> Peng Zhang <zhangpeng362@xxxxxxxxxx> writes: >> >>> From: ZhangPeng <zhangpeng362@xxxxxxxxxx> >>> >>> The major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE) >>> in application, which leading to an unexpected performance issue[1]. >>> >>> This caused by temporarily cleared pte during a read/modify/write update >>> of the pte, eg, do_numa_page()/change_pte_range(). >>> >>> For the data segment of the user-mode program, the global variable area >>> is a private mapping. After the pagecache is loaded, the private anonymous >>> page is generated after the COW is triggered. Mlockall can lock COW pages >>> (anonymous pages), but the original file pages cannot be locked and may >>> be reclaimed. If the global variable (private anon page) is accessed when >>> vmf->pte is zeroed in numa fault, a file page fault will be triggered. >>> >>> At this time, the original private file page may have been reclaimed. >>> If the page cache is not available at this time, a major fault will be >>> triggered and the file will be read, causing additional overhead. >>> >>> Fix this by rechecking the pte by holding ptl in filemap_fault() before >>> triggering a major fault. >>> >>> [1] https://lore.kernel.org/linux-mm/9e62fd9a-bee0-52bf-50a7-498fa17434ee@xxxxxxxxxx/ >>> >>> Signed-off-by: ZhangPeng <zhangpeng362@xxxxxxxxxx> >>> Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> >> Suggested-by: "Huang, Ying" <ying.huang@xxxxxxxxx> >> >> :-) >> >>> --- >>> mm/filemap.c | 14 ++++++++++++++ >>> 1 file changed, 14 insertions(+) >>> >>> diff --git a/mm/filemap.c b/mm/filemap.c >>> index 71f00539ac00..bb5e6a2790dc 100644 >>> --- a/mm/filemap.c >>> +++ b/mm/filemap.c >>> @@ -3226,6 +3226,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) >>> mapping_locked = true; >>> } >>> } else { >>> + pte_t *ptep = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, >>> + vmf->address, &vmf->ptl); >>> + if (ptep) { >>> + /* >>> + * Recheck pte with ptl locked as the pte can be cleared >>> + * temporarily during a read/modify/write update. >>> + */ >>> + if (unlikely(!pte_none(ptep_get(ptep)))) >>> + ret = VM_FAULT_NOPAGE; >>> + pte_unmap_unlock(ptep, vmf->ptl); >>> + if (unlikely(ret)) >>> + return ret; >>> + } >>> + >> Need to deal with ptep == NULL. Although that is high impossible. > > Maybe we don't need to deal with ptep == NULL, because it has been > considered later in filemap_fault()? > ptep == NULL means that the ptep has been replaced with a PMD entry. > In this case, major fault is also required. I still think that we need to deal with that. That is common error processing logic. -- Best Regards, Huang, Ying >> >>> /* No page in the page cache at all */ >>> count_vm_event(PGMAJFAULT); >>> count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);