"Yin, Fengwei" <fengwei.yin@xxxxxxxxx> writes: > On 11/10/2023 1:32 PM, Huang, Ying wrote: >> Matthew Wilcox <willy@xxxxxxxxxxxxx> writes: >> >>> On Thu, Nov 09, 2023 at 09:47:24PM +0800, zhangpeng (AS) wrote: >>>> There is a stage in numa fault which will set pte as 0 in do_numa_page() : >>>> ptep_modify_prot_start() will clear the vmf->pte, until >>>> ptep_modify_prot_commit() assign a value to the vmf->pte. >>> >>> [...] >>> >>>> Our problem scenario is as follows: >>>> >>>> task 1 task 2 >>>> ------ ------ >>>> /* scan global variables */ >>>> do_numa_page() >>>> spin_lock(vmf->ptl) >>>> ptep_modify_prot_start() >>>> /* set vmf->pte as null */ >>>> /* Access global variables */ >>>> handle_pte_fault() >>>> /* no pte lock */ >>>> do_pte_missing() >>>> do_fault() >>>> do_read_fault() >>>> ptep_modify_prot_commit() >>>> /* ptep update done */ >>>> pte_unmap_unlock(vmf->pte, vmf->ptl) >>>> do_fault_around() >>>> __do_fault() >>>> filemap_fault() >>>> /* page cache is not available >>>> and a major fault is triggered */ >>>> do_sync_mmap_readahead() >>>> /* page_not_uptodate and goto >>>> out_retry. */ >>>> >>>> Is there any way to avoid such a major fault? >>> >>> Yes, this looks like a bug. >>> >>> It seems to me that the easiest way to fix this is not to zero the pte >>> but to make it protnone? That would send task 2 into do_numa_page() >>> where it would take the ptl, then check pte_same(), see that it's >>> changed and goto out, which will end up retrying the fault. >> >> There are other places in the kernel where the PTE is cleared, for >> example, move_ptes() in mremap.c. IIUC, we need to audit all them. >> >> Another possible solution is to check PTE again with PTL held before >> reading in file data. This will increase the overhead of major fault >> path. Is it acceptable? > What if we check the PTE without page table lock acquired? The PTE is zeroed temporarily only with PTL held. So, if we acquire the PTL in filemap_fault() and check the PTE, the PTE which is zeroed in do_numa_page() will be non-zero now. So we can avoid the major fault. But, if we don't acquire the PTL, the PTE may still be zero. -- Best Regards, Huang, Ying > Regards > Yin, Fengwei > >> >>> I'm not particularly expert at page table manipulation, so I'll let >>> somebody who is propose an actual patch. Or you could try to do it? >> >> -- >> Best Regards, >> Huang, Ying