On 05/20/2020 09:26 AM, Andrew Morton wrote: > On Tue, 19 May 2020 18:03:28 +0800 Bibo Mao <maobibo@xxxxxxxxxxx> wrote: > >> If two threads concurrently fault at the same address, the thread that >> won the race updates the PTE and its local TLB. For now, the other >> thread gives up, simply does nothing, and continues. >> >> It could happen that this second thread triggers another fault, whereby >> it only updates its local TLB while handling the fault. Instead of >> triggering another fault, let's directly update the local TLB of the >> second thread. >> >> It is only useful to architectures where software can update TLB, it may >> bring out some negative effect if update_mmu_cache is used for other >> purpose also. It seldom happens where multiple threads access the same >> page at the same time, so the negative effect is limited on other arches. > > I'm still worried about the impact on other architectures. The > additional update_mmu_cache() calls won't occur only when multiple > threads are racing against the same page, I think? For example, > insert_pfn() will do this when making a read-only page a writable one. How about defining ptep_set_access_flags function like this on mips system? which is the same on riscv platform. static inline int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t entry, int dirty) { if (!pte_same(*ptep, entry)) set_pte_at(vma->vm_mm, address, ptep, entry); /* * update_mmu_cache will unconditionally execute, handling both * the case that the PTE changed and the spurious fault case. */ return true; } And keep the following piece of code unchanged, the change will be smaller. @@ -1770,8 +1770,8 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr, } entry = pte_mkyoung(*pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, addr, pte, entry, 1)) - update_mmu_cache(vma, addr, pte); + ptep_set_access_flags(vma, addr, pte, entry, 1); + update_mmu_cache(vma, addr, pte); } @@ -2436,17 +2436,16 @@ static inline bool cow_user_page(struct page *dst, struct page *src, entry = pte_mkyoung(vmf->orig_pte); - if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0)) - update_mmu_cache(vma, addr, vmf->pte); + ptep_set_access_flags(vma, addr, vmf->pte, entry, 0); + update_mmu_cache(vma, addr, vmf->pte); } @@ -2618,8 +2618,8 @@ static inline void wp_page_reuse(struct vm_fault *vmf) flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = pte_mkyoung(vmf->orig_pte); entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) - update_mmu_cache(vma, vmf->address, vmf->pte); + ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1); + update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); } > > Would you have time to add some instrumentation into update_mmu_cache() > (maybe a tracepoint) and see what effect this change has upon the > frequency at which update_mmu_cache() is called for a selection of > workloads? And add this info to the changelog to set minds at ease? OK, I will add some instrumentation data in the changelog.