On Thu, Jul 16, 2020 at 05:36:30AM +0800, Yang Shi wrote:
> Recently we found regression when running will_it_scale/page_fault3 test
> on ARM64.  Over 70% down for the multi processes cases and over 20% down
> for the multi threads cases.  It turns out the regression is caused by commit
> 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
> calling balance_dirty_pages() in write fault").
>
> The test mmaps a memory size file then write to the mapping, this would
> make all memory dirty and trigger dirty pages throttle, that upstream
> commit would release mmap_sem then retry the page fault.  The retried
> page fault would see correct PTEs installed by the first try then update
> dirty bit and clear read-only bit and flush TLBs for ARM.  The regression is
> caused by the excessive TLB flush.  It is fine on x86 since x86 doesn't
> clear read-only bit so there is no need to flush TLB for this case.
>
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for page swapped in
> 3. Waiting for dirty pages throttling
>
> The first two cases don't have PTEs set up at all, so the retried page
> fault would install the PTEs, so they don't reach there.  But the #3
> case usually has PTEs installed, the retried page fault would reach the
> dirty bit and read-only bit update.  But it seems not necessary to
> modify those bits again for #3 since they should be already set by the
> first page fault try.
>
> Of course the parallel page fault may set up PTEs, but we just need care
> about write fault.  If the parallel page fault setup a writable and dirty
> PTE then the retried fault doesn't need do anything extra.  If the
> parallel page fault setup a clean read-only PTE, the retried fault should
> just call do_wp_page() then return as the below code snippet shows:
>
> 	if (vmf->flags & FAULT_FLAG_WRITE) {
> 		if (!pte_write(entry))
> 			return do_wp_page(vmf);
> 	}
>
> With this fix the test result get back to normal.
>
> Fixes: 89b15332af7c ("mm: drop mmap_sem before calling balance_dirty_pages() in write fault")
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Will Deacon <will.deacon@xxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Reported-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx>
> Debugged-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx>
> Tested-by: Xu Yu <xuyu@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> ---
> v2: * Incorporated the comment from Will Deacon.
>     * Updated the commit log per the discussion.
>
>  mm/memory.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..e93e1da 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		entry = pte_mkdirty(entry);
> +

Thanks, this looks better to me. Andrew -- please can you update the version
in your tree?

Cheers,

Will
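
For anyone reading along, the branch order in the quoted hunk can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not kernel code: the MODEL_* flag values, the enum, and
model_handle_pte_fault() are hypothetical names invented here, and the model
covers only the path where a present PTE has already been found.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_FLAG_WRITE 0x1u	/* stands in for FAULT_FLAG_WRITE */
#define MODEL_FLAG_TRIED 0x2u	/* stands in for FAULT_FLAG_TRIED */

enum model_outcome {
	MODEL_DO_WP_PAGE,	/* write fault on a read-only PTE */
	MODEL_SKIP_UPDATE,	/* retried fault: goto unlock, PTE untouched */
	MODEL_UPDATE_PTE,	/* first attempt: fall through to the PTE update */
};

/*
 * Mirrors the order of checks in the quoted hunk:
 * 1. a write fault on a non-writable PTE always goes to do_wp_page();
 * 2. otherwise a retried fault skips the dirty/access bit update;
 * 3. otherwise the PTE update path runs as before.
 */
enum model_outcome model_handle_pte_fault(unsigned int flags, bool pte_writable)
{
	if ((flags & MODEL_FLAG_WRITE) && !pte_writable)
		return MODEL_DO_WP_PAGE;

	if (flags & MODEL_FLAG_TRIED)
		return MODEL_SKIP_UPDATE;

	return MODEL_UPDATE_PTE;
}

/* Usage example: the retry case from the commit log versus a first attempt. */
void model_self_check(void)
{
	unsigned int retry = MODEL_FLAG_WRITE | MODEL_FLAG_TRIED;

	assert(model_handle_pte_fault(retry, true) == MODEL_SKIP_UPDATE);
	assert(model_handle_pte_fault(retry, false) == MODEL_DO_WP_PAGE);
	assert(model_handle_pte_fault(MODEL_FLAG_WRITE, true) == MODEL_UPDATE_PTE);
}
```

The point of the ordering is that the write-protect check still runs before
the FAULT_FLAG_TRIED check, so a retried write fault that finds a clean
read-only PTE from a parallel fault is still handled, while a retry that
finds a writable PTE returns without touching it.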