> On Apr 10, 2023, at 12:52 AM, Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
> 0Day/LKP reported a performance regression for commit
> 7e12beb8ca2a ("migrate_pages: batch flushing TLB"). In the commit, the
> TLB flushing during page migration is batched. So, in
> try_to_migrate_one(), ptep_clear_flush() is replaced with
> set_tlb_ubc_flush_pending(). In further investigation, it is found
> that the TLB flushing can be avoided in ptep_clear_flush() if the PTE
> is inaccessible. In fact, we can optimize in similar way for the
> batched TLB flushing too to improve the performance.
>
> So in this patch, we check pte_accessible() before
> set_tlb_ubc_flush_pending() in try_to_unmap/migrate_one(). Tests show
> that the benchmark score of the anon-cow-rand-mt test case of
> vm-scalability test suite can improve up to 2.1% with the patch on a
> Intel server machine. The TLB flushing IPI can reduce up to 44.3%.

LGTM.

I know it’s meaningless for x86 (but perhaps ARM would use this infra
too): do we need smp_mb__after_atomic() after ptep_get_and_clear() and
before pte_accessible()?

In addition, if this goes into stable (based on the Fixes tag), consider
splitting it into two patches, so that only the one that needs
backporting goes to stable.
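
To make the ordering question concrete, here is a minimal userspace sketch of the pattern under discussion. It is not kernel code and not the patch itself: pte_model_t, MODEL_PTE_PRESENT, struct tlb_batch_model and every model_*() helper are hypothetical stand-ins, the "accessible" test is reduced to a single present bit, and atomic_thread_fence(memory_order_seq_cst) is only an analogy for the barrier asked about above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout: bit 0 means "present". Real PTE layouts differ. */
#define MODEL_PTE_PRESENT UINT64_C(0x1)

/* Stand-in for a page table entry that other CPUs may read concurrently. */
typedef struct {
	_Atomic uint64_t val;
} pte_model_t;

/* Stand-in for the per-task batch of deferred TLB flushes. */
struct tlb_batch_model {
	bool flush_required;
	unsigned int pending;
};

/* Models an atomic read-and-clear of the entry, with no implied ordering. */
static uint64_t model_get_and_clear(pte_model_t *ptep)
{
	return atomic_exchange_explicit(&ptep->val, 0, memory_order_relaxed);
}

/* Simplified assumption: an entry can be cached in a TLB only if present. */
static bool model_pte_accessible(uint64_t pteval)
{
	return (pteval & MODEL_PTE_PRESENT) != 0;
}

/* Records that a deferred flush is needed instead of flushing right away. */
static void model_set_flush_pending(struct tlb_batch_model *batch)
{
	batch->flush_required = true;
	batch->pending++;
}

/*
 * Clear the entry, then queue a deferred flush only if the old value could
 * have been cached. Returns true when a flush was queued.
 */
static bool model_unmap_one(pte_model_t *ptep, struct tlb_batch_model *batch)
{
	uint64_t old;

	assert(ptep != NULL && batch != NULL);

	old = model_get_and_clear(ptep);

	/*
	 * The open question: on a weakly ordered architecture, is a full
	 * barrier needed between the clear and the accessibility check?
	 * This fence only marks where such a barrier would sit.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (model_pte_accessible(old)) {
		model_set_flush_pending(batch);
		return true;
	}
	return false;
}
```

In this model, unmapping a present entry queues exactly one flush, while unmapping an entry that was already clear queues none, which is the saving the patch description attributes to the pte_accessible() check. Whether the fence is actually required depends on the ordering guarantees of the real primitives on each architecture, which this sketch does not settle.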