On Tue, 19 May 2020 18:03:28 +0800 Bibo Mao <maobibo@xxxxxxxxxxx> wrote:

> If two threads concurrently fault at the same address, the thread that
> won the race updates the PTE and its local TLB. For now, the other
> thread gives up, simply does nothing, and continues.
>
> It could happen that this second thread triggers another fault, whereby
> it only updates its local TLB while handling the fault. Instead of
> triggering another fault, let's directly update the local TLB of the
> second thread.
>
> It is only useful to architectures where software can update TLB, it may
> bring out some negative effect if update_mmu_cache is used for other
> purpose also. It seldom happens where multiple threads access the same
> page at the same time, so the negative effect is limited on other arches.

I'm still worried about the impact on other architectures.  The
additional update_mmu_cache() calls won't occur only when multiple
threads are racing against the same page, I think?  For example,
insert_pfn() will do this when making a read-only page a writable one.

Would you have time to add some instrumentation into update_mmu_cache()
(maybe a tracepoint) and see what effect this change has upon the
frequency at which update_mmu_cache() is called for a selection of
workloads?  And add this info to the changelog to set minds at ease?
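
To make the request above concrete, here is a rough userspace sketch of the
kind of per-call-site accounting that could go into the changelog. It is not
kernel code and not the patch author's method. The enum values and the helper
names (count_update_mmu_cache(), site_permille(), and so on) are hypothetical
and exist only for illustration. Real instrumentation would more likely be a
tracepoint or per-CPU counters inside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical call sites to tally. These labels are assumptions made for
 * this sketch, not an exhaustive list of update_mmu_cache() callers.
 */
enum mmu_update_site {
	SITE_FAULT_INSTALL,	/* existing path: the fault changed the PTE */
	SITE_SPURIOUS_RACE,	/* new path: another thread already set the PTE */
	SITE_INSERT_PFN,	/* e.g. read-only to writable upgrade */
	SITE_COUNT
};

static unsigned long site_hits[SITE_COUNT];

/* Clear all counters, e.g. before starting a workload. */
static void reset_update_counts(void)
{
	memset(site_hits, 0, sizeof(site_hits));
}

/* Record one (simulated) update_mmu_cache() call from the given site. */
static void count_update_mmu_cache(enum mmu_update_site site)
{
	assert((unsigned int)site < SITE_COUNT);
	site_hits[site]++;
}

/* Total number of recorded calls across all sites. */
static unsigned long total_updates(void)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < SITE_COUNT; i++)
		total += site_hits[i];
	return total;
}

/* Share of calls from one site, in parts per thousand (0 if no calls). */
static unsigned long site_permille(enum mmu_update_site site)
{
	unsigned long total = total_updates();

	assert((unsigned int)site < SITE_COUNT);
	return total ? site_hits[site] * 1000 / total : 0;
}

/* Print a one-line-per-site summary suitable for pasting into a changelog. */
static void report_update_counts(void)
{
	static const char *const names[SITE_COUNT] = {
		"fault_install", "spurious_race", "insert_pfn",
	};
	int i;

	for (i = 0; i < SITE_COUNT; i++)
		printf("%-14s %10lu (%lu/1000)\n", names[i], site_hits[i],
		       site_permille((enum mmu_update_site)i));
}
```

Comparing the share of calls from the new path against the total, per
workload, is the sort of number that would show whether the extra calls are
as rare as the changelog assumes.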