On Tue, May 15, 2012 at 5:17 PM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
> On 15 May 2012 19:15, Nick Piggin <npiggin@xxxxxxxxx> wrote:
>> So this should go to linux-arch...
>>
>> On 15 May 2012 18:55, Alex Shi <alex.shi@xxxxxxxxx> wrote:
>>> Not every flush_tlb_mm execution moment is really need to evacuate all
>>> TLB entries, like in munmap, just few 'invlpg' is better for whole
>>> process performance, since it leaves most of TLB entries for later
>>> accessing.
>
> Did you have microbenchmarks for this like your mprotect numbers,
> by the way? Test munmap numbers and see how that looks. Also,

This might be off topic, but I just spent a few minutes comparing a CR3
write against invlpg on a pretty old but still reliable P4 desktop, using
the simple hardware latency and bandwidth test tool I posted as an RFC on
LKML several weeks ago.

Both __native_flush_tlb() and __native_flush_tlb_single(...) added
roughly 1 ns of latency to the TSC sampling executed in
stop_machine_context on two logical CPUs.

Just to fuel the discussion. :-)

Cheers,
/l
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html