Re: [PATCH v5 6/7] x86/tlb: optimizing flush_tlb_mm

Luming Yu <luming.yu@xxxxxxxxx> · Tue, 15 May 2012 21:08:01 +0800

On Tue, May 15, 2012 at 8:58 PM, Luming Yu <luming.yu@xxxxxxxxx> wrote:
> On Tue, May 15, 2012 at 5:17 PM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
>> On 15 May 2012 19:15, Nick Piggin <npiggin@xxxxxxxxx> wrote:
>>> So this should go to linux-arch...
>>>
>>> On 15 May 2012 18:55, Alex Shi <alex.shi@xxxxxxxxx> wrote:
>>>> Not every flush_tlb_mm call really needs to evict all TLB entries.
>>>> In munmap, for example, a few 'invlpg' instructions are better for
>>>> overall process performance, since they leave most TLB entries in
>>>> place for later accesses.
>>
>> Did you have microbenchmarks for this like your mprotect numbers,
>> by the way? Test munmap numbers and see how that looks. Also,
>
> Might be off topic, but I just spent a few minutes testing the difference
> between writing CR3 and invlpg on a pretty old but still reliable P4
> desktop, using the simple hardware latency and bandwidth test tool I
> posted as an RFC on LKML several weeks ago.
>
> Both __native_flush_tlb() and __native_flush_tlb_single(...)
> introduced roughly 1 ns of latency to tsc sampling executed in

Sorry, typo: that should be 1 us. I should capture nanosecond-resolution
data, though. :-(

> stop_machine_context on two logical CPUs.
>
> Just to fuel the discussion. :-)
>
> Cheers,
> /l
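
For the munmap numbers asked about above, a rough user-space sketch of
such a microbenchmark might look like the following. It is not the test
tool mentioned earlier and it cannot issue invlpg or write CR3 itself
(both are privileged); it only times the munmap() system call, which is
where flush_tlb_mm would be exercised. The names now_ns() and
time_munmap_ns() are hypothetical, and the use of CLOCK_MONOTONIC rather
than raw tsc reads is an assumption made to keep the example portable.

```c
/* Sketch only: user-space timing of munmap(), not an in-kernel test. */
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: map npages anonymous pages, touch each one so it
 * is actually backed by memory, then time the munmap() call alone.
 * Returns the elapsed time in nanoseconds, or -1 if mmap/munmap fails.
 */
static int64_t time_munmap_ns(size_t npages)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * psz;
	unsigned char *p;
	uint64_t t0, t1;
	int rc;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault in every page so the unmap has real mappings to tear down. */
	for (size_t i = 0; i < npages; i++)
		p[i * psz] = 1;

	t0 = now_ns();
	rc = munmap(p, len);
	t1 = now_ns();

	return rc == 0 ? (int64_t)(t1 - t0) : -1;
}
```

Calling time_munmap_ns() in a loop for a few sizes (say 1, 16 and 512
pages) and taking the minimum of many runs per size would give numbers
that can be compared before and after the patch. A single run is too
noisy to mean much.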