Re: [PATCH] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 9 Oct 2018 09:16:37 +0200

On Tue, Oct 09, 2018 at 10:02:50AM +0530, Ashish Mhetre wrote:
> From: Shaohua Li <shli@xxxxxxxxxx>
> 
> We use the accessed bit to age a page at page reclaim time,
> and currently we also flush the TLB when doing so.
> 
> But in some workloads TLB flush overhead is very heavy. In my
> simple multithreaded app with a lot of swap to several pcie
> SSDs, removing the tlb flush gives about 20% ~ 30% swapout
> speedup.
> 
> Fortunately just removing the TLB flush is a valid optimization:
> on x86 CPUs, clearing the accessed bit without a TLB flush
> doesn't cause data corruption.
> 
> It could cause incorrect page aging and the (mistaken) reclaim of
> hot pages, but the chance of that should be relatively low.
> 
> So as a performance optimization don't flush the TLB when
> clearing the accessed bit, it will eventually be flushed by
> a context switch or a VM operation anyway. [ In the rare
> event of it not getting flushed for a long time the delay
> shouldn't really matter because there's no real memory
> pressure for swapout to react to. ]

Note that context switches (and here I'm talking about switch_mm(), not
the cheaper switch_to()) do not unconditionally imply a TLB invalidation
these days (on PCID-enabled hardware).

So in that regard, the Changelog (and the comment) is a little
misleading.

I don't see anything fundamentally wrong with the patch though; just the
wording.
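
To illustrate the wording concern, here is a rough user-space toy model, not
kernel code. Every name in it (toy_pte, toy_tlb, toy_access(), and so on) is
hypothetical, and it assumes the simplification that the accessed bit is only
set on a page walk, i.e. on a TLB miss. The switch_flushes parameter stands
in for whether switch_mm() happens to invalidate the TLB; with PCID it may
not.

```c
#include <stdbool.h>

/* Toy model only: one PTE and one cached translation for it. */
struct toy_pte { bool accessed; };
struct toy_tlb { bool valid; };

/*
 * Model of a memory access. Assumption: the accessed bit is set only
 * when the translation is not cached and a page walk is needed.
 */
static void toy_access(struct toy_pte *pte, struct toy_tlb *tlb)
{
	if (!tlb->valid) {
		pte->accessed = true;	/* page walk sets the A bit */
		tlb->valid = true;	/* translation is now cached */
	}
}

/* Model of page aging: test and clear the A bit, optionally flush. */
static bool toy_clear_young(struct toy_pte *pte, struct toy_tlb *tlb,
			    bool flush)
{
	bool young = pte->accessed;

	pte->accessed = false;
	if (flush)
		tlb->valid = false;
	return young;
}

/* Model of switch_mm(): invalidates only if switch_flushes is true. */
static void toy_switch_mm(struct toy_tlb *tlb, bool switch_flushes)
{
	if (switch_flushes)
		tlb->valid = false;
}

/* Is a page that is touched again seen as young on the next aging pass? */
static bool toy_hot_page_seen_young(bool flush, bool switch_flushes)
{
	struct toy_pte pte = { false };
	struct toy_tlb tlb = { false };

	toy_access(&pte, &tlb);			/* first touch */
	toy_clear_young(&pte, &tlb, flush);	/* first aging pass */
	toy_switch_mm(&tlb, switch_flushes);	/* switch away and back */
	toy_access(&pte, &tlb);			/* page is still in use */
	return toy_clear_young(&pte, &tlb, flush);
}
```

In this model the hot page is only misjudged as cold when neither the aging
pass nor the context switch invalidates the cached translation, which is the
case the changelog wording glosses over.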