Re: Potential race in TLB flush batching?

Mel Gorman <mgorman@xxxxxxx> · Thu, 20 Jul 2017 08:43:42 +0100

On Wed, Jul 19, 2017 at 04:39:07PM -0700, Nadav Amit wrote:
> > If try_to_unmap returns false on CPU0 then at least one unmap attempt
> > failed and the page is not reclaimed.
> 
> Actually, try_to_unmap() may even return true, and the page would still not
> be reclaimed - for example if page_has_private() and freeing the buffers
> fails. In this case, the page would be unlocked as well.
> 

I'm not seeing the relevance from the perspective of a stale TLB being
used to corrupt memory or access the wrong data.

> > For those that were unmapped, they
> > will get flushed in the near future. When KSM operates on CPU1, it'll skip
> > the unmapped pages under the PTL so stale TLB entries are not relevant as
> > the mapped entries are still pointing to a valid page and ksm misses a merge
> > opportunity.
> 
> This is the case I regarded, but I do not understand your point. The whole
> problem is that CPU1 would skip the unmapped pages under the PTL. As it
> skips them it does not flush them from the TLB. And as a result,
> replace_page() may happen before the TLB is flushed by CPU0.
> 

By the time of the unlock_page on the reclaim side, every unmapping that
will happen before the flush has already taken place. If KSM starts between
the unlock_page and the TLB flush then it'll skip any of the PTEs that were
previously unmapped and may still have stale TLB entries, so there is no
relevant stale TLB entry to work with.
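
The skipping behaviour argued above can be sketched as a toy user-space
model. This is not kernel code: every struct and function name below is
hypothetical, and it only models the ordering described in this reply
(reclaim clears PTEs and defers the flush, KSM only considers PTEs that
are still present).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, none exist in the kernel. */
struct toy_pte {
    bool present;        /* PTE currently maps the page */
    bool flush_pending;  /* reclaim cleared the PTE but deferred the TLB flush */
};

/* Reclaim side: clear the PTE now and batch the TLB flush for later. */
static void toy_reclaim_unmap(struct toy_pte *pte)
{
    if (pte->present) {
        pte->present = false;
        pte->flush_pending = true;
    }
}

/* Reclaim side: the batched flush finally runs for all entries. */
static void toy_reclaim_flush(struct toy_pte *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ptes[i].flush_pending = false;
}

/* KSM side: count the PTEs it would operate on; non-present ones are skipped. */
static size_t toy_ksm_candidates(const struct toy_pte *ptes, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (ptes[i].present)
            count++;
    return count;
}
```

In this model the number of KSM candidates is the same before and after
toy_reclaim_flush() runs, which is the claim being made: the deferred
flush only concerns entries that KSM never touches.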

> > If it write protects a page, ksm unconditionally flushes the PTE
> > on clearing the PTE so again, there is no stale entry anywhere. For CPU2,
> > it'll either reference a PTE that was unmapped in which case it'll fault
> > once CPU0 flushes the TLB and until then it's safe to read and write as
> > long as the TLB is flushed before the page is freed or IO is initiated which
> > reclaim already handles.
> 
> In my scenario the page is not freed and there is no I/O in the reclaim
> path. The TLB flush of CPU0 in my scenario is just deferred while the
> page-table lock is not held. As I mentioned before, this time-period can be
> potentially very long in a virtual machine. CPU2 referenced a PTE that
> was unmapped by CPU0 (reclaim path) but not CPU1 (ksm path).
> 
> ksm, IIUC, would not expect modifications of the page during replace_page.

Indeed not, but it'll either find no PTE, in which case it won't allow a
stale PTE entry to exist, or, when it does find a PTE, it flushes the
TLB unconditionally to avoid any writes taking place. It holds the page
lock while setting up the sharing so no parallel fault can reinsert the
page and no parallel writes can take place that would result in false
sharing.
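
As a rough illustration of the unconditional flush on the KSM side, here
is a toy user-space model. It is not kernel code and all names are
hypothetical; it assumes a single CPU with a one-entry TLB and ignores the
page lock entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none exist in the kernel. */
struct toy_wp_pte { bool present; bool writable; };
struct toy_wp_tlb { bool valid;   bool writable; };

/*
 * CPU side: a write goes through the cached TLB entry if there is one,
 * otherwise the PTE is walked and cached. Returns false on a fault.
 */
static bool toy_wp_cpu_write_allowed(struct toy_wp_tlb *tlb,
                                     const struct toy_wp_pte *pte)
{
    if (!tlb->valid) {
        if (!pte->present)
            return false;            /* no mapping: the access faults */
        tlb->valid = true;
        tlb->writable = pte->writable;
    }
    return tlb->writable;
}

/*
 * KSM side: skip a missing PTE, otherwise write protect it and flush
 * the TLB entry unconditionally so no cached writable entry survives.
 */
static bool toy_wp_ksm_write_protect(struct toy_wp_pte *pte,
                                     struct toy_wp_tlb *tlb)
{
    if (!pte->present)
        return false;                /* nothing to merge here */
    pte->writable = false;
    tlb->valid = false;              /* unconditional flush */
    return true;
}
```

A write that was allowed before toy_wp_ksm_write_protect() is refused
afterwards, because the cached writable entry is gone and the refill sees
the read-only PTE.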

-- 
Mel Gorman
SUSE Labs
