Mel Gorman <mgorman@xxxxxxx> wrote:

> On Tue, Jul 11, 2017 at 09:09:23PM +0100, Mel Gorman wrote:
>> On Tue, Jul 11, 2017 at 08:18:23PM +0100, Mel Gorman wrote:
>>> I don't think we should be particularly clever about this and instead
>>> just flush the full mm if there is a risk that a parallel batching of
>>> flushes is in progress, resulting in a stale TLB entry being used.
>>> I think tracking mms that are currently batching would end up being
>>> costly in terms of memory, fairly complex, or both. Something like
>>> this?
>>
>> mremap and madvise(DONTNEED) would also need to flush. Memory policies
>> are fine, as a move_pages call that hits the race will simply fail to
>> migrate a page that is being freed, and once migration starts it'll be
>> flushed, so a stale access has no further risk. copy_page_range should
>> also be ok as the old mm is flushed and the new mm cannot have entries
>> yet.
>
> Adding those results in

You are way too fast for me.

> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -637,12 +637,34 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
>  		return false;
>
>  	/* If remote CPUs need to be flushed then defer batch the flush */
> -	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
> +	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) {
>  		should_defer = true;
> +		mm->tlb_flush_batched = true;
> +	}

Since mm->tlb_flush_batched is set before the PTE is actually cleared,
it still seems to leave a short window for a race:

  CPU0                                      CPU1
  ----                                      ----
                                            should_defer_flush
                                            => mm->tlb_flush_batched=true
  flush_tlb_batched_pending (another PT)
  => flush TLB
  => mm->tlb_flush_batched=false
                                            ptep_get_and_clear
                                            ...
  flush_tlb_batched_pending (batched PT)
  use the stale PTE
                                            ...
                                            try_to_unmap_flush

IOW, it seems that mm->tlb_flush_batched should be set after the PTE is
cleared (and with some compiler barrier to be on the safe side).

Just to clarify - I am not trying to be annoying, but I considered
building and submitting a patch based on some artifacts of a study I
conducted, and this issue drove me crazy.

One more question, please: how does an elevated page count or even
locking the page help (as you mention in regard to uprobes and ksm)?
Yes, the page will not be reclaimed, but IIUC try_to_unmap is called
before the reference count is frozen, and the page lock is dropped on
each iteration of the loop in shrink_page_list. In this case, it seems
to me that uprobes or ksm may still not flush the TLB.

Thanks,
Nadav
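P.S. To make the ordering I am suggesting concrete, here is a rough
sketch. It is only an illustration against your patch, not the actual
mm/rmap.c code: the function name is made up, it assumes the new
mm->tlb_flush_batched field from your diff, and it uses a plain
barrier() - whether an smp barrier is also needed is a separate
question. Presumably the real change would go into the try_to_unmap
path in mm/rmap.c.

#include <linux/mm.h>
#include <linux/compiler.h>
#include <asm/pgtable.h>

/*
 * Illustrative sketch only: the PTE is cleared *before*
 * mm->tlb_flush_batched is set, so a concurrent
 * flush_tlb_batched_pending() that observes the flag as clear cannot
 * have raced with a not-yet-cleared PTE for this unmap.
 */
static pte_t clear_pte_then_mark_batched(struct mm_struct *mm,
					 unsigned long addr, pte_t *ptep)
{
	pte_t pteval = ptep_get_and_clear(mm, addr, ptep);

	/*
	 * Keep the compiler from reordering the flag store before the
	 * PTE clear.
	 */
	barrier();
	mm->tlb_flush_batched = true;

	/* Dirty/accessed bits are propagated by the caller as usual. */
	return pteval;
}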