On Tue, Jul 11, 2017 at 01:06:48PM -0700, Nadav Amit wrote:
> > +/*
> > + * Reclaim batches unmaps pages under the PTL but does not flush the TLB
> > + * TLB prior to releasing the PTL. It's possible a parallel mprotect or
> > + * munmap can race between reclaim unmapping the page and flushing the
> > + * page. If this race occurs, it potentially allows access to data via
> > + * a stale TLB entry. Tracking all mm's that have TLB batching pending
> > + * would be expensive during reclaim so instead track whether TLB batching
> > + * occured in the past and if so then do a full mm flush here. This will
> > + * cost one additional flush per reclaim cycle paid by the first munmap or
> > + * mprotect. This assumes it's called under the PTL to synchronise access
> > + * to mm->tlb_flush_batched.
> > + */
> > +void flush_tlb_batched_pending(struct mm_struct *mm)
> > +{
> > +	if (mm->tlb_flush_batched) {
> > +		flush_tlb_mm(mm);
> > +		mm->tlb_flush_batched = false;
> > +	}
> > +}
> > #else
> > static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
> > {
>
> I don't know what is exactly the invariant that is kept, so it is hard for
> me to figure out all sort of questions:
>
> Should pte_accessible return true if mm->tlb_flush_batch==true ?
>

It shouldn't be necessary. The contexts where we hit the path are:

uprobes:	elevated page count so no parallel reclaim
dax:		PTEs are not mapping pages that would be reclaimed
hugetlbfs:	not reclaimed
ksm:		holds page lock and elevates count so cannot race with reclaim
cow:		at the time of the flush, the page count is elevated so
		cannot race with reclaim
page_mkclean:	only concerned with marking existing ptes clean but in any
		case, the batching flushes the TLB before issuing any IO so
		there isn't space for a stale TLB entry to be used for
		something bad.

> Does madvise_free_pte_range need to be modified as well?
>

Yes, I noticed that shortly after sending the first version and
commented upon it.
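
For illustration, the flag handling in the quoted patch can be modelled in
userspace. This is only a sketch under stated assumptions: struct mm_model,
its ptl_held and full_flushes fields, and every model_* function are
hypothetical stand-ins rather than kernel code, and the real flush_tlb_mm()
is replaced by a counter. It is meant to show the "one additional flush per
reclaim cycle" behaviour, nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for struct mm_struct. Only the
 * tlb_flush_batched flag comes from the patch; the other fields exist
 * purely so the model can be checked.
 */
struct mm_model {
	bool tlb_flush_batched;		/* reclaim deferred a TLB flush */
	bool ptl_held;			/* models holding the page table lock */
	unsigned int full_flushes;	/* counts simulated flush_tlb_mm() calls */
};

static void model_lock_ptl(struct mm_model *mm)
{
	assert(!mm->ptl_held);
	mm->ptl_held = true;
}

static void model_unlock_ptl(struct mm_model *mm)
{
	assert(mm->ptl_held);
	mm->ptl_held = false;
}

/* Reclaim side: unmap under the PTL, defer the flush, record that fact. */
static void model_reclaim_unmap(struct mm_model *mm)
{
	model_lock_ptl(mm);
	mm->tlb_flush_batched = true;
	model_unlock_ptl(mm);
}

/* Same shape as flush_tlb_batched_pending() in the quoted patch. */
static void model_flush_tlb_batched_pending(struct mm_model *mm)
{
	assert(mm->ptl_held);	/* the patch assumes the caller holds the PTL */
	if (mm->tlb_flush_batched) {
		mm->full_flushes++;	/* stands in for flush_tlb_mm(mm) */
		mm->tlb_flush_batched = false;
	}
}

/*
 * Hypothetical PTE-modifying path (mprotect, munmap, MADV_FREE and so on):
 * flush any pending batch before touching PTEs.
 */
static void model_change_ptes(struct mm_model *mm)
{
	model_lock_ptl(mm);
	model_flush_tlb_batched_pending(mm);
	/* PTE updates would happen here, still under the PTL. */
	model_unlock_ptl(mm);
}
```

In this model, any number of reclaim passes followed by a PTE-changing call
costs a single full flush, and a further PTE-changing call with no
intervening reclaim costs none.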

> How will future code not break anything?
>

I can't really answer that without a crystal ball. Code dealing with page
table updates would need to take some care if it can race with parallel
reclaim.

-- 
Mel Gorman
SUSE Labs