On Sat, Apr 26, 2014 at 08:07:11PM +0200, Peter Zijlstra wrote:
> > > I think we could look at mapping_cap_account_dirty(page->mapping) while
> > > holding the ptelock, the mapping can't go away while we hold that lock.
> > >
> > > And afaict that's the exact differentiator between these two cases.
> >
> > Yes, that's easily done, but I wasn't sure whether it was correct to
> > skip on shmem or not - just because shmem doesn't participate in the
> > page_mkclean() protocol, doesn't imply it's free from similar bugs.
> >
> > I haven't seen a precise description of the bug we're anxious to fix:
> > Dave's MADV_DONTNEED should be easily fixable, that's not a concern;
> > Linus's first patch wrote of writing racing with cleaning, but didn't
> > give a concrete example.
>
> The way I understand it is that we observe the PTE dirty and set PAGE
> dirty before we make the PTE globally unavailable (through a TLB flush),
> and thereby we can mistakenly lose updates; by thinking a page is in
> fact clean even though we can still get updates.
>
> But I suspect you got that far..

OK, so I've been thinking and figured I either misunderstand how the
hardware works or don't understand how Linus' patch will actually fully
fix the issue.

So what both try_to_unmap_one() and zap_pte_range() end up doing is
clearing the PTE entry and then flushing the TLBs.

However, that still leaves a window, between clearing the PTE and
completing the TLB flush, in which remote CPUs can still hold TLB
entries for that address. What if any of those remote entries cause a
write (or have a dirty bit cached) after we've already removed the PTE
entry?

In that case the remote CPU cannot update the PTE anymore (it's not
there, after all).

Will the hardware fault when it does a translation and needs to update
the dirty/access bits while the PTE entry is !present?
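A toy C model of the window described above, only to make the two possible hardware behaviours concrete. Everything in it is hypothetical: struct pte, struct tlb_entry, enum hw_model, remote_write() and lost_update() are illustrative names, not kernel interfaces, and the two hw_model values are assumptions about the hardware, not a statement of what any real CPU does. It also assumes that a TLB entry only caches the dirty bit after that bit has been set in the PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one PTE, one remote TLB entry, one page. */
struct pte { bool present; bool dirty; };
struct tlb_entry { bool valid; bool dirty; };

/* Assumed hardware behaviours; this sketch does not claim which one is real. */
enum hw_model {
	HW_REWALKS_PTE,	/* setting D re-reads the PTE and faults if !present */
	HW_BLIND_SET_D,	/* setting D is done without re-checking present */
};

/*
 * Remote CPU stores through its (possibly stale) TLB entry.
 * Returns true if the store reached the page.
 */
bool remote_write(struct tlb_entry *tlb, struct pte *pte, enum hw_model hw)
{
	if (!tlb->valid)
		return false;		/* no translation: would take a fault */
	if (!tlb->dirty) {
		/* The dirty bit must be set in the PTE before the store. */
		if (hw == HW_REWALKS_PTE && !pte->present)
			return false;	/* faults instead of writing */
		pte->dirty = true;	/* HW_BLIND_SET_D: lands in a dead PTE */
		tlb->dirty = true;
	}
	return true;			/* cached D=1: no PTE access at all */
}

/*
 * Returns true if an update is lost: the remote store reached the page
 * but the unmapping side never saw the page as dirty.
 */
bool lost_update(bool tlb_cached_dirty, enum hw_model hw)
{
	struct pte pte = { .present = true, .dirty = tlb_cached_dirty };
	struct tlb_entry tlb = { .valid = true, .dirty = tlb_cached_dirty };
	bool page_dirty = false;

	/* Assumed invariant: the TLB caches D=1 only if the PTE has D=1. */
	assert(!tlb.dirty || pte.dirty);

	/*
	 * Unmap side, step 1: clear the PTE and sample its old dirty bit
	 * (a stand-in for a get-and-clear followed by dirtying the page).
	 */
	struct pte old = pte;
	pte.present = false;
	pte.dirty = false;
	if (old.dirty)
		page_dirty = true;

	/* The window: the remote TLB entry is still live. */
	bool written = remote_write(&tlb, &pte, hw);

	/* Unmap side, step 2: the TLB flush closes the window. */
	tlb.valid = false;

	return written && !page_dirty;
}
```

In this model lost_update() returns true only for a clean TLB entry combined with HW_BLIND_SET_D. If the entry already had the dirty bit cached, the PTE was dirty when it was cleared, so the unmap side saw it; the remaining case hinges entirely on whether the hardware faults on a !present PTE when it needs to set the dirty bit.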