On Sat, Oct 29, 2022 at 7:17 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote: > > Running the PoC on Linux 6.0.6 with these patches caused the following splat > on the following line: > > WARN_ON_ONCE(!folio_test_locked(folio) && !folio_test_dirty(folio)); Yeah, this is a sign of that "folio_mkclean() serializes with folio_mark_dirty using rmap and the page table lock". And page_remove_rmap() could *almost* be called later, but it does have code that also depends on the page table lock, although it looks like realistically that's just because it "knows" that means that preemption is disabled, so it uses non-atomic statistics update. I say "knows" in quotes, because that's what the comment says, but it turns out that __mod_node_page_state() has to deal with CONFIG_RT anyway and does that preempt_disable_nested(); ... preempt_enable_nested(); thing. And then it wants to see the vma, although that's actually only to see if it's 'mlock'ed, so we could just squirrel that away. So we *could* move page_remove_rmap() later into the TLB flush region, but then we would have lost the page table lock anyway, so then folio_mkclean() can come in regardless. So that doesn't even help. End result: we do want to do the page_set_dirty() and the remove_rmap() under the paeg table lock, because it's what serializes folio_mkclean(). And we'd _like_ to do the TLB flush before the remove_rmap(), but we *really* don't want to do that for every page. So my current gut feel is that we should just say that if you do "MADV_DONTNEED or do a munmap() (which includes the "re-mmap() over the area", while some other thread is still writing to that memory region, you may lose writes. IOW, just accept the behavior that Nadav's test-program tries to show, and say "look, you're doing insane things, we've never given you any other semantics, it's your problem" to any user program that does that. If a user program does MADV_DONTNEED on an area that it is actively using at the same time in another thread, that sounds really really bogus. Same goes doubly for 'munmap()' or over-mapping. Linus