On Tue, Apr 22, 2014 at 12:54 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > So PAGE_FLAGS_CHECK_AT_FREE doesn't include PG_dirty, so while we now > properly mark the page dirty, we could continue and simply free the > thing? Yes. But being free'd while dirty should be fairly normal for anonymous pages, no? And while I did a "pte_mkclean()" the the PageAnon() case (so that we won't waste time on "set_page_dirty()" on pages we don't care about, a concurrent truncate() could have turned what *used* to be a file-backed page into just a dirty page with no mapping any more. So I don't think we would necessarily want to check for PG_dirty at page freeing time, because freeing dirty pages isn't necessarily wrong. For example, tmpfs/shmfs pages are generally always dirty, and would get freed when the inode is deleted. That said, Dave Hansen did report a BUG_ON() in mpage_prepare_extent_to_map(). His line number was odd, but I assume it's this one: BUG_ON(PageWriteback(page)); which may be indicative of some oddity here wrt the dirty bit. So I'm a bit worried. I'm starting to think that we need to do "set_page_dirty_lock()". It *used* to be the case that because we held the page table lock and the page used to be mapped (even if we then unmapped it), page_mapping() could not go away from under us because truncate would see it in the rmap list and then get serialized on that page table lock. But moving the set_page_dirty() later - and to outside the page table lock - means that we probably need to use that version that takes the page lock. Which might kind of suck from a performance angle. But it might explain DaveH's BUG_ON() when testing those patches? I wonder if we could hold on to the page mapping some other way than taking that page lock, because taking that page lock sounds potentially damn expensive. Who is the master of the lock_page() semantics? Hugh Dickins again? I'm bringing him in for this issue too, since whenever there is some vm locking issue, he is always on top of it. Hugh - I'm assuming you are on linux-mm. If not, holler, and I'll send you the two patches I wrote for the TLB dirty shootdown (short explanation: dirty bit setting needs to be delayed until after tlb flushing, since other CPU's may be busily writing to the page until that time). Linus -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html