On 1/7/19 6:15 AM, Matthew Wilcox wrote: > You're going to get data corruption doing this. try_to_unmap_one() > does: > > /* Move the dirty bit to the page. Now the pte is gone. */ > if (pte_dirty(pteval)) > set_page_dirty(page); > > so PageDirty() can be false above, but made true by calling > try_to_unmap(). I don't think that PageDirty() check is _required_ for correctness. You can always safely try_to_unmap() no matter the state of the PTE. We can't lock out the hardware from setting the Dirty bit, ever. It's also just fine to unmap PageDirty() pages, as long as when the PTE is created, we move the dirty bit from the PTE into the 'struct page' (which try_to_unmap() does, as you noticed). > I also think the way you've done this is expedient at the cost of > efficiency and layering violations. I think you should first tear > down the mappings of userspace processes (which will reclaim a lot > of pages allocated to page tables), then you won't need to touch the > invalidate_inode_pages paths at all. By "tear down the mappings", do you mean something analogous to munmap() where the VMA goes away? Or madvise(MADV_DONTNEED) where the PTE is destroyed but the VMA remains? Last time I checked, we only did free_pgtables() when tearing down VMAs, but not for pure unmappings like reclaim or MADV_DONTNEED. I've thought it might be fun to make a shrinker that scanned page tables looking for zero'd pages, but I've never run into a system where empty page table pages were actually causing a noticeable problem.