On Mon, Jan 07, 2019 at 11:25:16AM -0800, Dave Hansen wrote: > On 1/7/19 6:15 AM, Matthew Wilcox wrote: > > You're going to get data corruption doing this. try_to_unmap_one() > > does: > > > > /* Move the dirty bit to the page. Now the pte is gone. */ > > if (pte_dirty(pteval)) > > set_page_dirty(page); > > > > so PageDirty() can be false above, but made true by calling > > try_to_unmap(). > > I don't think that PageDirty() check is _required_ for correctness. You > can always safely try_to_unmap() no matter the state of the PTE. We > can't lock out the hardware from setting the Dirty bit, ever. > > It's also just fine to unmap PageDirty() pages, as long as when the PTE > is created, we move the dirty bit from the PTE into the 'struct page' > (which try_to_unmap() does, as you noticed). Right, but the very next thing the patch does is call invalidate_complete_page(), which calls __remove_mapping() which ... oh, re-checks PageDirty() and refuses to drop the page. So this isn't the data corruptor that I had thought it was. > > I also think the way you've done this is expedient at the cost of > > efficiency and layering violations. I think you should first tear > > down the mappings of userspace processes (which will reclaim a lot > > of pages allocated to page tables), then you won't need to touch the > > invalidate_inode_pages paths at all. > > By "tear down the mappings", do you mean something analogous to munmap() > where the VMA goes away? Or madvise(MADV_DONTNEED) where the PTE is > destroyed but the VMA remains? > > Last time I checked, we only did free_pgtables() when tearing down VMAs, > but not for pure unmappings like reclaim or MADV_DONTNEED. I've thought > it might be fun to make a shrinker that scanned page tables looking for > zero'd pages, but I've never run into a system where empty page table > pages were actually causing a noticeable problem. A few hours ago when I thought this patch had the ordering of checking PageDirty() the wrong way round, I had the madvise analogy in mind so that the PTEs would get destroyed and the dirty information transferred to the struct page first before trying to drop pages.