On Sat, Oct 29, 2022 at 12:39 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> ext4 has since papered over the problem, by soldiering on if it finds a
> page without writeback buffers when it expected to be able to writeback
> a dirty page. But you get the idea.

I suspect that "soldiering on" is the right thing to do, but yes, our
'mkdirty' vs 'mkclean' thing has always been problematic.

I think we always needed a page lock for it, but PG_lock itself
doesn't work (as mentioned earlier) because the VM can't serialize
with IO, and needs the lock to basically be a spinlock.

The page table lock kind of took its place, and then the rmap removal
makes for problems (since it is what basically ends up being the
shared place to look it up).

I can think of three options:

 (a) filesystems just deal with it

 (b) we could move the "page_remove_rmap()" into the "flush-and-free" path too

 (c) we could actually add a spinlock (hashed on the page?) for this

I think (a) is basically our current expectation.

And (b) would be fairly easy - same model as that dirty bit patch,
just a 'do page_remove_rmap too' - except page_remove_rmap() wants the
vma as well (and we delay the TLB flush over multiple vma's, so it's
not just a "save vma in mmu_gather").

Doing (c) doesn't look hard, except for the "new lock" thing, which is
always a potential huge disaster. If it's only across set_page_dirty()
and page_mkclean(), though, and uses some simple page-based hash, it
sounds fairly benign.

              Linus
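
For illustration only, option (c) could look roughly like the user-space
sketch below. It is not kernel code and not a proposed patch: struct
page_sketch, the dirty_lock*() helpers, the table size, and the two
sketch_*() functions are all hypothetical stand-ins, and C11 atomic_flag
takes the place of a kernel spinlock. The only point is the shape: a small
fixed table of spinlocks indexed by a hash of the page address, held only
across the "mkdirty" and "mkclean" sides.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed table size: 64 locks. The right number is a tuning question. */
#define DIRTY_LOCK_BITS  6
#define DIRTY_LOCK_COUNT (1u << DIRTY_LOCK_BITS)

/* Hypothetical stand-in for struct page: just the dirty state. */
struct page_sketch {
    bool dirty;
};

static atomic_flag dirty_locks[DIRTY_LOCK_COUNT];

/* Put every lock in the table into the unlocked state. */
static void dirty_locks_init(void)
{
    for (unsigned int i = 0; i < DIRTY_LOCK_COUNT; i++)
        atomic_flag_clear(&dirty_locks[i]);
}

/* Simple page-based hash: multiply the address, keep the top bits. */
static atomic_flag *dirty_lock_for(const struct page_sketch *page)
{
    uint64_t h = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;

    return &dirty_locks[h >> (64 - DIRTY_LOCK_BITS)];
}

static void dirty_lock(atomic_flag *lock)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void dirty_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* "mkdirty" side: mark the page dirty under the hashed lock. */
static void sketch_set_page_dirty(struct page_sketch *page)
{
    atomic_flag *lock = dirty_lock_for(page);

    dirty_lock(lock);
    page->dirty = true;
    dirty_unlock(lock);
}

/* "mkclean" side: clear the dirty state, report whether it was set. */
static bool sketch_page_mkclean(struct page_sketch *page)
{
    atomic_flag *lock = dirty_lock_for(page);
    bool was_dirty;

    dirty_lock(lock);
    was_dirty = page->dirty;
    page->dirty = false;
    dirty_unlock(lock);

    return was_dirty;
}
```

Two pages that hash to the same slot simply share a lock, which costs some
contention but not correctness. The "new lock" worry is not modelled here at
all: the sketch says nothing about lock ordering against the page table lock
or anything else.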