On Tue, Jun 28, 2022 at 10:29:00AM -0400, Chris Mason wrote:
> As Sterba points out later in the thread, btrfs cares more because of
> stable page requirements to protect data during COW and to make sure the
> crcs we write to disk are correct.

I don't think this matters here.  What the other file systems do is to
simply never write back a page that has the dirty bit set but never had
->page_mkwrite called on it, which is the case that is getting fixed up
here.

I did a little research, and this post from Jan describes the problem
best:

https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx/

So the problem is that while get_user_pages takes a write fault and
marks the page dirty, the page could have been cleaned just after that,
and then receive a set_page/folio_dirty after that.  The canonical
example would be the direct I/O read completion calling into that.

> I'd love a proper fix for this on the *_user_pages() side where
> page_mkwrite() style notifications are used all the time.  It's just a
> huge change, and my answer so far has always been that using btrfs
> mmap'd memory for this kind of thing isn't a great choice either way.

Everyone else has the same problem, but decided that you can't get full
data integrity out of this workload.  I think the sane answers are:
simply don't write back pages that are held by get_user_pages with a
writable reference, or try to dirty the pages from set_page_dirty.  The
set_page_dirty contexts are somewhat iffy, but would probably be a
better place to kick off the btrfs writepage fixup.