On Tue, Jun 28, 2022 at 10:29:00AM -0400, Chris Mason wrote:
> As Sterba points out later in the thread, btrfs cares more because of
> stable page requirements to protect data during COW and to make sure the
> crcs we write to disk are correct.

I don't think this matters here.  What the other file systems do is to
simply never write back a page that has the dirty bit set but never had
->page_mkwrite called on it, which is the case that is getting fixed up
here.

I did a little research, and this post from Jan describes the problem
best:

https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx/

So the problem is that while get_user_pages takes a write fault and
marks the page dirty, the page could have been cleaned just after that,
and then receive a set_page/folio_dirty after that.  The canonical
example would be the direct I/O read completion calling into that.

> I'd love a proper fix for this on the *_user_pages() side where
> page_mkwrite() style notifications are used all the time.  It's just a
> huge change, and my answer so far has always been that using btrfs
> mmap'd memory for this kind of thing isn't a great choice either way.

Everyone else has the same problem, but decided that you can't get full
data integrity out of this workload.  I think the sane answers are:
simply don't write back pages that are held by get_user_pages with a
writable reference, or try to dirty the pages from set_page_dirty.  The
set_page_dirty contexts are somewhat iffy, but would probably be a
better place to kick off the btrfs writepage fixup.