On 2022/6/28 22:29, Chris Mason wrote:
On 6/25/22 5:11 AM, Christoph Hellwig wrote:
On Fri, Jun 24, 2022 at 03:07:50PM +0200, Jan Kara wrote:
I'm not sure I get the context 100% right but pages getting randomly dirty
behind the filesystem's back can still happen - most commonly with RDMA and
similar stuff which calls set_page_dirty() on pages it has got from
pin_user_pages() once the transfer is done. page_maybe_dma_pinned() should
be usable within filesystems to detect such cases and protect the
filesystem but so far neither me nor John Hubbard has got to implement this
in the generic writeback infrastructure + some filesystem as a sample case
others could copy...
Well, so far the strategy elsewhere seems to be to just ignore pages
only dirtied through get_user_pages. E.g. iomap skips over pages
reported as holes, and ext4_writepage complains about pages without
buffers and then clears the dirty bit and continues.
I'm kinda surprised that btrfs wants to treat this so specially,
especially as more of the btrfs page and sub-page status will be out
of date as well.
As Sterba points out later in the thread, btrfs cares more because of
stable page requirements to protect data during COW and to make sure the
crcs we write to disk are correct.
In fact, COW is not that special. Even before btrfs and the other fses
supporting COW, all the old fses had to do something like COW when
writing into holes.
What makes btrfs special is its csum, and in fact csum requires more
stable page status.
If someone can modify a page without waiting for its writeback to
finish, btrfs csum can easily be stale and cause -EIO for future read.
Thus, unless we can ensure that whatever marks the page dirty respects
page writeback, going down the fixup path would be more dangerous than
ignoring it.
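
To illustrate the stale csum case, here is a minimal userspace sketch in
Python. It is not kernel code: zlib.crc32 only stands in for the btrfs
checksum, and submit_write(), complete_write() and read_block() are
hypothetical names made up for this example. It assumes the checksum is
computed when the write is submitted and the data reaches disk later.

```python
import errno
import zlib

PAGE_SIZE = 4096


def submit_write(page: bytearray) -> int:
    """Compute the checksum at submission time (stand-in for the fs csum)."""
    return zlib.crc32(bytes(page))


def complete_write(page: bytearray) -> bytes:
    """The device copies whatever is in the page when the I/O actually runs."""
    return bytes(page)


def read_block(on_disk: bytes, stored_csum: int) -> bytes:
    """Verify the data against the stored checksum, failing with EIO on mismatch."""
    if zlib.crc32(on_disk) != stored_csum:
        raise OSError(errno.EIO, "checksum mismatch")
    return on_disk


def demo(modify_during_writeback: bool) -> bool:
    """Return True if a later read succeeds, False if it fails with EIO."""
    page = bytearray(b"A" * PAGE_SIZE)
    csum = submit_write(page)
    if modify_during_writeback:
        # Someone (e.g. a DMA transfer) changes the page without waiting
        # for writeback to finish.
        page[0:4] = b"RDMA"
    on_disk = complete_write(page)
    try:
        read_block(on_disk, csum)
        return True
    except OSError as err:
        assert err.errno == errno.EIO
        return False


if __name__ == "__main__":
    print("stable page, read ok:", demo(False))
    print("modified during writeback, read ok:", demo(True))
```

With a stable page the read verifies; with the page changed between
checksum and I/O, the stored csum no longer matches what landed on disk.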
The fixup worker path is pretty easy to trigger if you do O_DIRECT reads
into mmap'd pages. You need some memory pressure to power through
get_user_pages trying to do the right thing, but it does happen.
I'd love a proper fix for this on the *_user_pages() side where
page_mkwrite() style notifications are used all the time. It's just a
huge change, and my answer so far has always been that using btrfs
mmap'd memory for this kind of thing isn't a great choice either way.
The same here.
But for now I'm still wondering if the fixup is really the correct
workaround, rather than just ignoring such pages.
Thanks,
Qu
-chris