Re: [PATCH] btrfs: remove btrfs_writepage_cow_fixup

On 2022/6/28 16:00, Jan Kara wrote:
On Tue 28-06-22 08:24:07, Qu Wenruo wrote:
On 2022/6/27 18:19, Jan Kara wrote:
On Sat 25-06-22 11:11:43, Christoph Hellwig wrote:
On Fri, Jun 24, 2022 at 03:07:50PM +0200, Jan Kara wrote:
I'm not sure I get the context 100% right but pages getting randomly dirty
behind filesystem's back can still happen - most commonly with RDMA and
similar stuff which calls set_page_dirty() on pages it has got from
pin_user_pages() once the transfer is done. page_maybe_dma_pinned() should
be usable within filesystems to detect such cases and protect the
filesystem, but so far neither John Hubbard nor I have gotten around to
implementing this in the generic writeback infrastructure plus some
filesystem as a sample case others could copy...

Well, so far the strategy elsewhere seems to be to just ignore pages
only dirtied through get_user_pages.  E.g. iomap skips over pages
reported as holes, and ext4_writepage complains about pages without
buffers and then clears the dirty bit and continues.

I'm kinda surprised that btrfs wants to treat this so specially,
especially as more of the btrfs page and sub-page status will be out
of date as well.

I agree btrfs probably needs a different solution than what it is currently
doing if they want to get things right. I just wanted to make it clear that
the code you are ripping out may be a wrong solution but to a real problem.

IMHO btrfs should also ignore such pages that are dirty but not managed
by the fs.

But I still have a small concern here.

Is it guaranteed that, after RDMA dirties the pages, the fs eventually gets
a proper notification that those pages have been written?

So there is ->page_mkwrite() notification happening when RDMA code calls
pin_user_pages() when preparing buffers.

I'm wondering why page_mkwrite() is only called when preparing the buffer?

Wouldn't it make more sense to call page_mkwrite() when the buffer is
released from RDMA?

Sorry for all these dumb questions, as core-api/pin_user_pages.rst
still doesn't explain things to my dumb brain...



Another thing is, RDMA doesn't really need to respect things like page
locked/writeback, right?
As far as the RDMA calls are concerned, all the pages are pinned and
seemingly exclusive to them.

And in that case, I think btrfs should skip writing back those pages,
rather than doing fixups.

This is because btrfs csum requires everyone modifying the page to wait for
writeback; otherwise the written data will be out of sync with the calculated
csum and cause a future -EIO when reading it from disk.
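
To make the csum concern concrete, here is a minimal userspace sketch, not
btrfs code. It assumes a toy "disk block" that stores data plus a CRC32C
computed just before the data is copied out. All names (toy_crc32c,
toy_disk_block, toy_writeback, toy_read_verify) are hypothetical and exist
only for this illustration. If the page changes between the checksum
calculation and the copy, the later verification fails:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t toy_crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical stand-in for one on-disk block and its stored checksum. */
struct toy_disk_block {
    uint8_t data[TOY_PAGE_SIZE];
    uint32_t csum;
};

/*
 * Simulate writeback of one page. If modify_during_writeback is nonzero,
 * one byte of the page is flipped after the checksum was computed but
 * before the data is copied out, which is what a writer that does not
 * wait for writeback (for example DMA into a pinned page) could cause.
 */
void toy_writeback(uint8_t *page, struct toy_disk_block *disk,
                   int modify_during_writeback)
{
    assert(page != NULL && disk != NULL);

    disk->csum = toy_crc32c(page, TOY_PAGE_SIZE);
    if (modify_during_writeback)
        page[0] ^= 0xFF;
    memcpy(disk->data, page, TOY_PAGE_SIZE);
}

/* Simulate a read: 0 if the stored checksum matches, -EIO otherwise. */
int toy_read_verify(const struct toy_disk_block *disk)
{
    return toy_crc32c(disk->data, TOY_PAGE_SIZE) == disk->csum ? 0 : -EIO;
}
```

Calling toy_writeback() with modify_during_writeback set to 0 and then
toy_read_verify() succeeds; with it set to 1 the read returns -EIO even
though the copied data itself arrived intact.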


The trouble is that although a later
page_mkclean() makes the page not writeable from page tables, it may still be
written by RDMA code (even hours after the ->page_mkwrite() notification; RDMA
buffers are really long-lived) and that's what eventually confuses the
filesystem.  Otherwise set_page_dirty() is the notification that the page
contents were changed and need writing out...
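
The sequence described above can be sketched as a toy userspace state
model. This is only an illustration under stated assumptions, not kernel
code: struct toy_page, its fields, and every toy_* function are
hypothetical, and the model simply assumes that the filesystem drops its
per-page state at writeback and that the unpin path marks the page dirty
without a new ->page_mkwrite() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state; these are not real kernel fields. */
struct toy_page {
    bool pte_writable; /* CPU may write through the page tables */
    bool dirty;        /* page is marked dirty */
    bool fs_prepared;  /* fs was notified of a write and holds state for it */
    int pins;          /* outstanding long-lived pins */
};

/* Pin for write: the model delivers the mkwrite-style notification here. */
void toy_pin_for_write(struct toy_page *p)
{
    p->pins++;
    p->pte_writable = true;
    p->fs_prepared = true;
    p->dirty = true; /* the model treats the page as dirty from now on */
}

/* Writeback: write-protect the page, clean it, drop the fs state. */
void toy_model_writeback(struct toy_page *p)
{
    p->pte_writable = false;
    p->dirty = false;
    p->fs_prepared = false;
    /* The pin is untouched: the device can still write to the page. */
}

/* Transfer done: the pinning code marks the page dirty and unpins it. */
void toy_unpin_dirty(struct toy_page *p)
{
    assert(p->pins > 0);
    p->pins--;
    p->dirty = true; /* no new mkwrite-style notification in this model */
}

/* True when the page is dirty but the fs holds no state for the write. */
bool toy_dirty_behind_fs_back(const struct toy_page *p)
{
    return p->dirty && !p->fs_prepared;
}
```

Running pin, writeback, then unpin on a zero-initialized struct toy_page
leaves toy_dirty_behind_fs_back() returning true, which is the situation
the filesystem later trips over.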

Another thing I still don't get is whether there are any explicit
mkwrite()/set_page_dirty() calls when those pages are unpinned.

If there are no such explicit calls, these pages dirtied by RDMA would always
be ignored by filesystems (except btrfs), and would never get properly
written back.

Thanks,
Qu


								Honza