On Tue 28-06-22 08:24:07, Qu Wenruo wrote:
> On 2022/6/27 18:19, Jan Kara wrote:
> > On Sat 25-06-22 11:11:43, Christoph Hellwig wrote:
> > > On Fri, Jun 24, 2022 at 03:07:50PM +0200, Jan Kara wrote:
> > > > I'm not sure I get the context 100% right but pages getting randomly dirty
> > > > behind filesystem's back can still happen - most commonly with RDMA and
> > > > similar stuff which calls set_page_dirty() on pages it has got from
> > > > pin_user_pages() once the transfer is done. page_maybe_dma_pinned() should
> > > > be usable within filesystems to detect such cases and protect the
> > > > filesystem but so far neither me nor John Hubbart has got to implement this
> > > > in the generic writeback infrastructure + some filesystem as a sample case
> > > > others could copy...
> > >
> > > Well, so far the strategy elsewhere seems to be to just ignore pages
> > > only dirtied through get_user_pages. E.g. iomap skips over pages
> > > reported as holes, and ext4_writepage complains about pages without
> > > buffers and then clears the dirty bit and continues.
> > >
> > > I'm kinda surprised that btrfs wants to treat this so special
> > > especially as more of the btrfs page and sub-page status will be out
> > > of date as well.
> >
> > I agree btrfs probably needs a different solution than what it is currently
> > doing if they want to get things right. I just wanted to make it clear that
> > the code you are ripping out may be a wrong solution but to a real problem.
>
> IHMO I believe btrfs should also ignore such dirty but not managed by fs
> pages.
>
> But I still have a small concern here.
>
> Is it ensured that, after RDMA dirtying the pages, would we finally got
> a proper notification to fs that those pages are marked written?

So there is a ->page_mkwrite() notification happening when the RDMA code
calls pin_user_pages() while preparing buffers.
The trouble is that although a later page_mkclean() makes the page
non-writeable through the page tables, the page may still be written by the
RDMA code (even hours after the ->page_mkwrite() notification; RDMA buffers
are really long-lived), and that is what eventually confuses the filesystem.
Otherwise, set_page_dirty() is the notification that the page contents were
changed and need writing out...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR