On Sun, May 15, 2022 at 08:37:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> XFS has the unique behavior (as compared to the other Linux filesystems)
> that on writeback errors it will completely invalidate the affected
> folio and force the page cache to reread the contents from disk. All
> other filesystems leave the page mapped and up to date.
>
> This is a rude awakening for user programs, since (in the case where
> write fails but reread doesn't) file contents will appear to revert to
> old disk contents with no notification other than an EIO on fsync. This
> might have been annoying back in the days when iomap dealt with one page
> at a time, but with multipage folios, we can now throw away *megabytes*
> worth of data for a single write error.
>
> On *most* Linux filesystems, a program can respond to an EIO on write by
> redirtying the entire file and scheduling it for writeback. This isn't
> foolproof, since the page that failed writeback is no longer dirty and
> could be evicted, but programs that want to recover properly *also*
> have to detect XFS and regenerate every write they've made to the file.
>
> When running xfs/314 on arm64, I noticed a UAF bug when xfs_discard_folio
> invalidates multipage folios that could be undergoing writeback. If,
> say, we have a 256K folio caching a mix of written and unwritten
> extents, it's possible that we could start writeback of the first (say)
> 64K of the folio and then hit a writeback error on the next 64K. We
> then free the iop attached to the folio, which is really bad because
> writeback completion on the first 64K will trip over the "blocks per
> folio > 1 && !iop" assertion.
>
> This can't be fixed by only invalidating the folio if writeback fails at
> the start of the folio, since the folio is marked !uptodate, which trips
> other assertions elsewhere. Get rid of the whole behavior entirely.
>
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>

Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
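
For anyone following along, the recovery strategy the commit message
describes (redirty and rewrite the range, then fsync again after an EIO)
might look roughly like the sketch below. This is only an illustration of
the commit message, not code from the patch; the fd/buf/len/off names are
made up, short writes are treated as failures for brevity, and it assumes
the program still holds its own copy of the data.

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative sketch only (not from the patch): rewrite a range from
 * the program's own copy of the data if fsync() reports a writeback
 * error.  Assumes the caller still has the data in 'buf'.
 */
static int write_range_with_retry(int fd, const void *buf, size_t len,
				  off_t off)
{
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		return -1;

	if (fsync(fd) == 0)
		return 0;
	if (errno != EIO)
		return -1;

	/*
	 * Writeback failed; the pages that failed are clean again (and
	 * on XFS, before this patch, possibly invalidated), so redirty
	 * the whole range by rewriting it from our copy and sync once
	 * more.
	 */
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		return -1;
	return fsync(fd);
}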