Re: [PATCH] xfs: redirty eof folio on truncate to avoid filemap flush

On Fri, Oct 28, 2022 at 04:49:37PM -0700, Darrick J. Wong wrote:
> On Sat, Oct 29, 2022 at 08:30:14AM +1100, Dave Chinner wrote:
> > On Fri, Oct 28, 2022 at 02:26:47PM -0400, Brian Foster wrote:
> > > On Fri, Oct 28, 2022 at 09:11:09AM -0400, Brian Foster wrote:
> > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > ---
> > > > 
> > > > Here's a quick prototype of "option 3" described in my previous mail.
> > > > This has been spot tested and confirmed to prevent the original stale
> > > > data exposure problem. More thorough regression testing is still
> > > > required. Barring unforeseen issues with that, however, I think this is
> > > > tentatively my new preferred option. The primary reason for that is it
> > > > avoids looking at extent state and is more in line with what iomap based
> > > > zeroing should be doing more generically.
> > > > 
> > > > Because of that, I think this provides a bit more opportunity for follow
> > > > on fixes (there are other truncate/zeroing problems I've come across
> > > > during this investigation that still need fixing), cleanup and
> > > > consolidation of the zeroing code. For example, I think the trajectory
> > > > of this could look something like:
> > > > 
> > > > - Genericize a bit more to handle all truncates.
> > > > - Repurpose iomap_truncate_page() (currently only used by XFS) into a
> > > >   unique implementation from zero range that does explicit zeroing
> > > >   instead of relying on pagecache truncate.
> > > > - Refactor XFS ranged zeroing to an abstraction that uses a combination
> > > >   of iomap_zero_range() and the new iomap_truncate_page().
> > > > 
> > > 
> > > After playing with this and thinking a bit more about the above, I think
> > > I managed to come up with an iomap_truncate_page() prototype that DTRT
> > > based on this. Only spot tested so far, needs to pass iomap_flags to the
> > > other bmbt_to_iomap() calls to handle the cow fork, undoubtedly has
> > > other bugs/warts, etc. etc. This is just a quick prototype to
> > > demonstrate the idea, which is essentially to check dirty state along
> > > with extent state while under lock and transfer that state back to iomap
> > > so it can decide whether it can shortcut or forcibly perform the zero.
> > > 
> > > In a nutshell, IOMAP_TRUNC_PAGE asks the fs to check dirty state while
> > > under lock and implies that the range is sub-block (single page).
> > > IOMAP_F_TRUNC_PAGE on the imap informs iomap that the range was in fact
> > > dirty, so perform the zero via buffered write regardless of extent
> > > state.
> > 
> > I'd much prefer we fix this in the iomap infrastructure - failing to
> > zero dirty data in memory over an unwritten extent isn't an XFS bug,
> > so we shouldn't be working around it in XFS like we did previously.
> 
> Hmm, I think I agree, given that this is really a bug in cache handling.
> Or so I gather; reading on...
> 
> > I don't think this should be called "IOMAP_TRUNC_PAGE", though,
> > because that indicates the caller context, not what we are asking
> > the internal iomap code to do. What we are really asking
> > iomap_zero_iter() to do is zero the page cache if it exists in
> > memory, otherwise ignore unwritten/hole pages.  Hence I think a name
> > like IOMAP_ZERO_PAGECACHE is more appropriate,
> 
> I don't even like ZERO_PAGECACHE -- in my mind that implies that it
> unconditionally zeroes any page it finds, whereas we really only want it
> to zero dirty cache contents.  IOMAP_ZERO_DIRTY_CACHE?

Fine by me, the name just needs to describe the action that needs to
be performed....

> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index 07da03976ec1..16d9b838e82d 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -915,6 +915,7 @@ xfs_buffered_write_iomap_begin(
> > >  	int			allocfork = XFS_DATA_FORK;
> > >  	int			error = 0;
> > >  	unsigned int		lockmode = XFS_ILOCK_EXCL;
> > > +	u16			iomap_flags = 0;
> > >  
> > >  	if (xfs_is_shutdown(mp))
> > >  		return -EIO;
> > > @@ -942,6 +943,10 @@ xfs_buffered_write_iomap_begin(
> > >  	if (error)
> > >  		goto out_unlock;
> > >  
> > > +	if ((flags & IOMAP_TRUNC_PAGE) &&
> > > +	    filemap_range_needs_writeback(VFS_I(ip)->i_mapping, offset, offset))
> > > +		iomap_flags |= IOMAP_F_TRUNC_PAGE;
> > 
> > As per above, I don't think we should be putting this check in the
> > filesystem. That simplifies this a lot as filesystems don't need to
> > know anything about how iomap manages the page cache for the
> > filesystem...
> 
> I gather from the bug description that this is a problem with how we
> manage the page cache during a truncation when the eofpage is backed
> by unwritten extents.

Right, think of iomap_truncate_page() as having exactly the same
responsibilities as block_truncate_page() has for filesystems using
bufferheads. i.e. both functions need to ensure the disk contents
are correctly zeroed such that the caller can safely call
truncate_setsize() afterwards resulting in both the on-disk state
and in-memory state remaining coherent.

Hence iomap_truncate_page() needs to ensure that we handle dirty
data over unwritten extents correctly. If we look further, it is
obvious that iomap already has this responsibility:
SEEK_HOLE/SEEK_DATA does page cache lookups to find data over
unwritten extents. Hence it makes no sense for one part of iomap to
take responsibility for managing data over unwritten extents, whilst
another part ignores it...
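
For illustration only, the rule being argued for above can be modelled in
a few lines of userspace C. This is a sketch under stated assumptions,
not the iomap implementation: the types `extent_type` and `cached_block`
and the function `zero_block_tail()` are hypothetical stand-ins invented
for this example, and the "dirty cache only" condition follows the
IOMAP_ZERO_DIRTY_CACHE reading discussed earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; none of these are kernel types. */
enum extent_type { EXT_HOLE, EXT_UNWRITTEN, EXT_MAPPED };

struct cached_block {
	bool present;		/* the page cache holds this block */
	bool dirty;		/* ...and it has data not yet written back */
	unsigned char data[16];	/* toy block contents */
};

/*
 * Zero the block from offset 'from' to its end, as a truncate to a
 * sub-block EOF would need. Returns true if zeroing was performed.
 */
static bool zero_block_tail(enum extent_type type, struct cached_block *blk,
			    size_t from)
{
	assert(blk != NULL && from <= sizeof(blk->data));

	/*
	 * Holes and unwritten extents already read back as zeroes from
	 * disk, so they can be skipped, but only if no dirty data is
	 * sitting in the cache on top of them.
	 */
	if (type != EXT_MAPPED && !(blk->present && blk->dirty))
		return false;

	/*
	 * A real implementation would read an uncached mapped block in
	 * first; this model simply treats it as present afterwards.
	 */
	memset(blk->data + from, 0, sizeof(blk->data) - from);
	blk->present = true;
	blk->dirty = true;
	return true;
}
```

In this model the stale data exposure corresponds to taking the early
return for an unwritten extent on extent state alone, without looking at
the cache state first.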

If there was another filesystem using iomap and unwritten extents,
it would have exactly the same issues with iomap_truncate_page().
Hence:

> As such, I think that this
> should be a fix within iomap.

This.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


