Re: [PATCH 3/7] xfs: use byte ranges for write cleanup ranges

On Wed, Nov 02, 2022 at 09:32:53AM -0700, Darrick J. Wong wrote:
> On Tue, Nov 01, 2022 at 11:34:08AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > xfs_buffered_write_iomap_end() currently converts the byte ranges
> > passed to it to filesystem blocks to pass them to the bmap code to
> > punch out delalloc blocks, but then has to convert filesystem
> > blocks back to byte ranges for page cache truncate.
> > 
> > We're about to make the page cache truncate go away and replace it
> > with a page cache walk, so having to convert everything to/from/to
> > filesystem blocks is messy and error-prone. It is much easier to
> > pass around byte ranges and convert to page indexes and/or
> > filesystem blocks only where those units are needed.
> > 
> > In preparation for the page cache walk being added, add a helper
> > that converts byte ranges to filesystem blocks and calls
> > xfs_bmap_punch_delalloc_range() and convert
> > xfs_buffered_write_iomap_end() to calculate limits in byte ranges.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_iomap.c | 40 +++++++++++++++++++++++++---------------
> >  1 file changed, 25 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index a2e45ea1b0cb..7bb55dbc19d3 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1120,6 +1120,20 @@ xfs_buffered_write_iomap_begin(
> >  	return error;
> >  }
> >  
> > +static int
> > +xfs_buffered_write_delalloc_punch(
> > +	struct inode		*inode,
> > +	loff_t			start_byte,
> > +	loff_t			end_byte)
> > +{
> > +	struct xfs_mount	*mp = XFS_M(inode->i_sb);
> > +	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
> > +	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
> > +
> > +	return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb,
> > +				end_fsb - start_fsb);
> > +}
> 
> /me echoes hch's comment that the other callers of
> xfs_bmap_punch_delalloc_range do this byte->block conversion too.
> 
> > +
> >  static int
> >  xfs_buffered_write_iomap_end(
> >  	struct inode		*inode,
> > @@ -1129,10 +1143,9 @@ xfs_buffered_write_iomap_end(
> >  	unsigned		flags,
> >  	struct iomap		*iomap)
> >  {
> > -	struct xfs_inode	*ip = XFS_I(inode);
> > -	struct xfs_mount	*mp = ip->i_mount;
> > -	xfs_fileoff_t		start_fsb;
> > -	xfs_fileoff_t		end_fsb;
> > +	struct xfs_mount	*mp = XFS_M(inode->i_sb);
> > +	loff_t			start_byte;
> > +	loff_t			end_byte;
> >  	int			error = 0;
> >  
> >  	if (iomap->type != IOMAP_DELALLOC)
> > @@ -1157,13 +1170,13 @@ xfs_buffered_write_iomap_end(
> >  	 * the range.
> >  	 */
> >  	if (unlikely(!written))
> > -		start_fsb = XFS_B_TO_FSBT(mp, offset);
> > +		start_byte = round_down(offset, mp->m_sb.sb_blocksize);
> >  	else
> > -		start_fsb = XFS_B_TO_FSB(mp, offset + written);
> > -	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> > +		start_byte = round_up(offset + written, mp->m_sb.sb_blocksize);
> > +	end_byte = round_up(offset + length, mp->m_sb.sb_blocksize);
> 
> Technically this is the byte where we should *stop* processing, right?
> 
> If we are told to write 1000 bytes at pos 0 and the whole thing fails,
> the end pos of the range is 1023 and we must stop at pos 1024.  Right?

Yes, the interval definition being used here is half-open, i.e.
[start_byte, end_byte), because it makes iterative interval
operations really easy: the value for the start of the next
interval is the same as the value for the end of the current
interval.
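
As a worked check of the 1000-byte example, here is a minimal userspace
sketch of the start_byte/end_byte calculation in the patched
xfs_buffered_write_iomap_end(). The names cleanup_range(), rdown() and
rup() are hypothetical; rdown()/rup() stand in for the kernel's
round_down()/round_up() and, like those, assume a power-of-two block size.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for round_down()/round_up(); align must be a power of two. */
static int64_t rdown(int64_t x, int64_t align) { return x & ~(align - 1); }
static int64_t rup(int64_t x, int64_t align) { return rdown(x + align - 1, align); }

/*
 * Hypothetical helper mirroring the patch: compute the half-open range
 * [*start, *end) of delalloc space to clean up after a short write.
 */
void cleanup_range(int64_t offset, int64_t length, int64_t written,
		   int64_t blocksize, int64_t *start, int64_t *end)
{
	if (written == 0)
		/* Nothing written: start at the block containing offset. */
		*start = rdown(offset, blocksize);
	else
		/* Skip every block that holds written data. */
		*start = rup(offset + written, blocksize);
	/* First byte past the last block the write touched. */
	*end = rup(offset + length, blocksize);
}
```

For a failed 1000-byte write at offset 0 with 1024-byte blocks this gives
[0, 1024): 1024 is the first byte that is *not* processed, and the last
byte cleaned up is 1023. When start >= end the caller has nothing to do.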

That's the way we've traditionally encoded ranges in XFS
because there's a much lower risk of off-by-one errors in
calculations as we iterate through extents. That is, finding the
start and end of ranges is as simple as round_down/round_up, and
there's no magic "+ 1" or "- 1" arithmetic needed anywhere to move
from one interval to the next, etc.
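
To illustrate the "no magic + 1" point, a sketch using hypothetical helper
names and int64_t in place of loff_t/xfs_fileoff_t. range_blocks()
converts a half-open byte range to a block count the way
xfs_buffered_write_delalloc_punch() does (truncate the start as
XFS_B_TO_FSBT() does, round up the end as XFS_B_TO_FSB() does), and
walk_blocks() walks the same range in pieces where each piece begins
exactly where the previous one ended.

```c
#include <assert.h>
#include <stdint.h>

/* Byte -> block conversions; blocklog is log2(block size). */
static int64_t b_to_fsbt(int64_t bytes, int blocklog)	/* truncating */
{
	return bytes >> blocklog;
}
static int64_t b_to_fsb(int64_t bytes, int blocklog)	/* rounding up */
{
	return (bytes + ((int64_t)1 << blocklog) - 1) >> blocklog;
}

/* Blocks overlapped by [start, end): a plain subtraction, no "+ 1". */
int64_t range_blocks(int64_t start, int64_t end, int blocklog)
{
	return b_to_fsb(end, blocklog) - b_to_fsbt(start, blocklog);
}

/* Visit [start, end) one block-bounded piece at a time; returns piece count. */
int64_t walk_blocks(int64_t start, int64_t end, int blocklog)
{
	int64_t bsize = (int64_t)1 << blocklog;
	int64_t n = 0;

	for (int64_t pos = start; pos < end; ) {
		/* The next block boundary, clamped to the end of the range. */
		int64_t next = (pos & ~(bsize - 1)) + bsize;

		if (next > end)
			next = end;
		/* ... process the half-open piece [pos, next) here ... */
		n++;
		pos = next;	/* next piece starts where this one ended */
	}
	return n;
}
```

Both functions agree, e.g. 3 blocks for [1000, 3000) with 1024-byte
blocks, and an empty range such as [1024, 1024) yields 0 without any
special casing.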

> (The only reason I ask is that Linus ranted about XFS naming these
> variables incorrectly in the iomap code and the (at the time only) user
> of it.)

I don't find that a convincing argument. What some random dude who
has never touched the XFS or iomap code thinks about how we define
intervals, or about the notations we use that make the code _easier
for us to understand_, is just not relevant.

> >  	/* Nothing to do if we've written the entire delalloc extent */
> > -	if (start_fsb >= end_fsb)
> > +	if (start_byte >= end_byte)
> >  		return 0;
> >  
> >  	/*
> > @@ -1173,15 +1186,12 @@ xfs_buffered_write_iomap_end(
> >  	 * leave dirty pages with no space reservation in the cache.
> >  	 */
> >  	filemap_invalidate_lock(inode->i_mapping);
> > -	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> > -				 XFS_FSB_TO_B(mp, end_fsb) - 1);
> > -
> > -	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> > -				       end_fsb - start_fsb);
> > +	truncate_pagecache_range(inode, start_byte, end_byte - 1);
> 
> ...because the expression "end_byte - 1" looks a little funny when it's
> used to compute the "lend" argument to truncate_pagecache_range.

Yup, truncate_pagecache_range() uses a closed interval, [lstart, lend],
to define the range, so we need a "- 1" when passing our half-open
interval into a closed-interval API.
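
A sketch of keeping that "- 1" confined to the single boundary where a
half-open range meets a closed-interval API. closed_range_len() is a
hypothetical stand-in for any function that takes an inclusive
[lstart, lend] range, as truncate_pagecache_range() does; it is not the
kernel API itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical closed-interval API: lend is the last byte included. */
int64_t closed_range_len(int64_t lstart, int64_t lend)
{
	return lend - lstart + 1;
}

/* Caller side: [start_byte, end_byte) becomes [start_byte, end_byte - 1]. */
int64_t halfopen_range_len(int64_t start_byte, int64_t end_byte)
{
	return closed_range_len(start_byte, end_byte - 1);
}
```

With [0, 1024) the closed API sees lend = 1023 and both forms describe
the same 1024 bytes.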

But that truncate_pagecache_range() call is going away in the next
patch, so this whole issue is moot, yes?

> > +	error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte);
> >  	filemap_invalidate_unlock(inode->i_mapping);
> >  	if (error && !xfs_is_shutdown(mp)) {
> > -		xfs_alert(mp, "%s: unable to clean up ino %lld",
> > -			__func__, ip->i_ino);
> > +		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
> > +			__func__, XFS_I(inode)->i_ino);
> 
> Oh, you did fix the ino 0x%llx format thing.  Previous comment
> withdrawn.
> 
> With s/end_byte/next_byte/ and the delalloc punch thing sorted out,

I don't know what you want me to do here, because I don't think this
code is wrong and changing it to closed intervals and next/stop as
variable names makes little sense in the context of the code....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


