Re: [PATCH 05/11] xfs: track CoW blocks separately in the inode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 26, 2018 at 08:04:29AM -0500, Brian Foster wrote:
> On Thu, Jan 25, 2018 at 11:21:42AM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 25, 2018 at 08:06:45AM -0500, Brian Foster wrote:
> > > On Tue, Jan 23, 2018 at 06:18:29PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > 
> > > > Track the number of blocks reserved in the CoW fork so that we can
> > > > move the quota reservations whenever we chown, and don't account for
> > > > CoW fork delalloc reservations in i_delayed_blks.  This should make
> > > > chown work properly for quota reservations, enables us to fully
> > > > account for real extents in the cow fork in the file stat info, and
> > > > improves the post-eof scanning decisions because we're no longer
> > > > confusing data fork delalloc extents with cow fork delalloc extents.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_bmap.c      |   16 ++++++++++++----
> > > >  fs/xfs/libxfs/xfs_inode_buf.c |    1 +
> > > >  fs/xfs/xfs_bmap_util.c        |    5 +++++
> > > >  fs/xfs/xfs_icache.c           |    3 ++-
> > > >  fs/xfs/xfs_inode.c            |   11 +++++------
> > > >  fs/xfs/xfs_inode.h            |    1 +
> > > >  fs/xfs/xfs_iops.c             |    3 ++-
> > > >  fs/xfs/xfs_itable.c           |    3 ++-
> > > >  fs/xfs/xfs_qm.c               |    2 +-
> > > >  fs/xfs/xfs_reflink.c          |    4 ++--
> > > >  fs/xfs/xfs_super.c            |    1 +
> > > >  11 files changed, 34 insertions(+), 16 deletions(-)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index 4a38cfc..a208825 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > ...
> > > > @@ -1669,7 +1667,7 @@ xfs_release(
> > > >  		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
> > > >  		if (truncated) {
> > > >  			xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> > > > -			if (ip->i_delayed_blks > 0) {
> > > > +			if (ip->i_delayed_blks > 0 || ip->i_cow_blocks > 0) {
> > > >  				error = filemap_flush(VFS_I(ip)->i_mapping);
> > > >  				if (error)
> > > >  					return error;
> > > 
> > > Is having cowblocks really relevant to this hunk? I thought this was
> > > purely a delalloc vs. file size thing, but I could be wrong. 
> > 
> > AFAICT, if we (1) use truncate to reduce a file's size, (2) write
> > somewhere past eof, (3) make some delalloc reservations for the post-eof
> > write, and (4) close the file, then this chunk flushes the dirty data to
> > disk so that if we crash after the close() call returns, the file will
> > still have all the data that was written out.  IOWs, this provides for
> > flush-on-close after a file size reduction.
> > 
> 
> I think it goes back to problems where those subsequent buffered writes
> increase the file size again and the fs crashes before all data is
> written out. E.g., the problem described by commit ba87ea699e ("[XFS]
> Fix to prevent the notorious 'NULL files' problem after a crash."). It's
> not totally clear to me whether that fixed the problem and this
> particular hack is still needed.

Me neither.  It looks like deferring the size update until the write
end_io would have closed this bug... but on the other hand maybe its
function is more to avoid disappointing the people who expect flush on
close behavior...

> FWIW, the flush code looks like it goes back to commit 7d4fb40ad7
> ("[XFS] Start writeout earlier (on last close) ...").
> 
> > So I was thinking that if a write to a lower offset causes the creation
> > of a speculative cow extent of some kind that extends past eof, we'd
> > still want to flush the dirty data to disk on close even if there are no
> > delalloc reservations in the data fork.
> > 
> 
> This whole stanza still depends on a truncate in the first place
> though..?
> 
> I guess I'm not necessarily against doing this, I just think we should
> verify whether it's actually useful to prevent some kind of similar
> crash-recovery problem it was intended to help mitigate. If not, then
> we're subjecting ourselves to the tradeoff, which appears to be that
> we'll initiate writeback of any file with cowblocks on close that has
> been truncated.
> 
> Granted the truncate operation is probably infrequent with respect to
> close() so it's probably not that big of a deal, but in the delalloc

It's probably infrequent wrt cow-and-close, but "echo foo > existingfile"
would trigger this for the regular da case.  I don't really mind
dropping it either, aside from my sense of paranoia. :P

> case a flush is at least generally expected to clear the file of delayed
> allocation. It's my understanding that the same is not necessarily true
> for cowblocks.. cow prealloc means blocks can sit around in the cow fork
> for a while in anticipation of future copy-on-writes, right?

Yes.

--D

> 
> Brian
> 
> > Ofc now I see that xfs_file_iomap_begin_delay will create the data fork
> > da reservation for a non-shared block even if a cow fork extent already
> > exists (the write is promoted to cow), so perhaps this isn't strictly
> > necessary... but adding a data fork da extent when there's already a cow
> > fork extent seems like a (mostly harmless) bug to me.
> > 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > @@ -1909,7 +1907,8 @@ xfs_inactive(
> > > >  
> > > >  	if (S_ISREG(VFS_I(ip)->i_mode) &&
> > > >  	    (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
> > > > -	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
> > > > +	     ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0 ||
> > > > +	     ip->i_cow_blocks > 0))
> > > >  		truncate = 1;
> > > >  
> > > >  	error = xfs_qm_dqattach(ip, 0);
> > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > > index ff56486..6feee8a 100644
> > > > --- a/fs/xfs/xfs_inode.h
> > > > +++ b/fs/xfs/xfs_inode.h
> > > > @@ -62,6 +62,7 @@ typedef struct xfs_inode {
> > > >  	/* Miscellaneous state. */
> > > >  	unsigned long		i_flags;	/* see defined flags below */
> > > >  	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
> > > > +	unsigned int		i_cow_blocks;	/* count of cow fork blocks */
> > > >  
> > > >  	struct xfs_icdinode	i_d;		/* most of ondisk inode */
> > > >  
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index 56475fc..6c3381c 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -513,7 +513,8 @@ xfs_vn_getattr(
> > > >  	stat->mtime = inode->i_mtime;
> > > >  	stat->ctime = inode->i_ctime;
> > > >  	stat->blocks =
> > > > -		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
> > > > +		XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks +
> > > > +				  ip->i_cow_blocks);
> > > >  
> > > >  	if (ip->i_d.di_version == 3) {
> > > >  		if (request_mask & STATX_BTIME) {
> > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > index d583105..412d7eb 100644
> > > > --- a/fs/xfs/xfs_itable.c
> > > > +++ b/fs/xfs/xfs_itable.c
> > > > @@ -122,7 +122,8 @@ xfs_bulkstat_one_int(
> > > >  	case XFS_DINODE_FMT_BTREE:
> > > >  		buf->bs_rdev = 0;
> > > >  		buf->bs_blksize = mp->m_sb.sb_blocksize;
> > > > -		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks;
> > > > +		buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks +
> > > > +				 ip->i_cow_blocks;
> > > >  		break;
> > > >  	}
> > > >  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > > index 5b848f4..28f12f8 100644
> > > > --- a/fs/xfs/xfs_qm.c
> > > > +++ b/fs/xfs/xfs_qm.c
> > > > @@ -1847,7 +1847,7 @@ xfs_qm_vop_chown_reserve(
> > > >  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> > > >  	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
> > > >  
> > > > -	delblks = ip->i_delayed_blks;
> > > > +	delblks = ip->i_delayed_blks + ip->i_cow_blocks;
> > > >  	blkflags = XFS_IS_REALTIME_INODE(ip) ?
> > > >  			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
> > > >  
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index e367351..f875ea7 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -619,7 +619,7 @@ xfs_reflink_cancel_cow_blocks(
> > > >  	}
> > > >  
> > > >  	/* clear tag if cow fork is emptied */
> > > > -	if (!ifp->if_bytes)
> > > > +	if (ip->i_cow_blocks == 0)
> > > >  		xfs_inode_clear_cowblocks_tag(ip);
> > > >  
> > > >  	return error;
> > > > @@ -704,7 +704,7 @@ xfs_reflink_end_cow(
> > > >  	trace_xfs_reflink_end_cow(ip, offset, count);
> > > >  
> > > >  	/* No COW extents?  That's easy! */
> > > > -	if (ifp->if_bytes == 0)
> > > > +	if (ip->i_cow_blocks == 0)
> > > >  		return 0;
> > > >  
> > > >  	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index f3e0001..9d04cfb 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -989,6 +989,7 @@ xfs_fs_destroy_inode(
> > > >  	xfs_inactive(ip);
> > > >  
> > > >  	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
> > > > +	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_cow_blocks == 0);
> > > >  	XFS_STATS_INC(ip->i_mount, vn_reclaim);
> > > >  
> > > >  	/*
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux