On Tue, Jan 12, 2021 at 12:40:45PM +1100, Dave Chinner wrote: > On Mon, Jan 11, 2021 at 05:31:26PM -0800, Darrick J. Wong wrote: > > On Tue, Jan 12, 2021 at 12:22:49PM +1100, Dave Chinner wrote: > > > On Mon, Jan 11, 2021 at 03:23:05PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > > > If a fs modification (creation, file write, reflink, etc.) is unable to > > > > reserve enough quota to handle the modification, try clearing whatever > > > > space the filesystem might have been hanging onto in the hopes of > > > > speeding up the filesystem. The flushing behavior will become > > > > particularly important when we add deferred inode inactivation because > > > > that will increase the amount of space that isn't actively tied to user > > > > data. > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > --- > > > > fs/xfs/xfs_bmap_util.c | 16 ++++++++++++++++ > > > > fs/xfs/xfs_file.c | 2 +- > > > > fs/xfs/xfs_icache.c | 9 +++++++-- > > > > fs/xfs/xfs_icache.h | 2 +- > > > > fs/xfs/xfs_inode.c | 17 +++++++++++++++++ > > > > fs/xfs/xfs_ioctl.c | 2 ++ > > > > fs/xfs/xfs_iomap.c | 20 +++++++++++++++++++- > > > > fs/xfs/xfs_reflink.c | 40 +++++++++++++++++++++++++++++++++++++--- > > > > fs/xfs/xfs_trace.c | 1 + > > > > fs/xfs/xfs_trace.h | 40 ++++++++++++++++++++++++++++++++++++++++ > > > > 10 files changed, 141 insertions(+), 8 deletions(-) > > > > > > > > > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > > > index 7371a7f7c652..437fdc8a8fbd 100644 > > > > --- a/fs/xfs/xfs_bmap_util.c > > > > +++ b/fs/xfs/xfs_bmap_util.c > > > > @@ -761,6 +761,7 @@ xfs_alloc_file_space( > > > > */ > > > > while (allocatesize_fsb && !error) { > > > > xfs_fileoff_t s, e; > > > > + bool cleared_space = false; > > > > > > > > /* > > > > * Determine space reservations for data/realtime. > > > > @@ -803,6 +804,7 @@ xfs_alloc_file_space( > > > > /* > > > > * Allocate and setup the transaction. > > > > */ > > > > +retry: > > > > error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, > > > > resrtextents, 0, &tp); > > > > > > > > @@ -819,6 +821,20 @@ xfs_alloc_file_space( > > > > xfs_ilock(ip, XFS_ILOCK_EXCL); > > > > error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, > > > > 0, quota_flag); > > > > + /* > > > > + * We weren't able to reserve enough quota to handle fallocate. > > > > + * Flush any disk space that was being held in the hopes of > > > > + * speeding up the filesystem. We hold the IOLOCK so we cannot > > > > + * do a synchronous scan. > > > > + */ > > > > + if ((error == -ENOSPC || error == -EDQUOT) && !cleared_space) { > > > > + xfs_trans_cancel(tp); > > > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > > > + cleared_space = xfs_inode_free_quota_blocks(ip, false); > > > > + if (cleared_space) > > > > + goto retry; > > > > + return error; > > > > > > Can't say I'm a fan of repeating this everywhere. Can we move this > > > into xfs_trans_reserve_quota_nblks() with a "retry" flag such that > > > we do: > > > > > > error = xfs_trans_reserve_quota_nblks(tp, ip, qblocks, 0, > > > quota_flag, &retry); > > > if (error) { > > > /* tp already cancelled, inode unlocked */ > > > return error; > > > } > > > if (retry) { > > > /* tp already cancelled, inode unlocked */ > > > goto retry; > > > } > > > > Assuming you'd be interested in the same kind of change being applied to > > the next patch (kick the inode reclaim from xfs_trans_reserve on ENOSPC) > > then yes, that seems like a nice cleanup. > > *nod* > > It definitely seems to me to be cleaner to put all this GC stuff > into the transaction setup code that does the actual space > reservation, and then simply have the code that is setting up the > transactions handle failures appropriately. I tried converting this and it wasn't as easy as I thought. The downside is that now we have a function that sometimes consumes the transaction and the ILOCK, and there's still the question of what to do if xfs_inode_free_quota_blocks doesn't find any space to free. By leaving all the ugliness in the call sites, we maintain the property that the function that allocates the transaction and ilocks the inode also gets to iunlock and free the tp, and we can also skip the retry if the eofblocks flush doesn't clear anything. It might be easier to rework this as a xfs_trans_alloc_quota function wherein you feed it a tres, the inode, the number of blocks you want, and whether or not this is an rt extent; and either it reserves and locks everything for you, or returns failure. The downside is that doesn't work for reflink since it doesn't always require quota reservation to remap an extent. But it's pretty late in the day and my brain is scrambled eggs so I'll defer until tomorrow. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx