Re: [PATCH 2/2] xfs: reserve quota for target dir expansion when renaming files

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 9 Mar 2022 15:36:44 -0800

On Thu, Mar 10, 2022 at 09:05:53AM +1100, Dave Chinner wrote:
> On Wed, Mar 09, 2022 at 11:22:32AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > XFS does not reserve quota for directory expansion when renaming
> > children into a directory.  This means that we don't reject the
> > expansion with EDQUOT when we're at or near a hard limit, which means
> > that unprivileged userspace can use rename() to exceed quota.
> > 
> > Rename operations don't always expand the target directory, and we allow
> > a rename to proceed with no space reservation if we don't need to add a
> > block to the target directory to handle the addition.  Moreover, the
> > unlink operation on the source directory generally does not expand the
> > directory (you'd have to free a block and then cause a btree split) and
> > it's probably of little consequence to leave the corner case that
> > renaming a file out of a directory can increase its size.
> > 
> > As with link and unlink, there is a further bug in that we do not
> > trigger the blockgc workers to try to clear space when we're out of
> > quota.
> > 
> > Because rename is its own special tricky animal, we'll patch xfs_rename
> > directly to reserve quota to the rename transaction.
> 
> Yeah, and this makes it even more tricky - the retry jumps back
> across the RENAME_EXCHANGE callout/exit from xfs_rename. At some
> point we need to clean up the spaghetti that rename has become.

Heh.  I did that as a cleanup to the directory code ahead of the
metadata directory tree patchset.

> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_inode.c |   37 ++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 36 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index a131bbfe74e4..8ff67b7aad53 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3095,7 +3095,8 @@ xfs_rename(
> >  	bool			new_parent = (src_dp != target_dp);
> >  	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> >  	int			spaceres;
> > -	int			error;
> > +	bool			retried = false;
> > +	int			error, space_error;
> >  
> >  	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
> >  
> > @@ -3119,9 +3120,12 @@ xfs_rename(
> >  	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
> >  				inodes, &num_inodes);
> >  
> > +retry:
> > +	space_error = 0;
> >  	spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
> >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, spaceres, 0, 0, &tp);
> >  	if (error == -ENOSPC) {
> > +		space_error = error;
> 
> nospace_error.

Fixed.

> >  		spaceres = 0;
> >  		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, 0, 0, 0,
> >  				&tp);
> > @@ -3175,6 +3179,31 @@ xfs_rename(
> >  					target_dp, target_name, target_ip,
> >  					spaceres);
> >  
> > +	/*
> > +	 * Try to reserve quota to handle an expansion of the target directory.
> > +	 * We'll allow the rename to continue in reservationless mode if we hit
> > +	 * a space usage constraint.  If we trigger reservationless mode, save
> > +	 * the errno if there isn't any free space in the target directory.
> > +	 */
> > +	if (spaceres != 0) {
> > +		error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> > +				0, false);
> > +		if (error == -EDQUOT || error == -ENOSPC) {
> > +			if (!retried) {
> > +				xfs_trans_cancel(tp);
> > +				xfs_blockgc_free_quota(target_dp, 0);
> > +				retried = true;
> > +				goto retry;
> > +			}
> > +
> > +			space_error = error;
> > +			spaceres = 0;
> > +			error = 0;
> > +		}
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	/*
> >  	 * Check for expected errors before we dirty the transaction
> >  	 * so we can return an error without a transaction abort.
> > @@ -3215,6 +3244,8 @@ xfs_rename(
> >  		 */
> >  		if (!spaceres) {
> >  			error = xfs_dir_canenter(tp, target_dp, target_name);
> > +			if (error == -ENOSPC && space_error)
> > +				error = space_error;
> 
> And move this error transformation to out_trans_cancel: so it only
> has to be coded once.
> 
> Other than that, it's about as clean as rename allows it to be right
> now.

Fixed.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx