Re: [PATCH 01/16] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 02, 2021 at 09:47:26AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 02, 2021 at 08:13:15AM -0500, Brian Foster wrote:
> > On Mon, Feb 01, 2021 at 06:03:23PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > While refactoring the quota code to create a function to allocate inode
> > > change transactions, I noticed that xfs_qm_vop_chown_reserve does more
> > > than just make reservations: it also *modifies* the incore counts
> > > directly to handle the owner id change for the delalloc blocks.
> > > 
> > > I then observed that the fssetxattr code continues validating input
> > > arguments after making the quota reservation but before dirtying the
> > > transaction.  If the routine decides to error out, it fails to undo the
> > > accounting switch!  This leads to incorrect quota reservation and
> > > failure down the line.
> > > 
> > > We can fix this by making the reservation function do only that -- for
> > > the new dquot, it reserves ondisk and delalloc blocks to the
> > > transaction, and the old dquot hangs on to its incore reservation for
> > > now.  Once we actually switch the dquots, we can then update the incore
> > > reservations because we've dirtied the transaction and it's too late to
> > > turn back now.
> > > 
> > > No fixes tag because this has been broken since the start of git.
> > > 
> > 
> > Do we have any test coverage for this problem?
> 
> Working on it, yeah.  The tricky part is finding a combination of
> FSSETXATTR values that will reliably trigger an error return.  I think
> it can be done by allocating delalloc blocks and then trying to set an
> extent size hint and change the project id in a single call.
> 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > > ---
> > >  fs/xfs/xfs_qm.c |   91 +++++++++++++++++++++----------------------------------
> > >  1 file changed, 34 insertions(+), 57 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > index c134eb4aeaa8..322d337b5dca 100644
> > > --- a/fs/xfs/xfs_qm.c
> > > +++ b/fs/xfs/xfs_qm.c
> > > @@ -1785,6 +1785,28 @@ xfs_qm_vop_chown(
> > >  	xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks);
> > >  	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1);
> > >  
> > > +	/*
> > > +	 * Back when we made quota reservations for the chown, we reserved the
> > > +	 * ondisk blocks + delalloc blocks with the new dquot.  Now that we've
> > > +	 * switched the dquots, decrease the new dquot's block reservation
> > > +	 * (having already bumped up the real counter) so that we don't have
> > > +	 * any reservation to give back when we commit.
> > > +	 */
> > > +	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS,
> > > +			-ip->i_delayed_blks);
> > > +
> > 
> > Ok, so the reservation code code below continues to reserve di_blocks
> > and delalloc blocks as it did before, but associates it with the
> > transaction and no longer reduces delalloc reservation from the old
> > dquots. Instead, this function increases the consumption of reserved
> > blocks for di_blocks and then removes the additional quota reservation
> > from the transaction, but not the new dquots. Thus when the transaction
> > commits, this has the side effect of leaving additional reservation on
> > the new dquots that correspond to the delalloc blocks on the inode. Am I
> > following that correctly?
> 
> Correct.
> 

Thanks.

> > > +	/*
> > > +	 * Give the incore reservation for delalloc blocks back to the old
> > > +	 * dquot.  We don't normally handle delalloc quota reservations
> > > +	 * transactionally, so just lock the dquot and subtract from the
> > > +	 * reservation.  We've dirtied the transaction, so it's too late to
> > > +	 * turn back now.
> > > +	 */
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	xfs_dqlock(prevdq);
> > > +	prevdq->q_blk.reserved -= ip->i_delayed_blks;
> > > +	xfs_dqunlock(prevdq);
> > > +
> > 
> > What's the reason for not using xfs_trans_reserve_quota_bydquots(NULL,
> > ...) here like the original code?
> 
> xfs_trans_reserve_quota_bydquots() makes the caller pass in user, group,
> and project dquots.  It's not difficult to add more code to declare and
> route parameters, but that just felt overdone.
> 

Ah, Ok.

> Given that this is the only place in the codebase where we want to
> change the incore quota reservation on a single dquot, I also didn't
> think it was worth making a whole new function.
> 
> FWIW I don't really mind doing it, it just seemed like more work.
> Alternately I suppose I could expose xfs_trans_dqresv.
> 

I have no strong preference. It seems fine to me as it is, the reason
for the change just wasn't clear to me at first glance:

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

> --D
> 
> > Brian
> > 
> > >  	/*
> > >  	 * Take an extra reference, because the inode is going to keep
> > >  	 * this dquot pointer even after the trans_commit.
> > > @@ -1807,84 +1829,39 @@ xfs_qm_vop_chown_reserve(
> > >  	uint			flags)
> > >  {
> > >  	struct xfs_mount	*mp = ip->i_mount;
> > > -	uint64_t		delblks;
> > >  	unsigned int		blkflags;
> > > -	struct xfs_dquot	*udq_unres = NULL;
> > > -	struct xfs_dquot	*gdq_unres = NULL;
> > > -	struct xfs_dquot	*pdq_unres = NULL;
> > >  	struct xfs_dquot	*udq_delblks = NULL;
> > >  	struct xfs_dquot	*gdq_delblks = NULL;
> > >  	struct xfs_dquot	*pdq_delblks = NULL;
> > > -	int			error;
> > > -
> > >  
> > >  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> > >  	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
> > >  
> > > -	delblks = ip->i_delayed_blks;
> > >  	blkflags = XFS_IS_REALTIME_INODE(ip) ?
> > >  			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
> > >  
> > >  	if (XFS_IS_UQUOTA_ON(mp) && udqp &&
> > > -	    i_uid_read(VFS_I(ip)) != udqp->q_id) {
> > > +	    i_uid_read(VFS_I(ip)) != udqp->q_id)
> > >  		udq_delblks = udqp;
> > > -		/*
> > > -		 * If there are delayed allocation blocks, then we have to
> > > -		 * unreserve those from the old dquot, and add them to the
> > > -		 * new dquot.
> > > -		 */
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_udquot);
> > > -			udq_unres = ip->i_udquot;
> > > -		}
> > > -	}
> > > +
> > >  	if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp &&
> > > -	    i_gid_read(VFS_I(ip)) != gdqp->q_id) {
> > > +	    i_gid_read(VFS_I(ip)) != gdqp->q_id)
> > >  		gdq_delblks = gdqp;
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_gdquot);
> > > -			gdq_unres = ip->i_gdquot;
> > > -		}
> > > -	}
> > >  
> > >  	if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp &&
> > > -	    ip->i_d.di_projid != pdqp->q_id) {
> > > +	    ip->i_d.di_projid != pdqp->q_id)
> > >  		pdq_delblks = pdqp;
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_pdquot);
> > > -			pdq_unres = ip->i_pdquot;
> > > -		}
> > > -	}
> > > -
> > > -	error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
> > > -				udq_delblks, gdq_delblks, pdq_delblks,
> > > -				ip->i_d.di_nblocks, 1, flags | blkflags);
> > > -	if (error)
> > > -		return error;
> > >  
> > >  	/*
> > > -	 * Do the delayed blks reservations/unreservations now. Since, these
> > > -	 * are done without the help of a transaction, if a reservation fails
> > > -	 * its previous reservations won't be automatically undone by trans
> > > -	 * code. So, we have to do it manually here.
> > > +	 * Reserve enough quota to handle blocks on disk and reserved for a
> > > +	 * delayed allocation.  We'll actually transfer the delalloc
> > > +	 * reservation between dquots at chown time, even though that part is
> > > +	 * only semi-transactional.
> > >  	 */
> > > -	if (delblks) {
> > > -		/*
> > > -		 * Do the reservations first. Unreservation can't fail.
> > > -		 */
> > > -		ASSERT(udq_delblks || gdq_delblks || pdq_delblks);
> > > -		ASSERT(udq_unres || gdq_unres || pdq_unres);
> > > -		error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > -			    udq_delblks, gdq_delblks, pdq_delblks,
> > > -			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
> > > -		if (error)
> > > -			return error;
> > > -		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > -				udq_unres, gdq_unres, pdq_unres,
> > > -				-((xfs_qcnt_t)delblks), 0, blkflags);
> > > -	}
> > > -
> > > -	return 0;
> > > +	return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks,
> > > +			gdq_delblks, pdq_delblks,
> > > +			ip->i_d.di_nblocks + ip->i_delayed_blks,
> > > +			1, blkflags | flags);
> > >  }
> > >  
> > >  int
> > > 
> > 
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux