On Tue, Feb 02, 2021 at 09:47:26AM -0800, Darrick J. Wong wrote: > On Tue, Feb 02, 2021 at 08:13:15AM -0500, Brian Foster wrote: > > On Mon, Feb 01, 2021 at 06:03:23PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > While refactoring the quota code to create a function to allocate inode > > > change transactions, I noticed that xfs_qm_vop_chown_reserve does more > > > than just make reservations: it also *modifies* the incore counts > > > directly to handle the owner id change for the delalloc blocks. > > > > > > I then observed that the fssetxattr code continues validating input > > > arguments after making the quota reservation but before dirtying the > > > transaction. If the routine decides to error out, it fails to undo the > > > accounting switch! This leads to incorrect quota reservation and > > > failure down the line. > > > > > > We can fix this by making the reservation function do only that -- for > > > the new dquot, it reserves ondisk and delalloc blocks to the > > > transaction, and the old dquot hangs on to its incore reservation for > > > now. Once we actually switch the dquots, we can then update the incore > > > reservations because we've dirtied the transaction and it's too late to > > > turn back now. > > > > > > No fixes tag because this has been broken since the start of git. > > > > > > > Do we have any test coverage for this problem? > > Working on it, yeah. The tricky part is finding a combination of > FSSETXATTR values that will reliably trigger an error return. I think > it can be done by allocating delalloc blocks and then trying to set an > extent size hint and change the project id in a single call. > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > > --- > > > fs/xfs/xfs_qm.c | 91 +++++++++++++++++++++---------------------------------- > > > 1 file changed, 34 insertions(+), 57 deletions(-) > > > > > > > > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c > > > index c134eb4aeaa8..322d337b5dca 100644 > > > --- a/fs/xfs/xfs_qm.c > > > +++ b/fs/xfs/xfs_qm.c > > > @@ -1785,6 +1785,28 @@ xfs_qm_vop_chown( > > > xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks); > > > xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1); > > > > > > + /* > > > + * Back when we made quota reservations for the chown, we reserved the > > > + * ondisk blocks + delalloc blocks with the new dquot. Now that we've > > > + * switched the dquots, decrease the new dquot's block reservation > > > + * (having already bumped up the real counter) so that we don't have > > > + * any reservation to give back when we commit. > > > + */ > > > + xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS, > > > + -ip->i_delayed_blks); > > > + > > > > Ok, so the reservation code code below continues to reserve di_blocks > > and delalloc blocks as it did before, but associates it with the > > transaction and no longer reduces delalloc reservation from the old > > dquots. Instead, this function increases the consumption of reserved > > blocks for di_blocks and then removes the additional quota reservation > > from the transaction, but not the new dquots. Thus when the transaction > > commits, this has the side effect of leaving additional reservation on > > the new dquots that correspond to the delalloc blocks on the inode. Am I > > following that correctly? > > Correct. > Thanks. > > > + /* > > > + * Give the incore reservation for delalloc blocks back to the old > > > + * dquot. We don't normally handle delalloc quota reservations > > > + * transactionally, so just lock the dquot and subtract from the > > > + * reservation. We've dirtied the transaction, so it's too late to > > > + * turn back now. > > > + */ > > > + tp->t_flags |= XFS_TRANS_DIRTY; > > > + xfs_dqlock(prevdq); > > > + prevdq->q_blk.reserved -= ip->i_delayed_blks; > > > + xfs_dqunlock(prevdq); > > > + > > > > What's the reason for not using xfs_trans_reserve_quota_bydquots(NULL, > > ...) here like the original code? > > xfs_trans_reserve_quota_bydquots() makes the caller pass in user, group, > and project dquots. It's not difficult to add more code to declare and > route parameters, but that just felt overdone. > Ah, Ok. > Given that this is the only place in the codebase where we want to > change the incore quota reservation on a single dquot, I also didn't > think it was worth making a whole new function. > > FWIW I don't really mind doing it, it just seemed like more work. > Alternately I suppose I could expose xfs_trans_dqresv. > I have no strong preference. It seems fine to me as it is, the reason for the change just wasn't clear to me at first glance: Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > --D > > > Brian > > > > > /* > > > * Take an extra reference, because the inode is going to keep > > > * this dquot pointer even after the trans_commit. > > > @@ -1807,84 +1829,39 @@ xfs_qm_vop_chown_reserve( > > > uint flags) > > > { > > > struct xfs_mount *mp = ip->i_mount; > > > - uint64_t delblks; > > > unsigned int blkflags; > > > - struct xfs_dquot *udq_unres = NULL; > > > - struct xfs_dquot *gdq_unres = NULL; > > > - struct xfs_dquot *pdq_unres = NULL; > > > struct xfs_dquot *udq_delblks = NULL; > > > struct xfs_dquot *gdq_delblks = NULL; > > > struct xfs_dquot *pdq_delblks = NULL; > > > - int error; > > > - > > > > > > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)); > > > ASSERT(XFS_IS_QUOTA_RUNNING(mp)); > > > > > > - delblks = ip->i_delayed_blks; > > > blkflags = XFS_IS_REALTIME_INODE(ip) ? > > > XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS; > > > > > > if (XFS_IS_UQUOTA_ON(mp) && udqp && > > > - i_uid_read(VFS_I(ip)) != udqp->q_id) { > > > + i_uid_read(VFS_I(ip)) != udqp->q_id) > > > udq_delblks = udqp; > > > - /* > > > - * If there are delayed allocation blocks, then we have to > > > - * unreserve those from the old dquot, and add them to the > > > - * new dquot. > > > - */ > > > - if (delblks) { > > > - ASSERT(ip->i_udquot); > > > - udq_unres = ip->i_udquot; > > > - } > > > - } > > > + > > > if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp && > > > - i_gid_read(VFS_I(ip)) != gdqp->q_id) { > > > + i_gid_read(VFS_I(ip)) != gdqp->q_id) > > > gdq_delblks = gdqp; > > > - if (delblks) { > > > - ASSERT(ip->i_gdquot); > > > - gdq_unres = ip->i_gdquot; > > > - } > > > - } > > > > > > if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp && > > > - ip->i_d.di_projid != pdqp->q_id) { > > > + ip->i_d.di_projid != pdqp->q_id) > > > pdq_delblks = pdqp; > > > - if (delblks) { > > > - ASSERT(ip->i_pdquot); > > > - pdq_unres = ip->i_pdquot; > > > - } > > > - } > > > - > > > - error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, > > > - udq_delblks, gdq_delblks, pdq_delblks, > > > - ip->i_d.di_nblocks, 1, flags | blkflags); > > > - if (error) > > > - return error; > > > > > > /* > > > - * Do the delayed blks reservations/unreservations now. Since, these > > > - * are done without the help of a transaction, if a reservation fails > > > - * its previous reservations won't be automatically undone by trans > > > - * code. So, we have to do it manually here. > > > + * Reserve enough quota to handle blocks on disk and reserved for a > > > + * delayed allocation. We'll actually transfer the delalloc > > > + * reservation between dquots at chown time, even though that part is > > > + * only semi-transactional. > > > */ > > > - if (delblks) { > > > - /* > > > - * Do the reservations first. Unreservation can't fail. > > > - */ > > > - ASSERT(udq_delblks || gdq_delblks || pdq_delblks); > > > - ASSERT(udq_unres || gdq_unres || pdq_unres); > > > - error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount, > > > - udq_delblks, gdq_delblks, pdq_delblks, > > > - (xfs_qcnt_t)delblks, 0, flags | blkflags); > > > - if (error) > > > - return error; > > > - xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount, > > > - udq_unres, gdq_unres, pdq_unres, > > > - -((xfs_qcnt_t)delblks), 0, blkflags); > > > - } > > > - > > > - return 0; > > > + return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks, > > > + gdq_delblks, pdq_delblks, > > > + ip->i_d.di_nblocks + ip->i_delayed_blks, > > > + 1, blkflags | flags); > > > } > > > > > > int > > > > > >