On Wed, Mar 09, 2022 at 11:22:26AM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > XFS does not reserve quota for directory expansion when linking or > unlinking children from a directory. This means that we don't reject > the expansion with EDQUOT when we're at or near a hard limit, which > means that unprivileged userspace can use link()/unlink() to exceed > quota. > > The fix for this is nuanced -- link operations don't always expand the > directory, and we allow a link to proceed with no space reservation if > we don't need to add a block to the directory to handle the addition. > Unlink operations generally do not expand the directory (you'd have to > free a block and then cause a btree split) and we can defer the > directory block freeing if there is no space reservation. > > Moreover, there is a further bug in that we do not trigger the blockgc > workers to try to clear space when we're out of quota. > > To fix both cases, create a new xfs_trans_alloc_dir function that > allocates the transaction, locks and joins the inodes, and reserves > quota for the directory. If there isn't sufficient space or quota, > we'll switch the caller to reservationless mode. This should prevent > quota usage overruns with the least restriction in functionality. > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > fs/xfs/xfs_inode.c | 30 +++++------------- > fs/xfs/xfs_trans.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_trans.h | 3 ++ > 3 files changed, 97 insertions(+), 22 deletions(-) Overall looks good, minor nits below: > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > index 04bf467b1090..a131bbfe74e4 100644 > --- a/fs/xfs/xfs_inode.c > +++ b/fs/xfs/xfs_inode.c > @@ -1217,7 +1217,7 @@ xfs_link( > { > xfs_mount_t *mp = tdp->i_mount; > xfs_trans_t *tp; > - int error; > + int error, space_error; > int resblks; > > trace_xfs_link(tdp, target_name); > @@ -1236,19 +1236,11 @@ xfs_link( > goto std_return; > > resblks = XFS_LINK_SPACE_RES(mp, target_name->len); > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_link, resblks, 0, 0, &tp); > - if (error == -ENOSPC) { > - resblks = 0; > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_link, 0, 0, 0, &tp); > - } > + error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks, > + &tp, &space_error); It's the nospace_error, isn't it? The code reads a lot better when it's called that, too. > if (error) > goto std_return; > > - xfs_lock_two_inodes(sip, XFS_ILOCK_EXCL, tdp, XFS_ILOCK_EXCL); > - > - xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL); > - xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL); > - > error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK, > XFS_IEXT_DIR_MANIP_CNT(mp)); > if (error) > @@ -1267,6 +1259,8 @@ xfs_link( > > if (!resblks) { > error = xfs_dir_canenter(tp, tdp, target_name); > + if (error == -ENOSPC && space_error) > + error = space_error; This would be better in the error_return stack, I think. That way the transformation only has to be done once, and it will be done for all functions that can potentially return ENOSPC. > if (error) > goto error_return; > } > @@ -2755,6 +2749,7 @@ xfs_remove( > xfs_mount_t *mp = dp->i_mount; > xfs_trans_t *tp = NULL; > int is_dir = S_ISDIR(VFS_I(ip)->i_mode); > + int dontcare; > int error = 0; > uint resblks; > > @@ -2781,22 +2776,13 @@ xfs_remove( > * block from the directory. > */ > resblks = XFS_REMOVE_SPACE_RES(mp); > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0, &tp); > - if (error == -ENOSPC) { > - resblks = 0; > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, > - &tp); > - } > + error = xfs_trans_alloc_dir(dp, &M_RES(mp)->tr_remove, ip, &resblks, > + &tp, &dontcare); > if (error) { > ASSERT(error != -ENOSPC); > goto std_return; > } So we just ignore -EDQUOT when it is returned in @dontcare? I'd like a comment to explain why we don't care about EDQUOT here, because the next time I look at this I will have forgotten all about this... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx