On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > From: Jian Wen <wenjianhn@xxxxxxxxx>
> > >
> > > Currently, xfs_trans_dqresv() returns -ENOSPC when the project quota
> > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > the whole filesystem instead of the project quota.
> > >
> > > Fix the issue by making xfs_trans_dqresv() return -EDQUOT rather than
> > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > for both EDQUOT and ENOSPC consistent.
> > >
> > > Changes since v3:
> > > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > - acquire the dquot lock before checking the free space
> > >
> > > Changes since v2:
> > > - completely rewrote based on the suggestions from Dave
> > >
> > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > > Signed-off-by: Jian Wen <wenjian1@xxxxxxxxxx>
> >
> > Please send new patch versions as a new thread, not as a reply to
> > a random email in the middle of the review thread for a previous
> > version.
> >
> > > ---
> > >  fs/xfs/xfs_dquot.h       | 22 +++++++++++++++---
> > >  fs/xfs/xfs_file.c        | 41 ++++++++++++--------------------
> > >  fs/xfs/xfs_icache.c      | 50 +++++++++++++++++++++++++++++-----------
> > >  fs/xfs/xfs_icache.h      |  7 +++---
> > >  fs/xfs/xfs_inode.c       | 19 ++++++++-------
> > >  fs/xfs/xfs_reflink.c     |  5 ++++
> > >  fs/xfs/xfs_trans.c       | 41 ++++++++++++++++++++++++--------
> > >  fs/xfs/xfs_trans_dquot.c |  3 ---
> > >  8 files changed, 121 insertions(+), 67 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > --- a/fs/xfs/xfs_dquot.h
> > > +++ b/fs/xfs/xfs_dquot.h
> > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > >  	return false;
> > >  }
> > >
> > > +static inline bool
> > > +xfs_dquot_hardlimit_exceeded(
> > > +	struct xfs_dquot	*dqp)
> > > +{
> > > +	int64_t			freesp;
> > > +
> > > +	if (!dqp)
> > > +		return false;
> > > +	if (!xfs_dquot_is_enforced(dqp))
> > > +		return false;
> > > +	xfs_dqlock(dqp);
> > > +	freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > +	xfs_dqunlock(dqp);
> > > +	return freesp < 0;
> > > +}
> >
> > Ok, what about if the project quota EDQUOT has come about because we
> > are over the inode count limit or the realtime block limit? Both of
> > those need to be converted to ENOSPC, too.
> >
> > i.e. all the inode creation operations need to be checked against
> > both the data device block space and the inode count space, whilst
> > data writes need to be checked against data space for normal IO
> > and both data space and real time space for inodes that are writing
> > to real time devices.
>
> (Yeah.)
>
> > Also, why do we care about locking here? If something is modifying
> > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > to protect this code from races. All it means is that we'll block
> > waiting for the transaction that holds the dquot locked to complete
> > and we'll either get the same random failure or success as if we
> > didn't hold the lock during this calculation...
>
> I thought we had to hold the dquot lock before accessing its fields.

Only if we care about avoiding races with ongoing modifications or
we want to serialise against new references (e.g. because we are
about to reclaim the dquot).
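To put that in concrete terms, a lockless check that also covers the
inode count and realtime block limits might look something like the
sketch below. Completely untested, and the helper name is made up; it
just illustrates sampling all three resource counters without taking
the dqlock:

static inline bool
xfs_dquot_any_hardlimit_exceeded(
	struct xfs_dquot	*dqp)
{
	if (!dqp)
		return false;
	if (!xfs_dquot_is_enforced(dqp))
		return false;

	/* No dqlock - this is an inherently racy sample either way. */
	if (dqp->q_blk.hardlimit &&
	    dqp->q_blk.reserved > dqp->q_blk.hardlimit)
		return true;
	if (dqp->q_ino.hardlimit &&
	    dqp->q_ino.reserved > dqp->q_ino.hardlimit)
		return true;
	if (dqp->q_rtb.hardlimit &&
	    dqp->q_rtb.reserved > dqp->q_rtb.hardlimit)
		return true;
	return false;
}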
The inode holds a reference to the dquot at this point (because of
xfs_qm_dqattach()), so we really don't need to hold a lock just to
sample the contents of the attached dquot.

> Or are you really saying that it's silly to take the dquot lock *again*
> having already decided (under dqlock elsewhere) that we were over a
> quota?

No, I'm saying that we really don't have to hold the dqlock to
determine if the dquot is over its quota limits. It's either going
to be over or under, and holding the dqlock while sampling it really
doesn't change the fact that the dquot accounting can change between
the initial check under the dqlock and a subsequent check on the
second failure under a different hold of the dqlock.

It's an inherently racy check, and holding the dqlock does nothing
to make it less racy or more accurate.

> In that case, perhaps it makes more sense to have
> xfs_trans_dqresv return an unusual errno for "project quota over limits"
> so that callers can trap that magic value and translate it into ENOSPC?

Sure, that's another option, but it means we have to trap EDQUOT,
ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
errors. I'm not sure it will improve the code a great deal, but
if there's a clean way to implement such error handling it may
make more sense. Have you prototyped how such error handling would
look in these cases?

Which also makes me wonder if we should actually be returning
which quota limit failed, not just EDQUOT. To take the correct flush
action, we really need to know if we failed on data blocks, inode
count or rt extents. e.g. flushing data won't help alleviate an
inode count overrun... (A rough sketch of what I mean is below the
sig.)

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx
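Purely to illustrate that last point - an utterly untested sketch in
which every name is made up - per-limit failure reporting and the
caller-side flush decision could look something like:

/* Which quota resource tripped the reservation failure. */
enum xfs_quota_fail {
	XFS_QUOTA_FAIL_NONE = 0,
	XFS_QUOTA_FAIL_BLKS,	/* data device blocks */
	XFS_QUOTA_FAIL_INOS,	/* inode count */
	XFS_QUOTA_FAIL_RTBLKS,	/* realtime extents */
};

/*
 * Hypothetical caller-side policy: decide whether a blockgc scan is
 * worth retrying after, based on which resource actually ran out.
 */
static inline bool
xfs_quota_fail_want_blockgc(
	enum xfs_quota_fail	why)
{
	switch (why) {
	case XFS_QUOTA_FAIL_BLKS:
	case XFS_QUOTA_FAIL_RTBLKS:
		/* freeing speculative preallocations might get blocks back */
		return true;
	case XFS_QUOTA_FAIL_INOS:
	default:
		/* flushing data blocks won't free up inodes */
		return false;
	}
}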