Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC

On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote:
> On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > > From: Jian Wen <wenjianhn@xxxxxxxxx>
> > > > 
> > > > Currently, xfs_trans_dqresv() returns -ENOSPC when the project quota
> > > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > > the whole filesystem instead of the project quota.
> > > > 
> > > > Fix the issue by making xfs_trans_dqresv() return -EDQUOT rather than
> > > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > > for both EDQUOT and ENOSPC consistent.
> > > > 
> > > > Changes since v3:
> > > >   - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > >   - acquire the dquot lock before checking the free space
> > > > 
> > > > Changes since v2:
> > > >   - completely rewrote based on the suggestions from Dave
> > > > 
> > > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > > > Signed-off-by: Jian Wen <wenjian1@xxxxxxxxxx>
> > > 
> > > Please send new patch versions as a new thread, not as a reply to
> > > a random email in the middle of the review thread for a previous
> > > version.
> > > 
> > > > ---
> > > >  fs/xfs/xfs_dquot.h       | 22 +++++++++++++++---
> > > >  fs/xfs/xfs_file.c        | 41 ++++++++++++--------------------
> > > >  fs/xfs/xfs_icache.c      | 50 +++++++++++++++++++++++++++++-----------
> > > >  fs/xfs/xfs_icache.h      |  7 +++---
> > > >  fs/xfs/xfs_inode.c       | 19 ++++++++-------
> > > >  fs/xfs/xfs_reflink.c     |  5 ++++
> > > >  fs/xfs/xfs_trans.c       | 41 ++++++++++++++++++++++++--------
> > > >  fs/xfs/xfs_trans_dquot.c |  3 ---
> > > >  8 files changed, 121 insertions(+), 67 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > > --- a/fs/xfs/xfs_dquot.h
> > > > +++ b/fs/xfs/xfs_dquot.h
> > > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > > >  	return false;
> > > >  }
> > > >  
> > > > +static inline bool
> > > > +xfs_dquot_hardlimit_exceeded(
> > > > +	struct xfs_dquot	*dqp)
> > > > +{
> > > > +	int64_t freesp;
> > > > +
> > > > +	if (!dqp)
> > > > +		return false;
> > > > +	if (!xfs_dquot_is_enforced(dqp))
> > > > +		return false;
> > > > +	xfs_dqlock(dqp);
> > > > +	freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > > +	xfs_dqunlock(dqp);
> > > > +	return freesp < 0;
> > > > +}
> > > 
> > > Ok, what about if the project quota EDQUOT has come about because we
> > > are over the inode count limit or the realtime block limit? Both of
> > > those need to be converted to ENOSPC, too.
> > > 
> > > i.e. all the inode creation operations need to be checked against
> > > both the data device block space and the inode count space, whilst
> > > data writes need to be checked against data space for normal IO
> > > and both data space and real time space for inodes that are writing
> > > to real time devices.
> > 
> > (Yeah.)
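
Illustration only, not from the patch: extending the quoted helper to
cover all three resource counters might look something like the sketch
below.  It assumes the q_ino/q_rtb counters mirror q_blk and that a
hardlimit of zero means "no limit"; xfs_dquot_res_over_hardlimit() is a
made-up name.

static inline bool
xfs_dquot_res_over_hardlimit(
	const struct xfs_dquot_res	*res)
{
	/* A hardlimit of zero means no limit is set. */
	return res->hardlimit != 0 && res->reserved > res->hardlimit;
}

static inline bool
xfs_dquot_hardlimit_exceeded(
	struct xfs_dquot	*dqp)
{
	bool			exceeded;

	if (!dqp)
		return false;
	if (!xfs_dquot_is_enforced(dqp))
		return false;

	xfs_dqlock(dqp);
	exceeded = xfs_dquot_res_over_hardlimit(&dqp->q_blk) ||
		   xfs_dquot_res_over_hardlimit(&dqp->q_ino) ||
		   xfs_dquot_res_over_hardlimit(&dqp->q_rtb);
	xfs_dqunlock(dqp);

	return exceeded;
}

The dqlock is kept here only to match the style of the patch; see the
locking discussion below.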
> > 
> > > Also, why do we care about locking here? If something is modifying
> > > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > > to protect this code from races. All it means is that we'll block
> > > waiting for the transaction that holds the dquot locked to complete
> > > and we'll either get the same random failure or success as if we
> > > didn't hold the lock during this calculation...
> > 
> > I thought we had to hold the dquot lock before accessing its fields.
> 
> Only if we care about avoiding races with ongoing modifications or
> we want to serialise against new references (e.g. because we are
> about to reclaim the dquot).
> 
> The inode holds a reference to the dquot at this point (because of
> xfs_qm_dqattach()), so we really don't need to hold a lock just
> to sample the contents of the attached dquot.
> 
> > Or are you really saying that it's silly to take the dquot lock *again*
> > having already decided (under dqlock elsewhere) that we were over a
> > quota?
> 
> No, I'm saying that we really don't have to hold the dqlock to
> determine if the dquot is over quota limits. It's either going to
> be over or under, and holding the dqlock while sampling it really
> doesn't change the fact that the dquot accounting can change
> between the initial check under the dqlock and a subsequent check
> on the second failure under a different hold of the dqlock.
> 
> It's an inherently racy check, and holding the dqlock does nothing
> to make it less racy or more accurate.
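
To illustrate (again just a sketch, reusing the made-up helper from the
earlier fragment): with the dqlock dropped, the check reduces to an
unlocked sample of the counters.

static inline bool
xfs_dquot_hardlimit_exceeded(
	struct xfs_dquot	*dqp)
{
	if (!dqp || !xfs_dquot_is_enforced(dqp))
		return false;

	/*
	 * Unlocked sample: the caller's inode holds a reference on the
	 * dquot (via xfs_qm_dqattach()), so the dquot can't be reclaimed
	 * out from under us.  The counters can still change concurrently,
	 * but the check is inherently racy with or without the dqlock.
	 */
	return xfs_dquot_res_over_hardlimit(&dqp->q_blk) ||
	       xfs_dquot_res_over_hardlimit(&dqp->q_ino) ||
	       xfs_dquot_res_over_hardlimit(&dqp->q_rtb);
}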
> 
> > In that case, perhaps it makes more sense to have
> > xfs_trans_dqresv return an unusual errno for "project quota over limits"
> > so that callers can trap that magic value and translate it into ENOSPC?
> 
> Sure, that's another option, but it means we have to trap EDQUOT,
> ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
> errors. I'm not sure it will improve the code a great deal, but if
> there's a clean way to implement such error handling it may make
> more sense. Have you prototyped how such error handling would look
> in these cases?
> 
> Which also makes me wonder if we should actually be returning what
> quota limit failed, not EDQUOT. To take the correct flush action, we
> really need to know if we failed on data blocks, inode count or rt
> extents. e.g. flushing data won't help alleviate an inode count
> overrun...

Yeah.  If it's an rtbcount, then it only makes sense to flush realtime
files; if it's a bcount, we flush nonrt files, and if it's an inode
count then I guess we push inodegc?
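
Roughly like this (hypothetical sketch only: the enum, its values and
xfs_quota_nospace_flush() are made-up names, while xfs_blockgc_flush_all()
and xfs_inodegc_push() are existing helpers I'd expect the real thing to
build on):

/* Hypothetical: which quota resource tripped the hard limit? */
enum xfs_quota_res {
	XFS_QUOTA_RES_BCOUNT,	/* data device blocks */
	XFS_QUOTA_RES_RTBCOUNT,	/* realtime blocks */
	XFS_QUOTA_RES_ICOUNT,	/* inodes */
};

/* Hypothetical retry helper: flush whatever can free the failed resource. */
static void
xfs_quota_nospace_flush(
	struct xfs_mount	*mp,
	enum xfs_quota_res	res)
{
	switch (res) {
	case XFS_QUOTA_RES_BCOUNT:
		/* free post-EOF and COW blocks; ideally only non-rt files */
		xfs_blockgc_flush_all(mp);
		break;
	case XFS_QUOTA_RES_RTBCOUNT:
		/*
		 * Only realtime files matter here; an rt-only blockgc
		 * walk would be the ideal tool.
		 */
		xfs_blockgc_flush_all(mp);
		break;
	case XFS_QUOTA_RES_ICOUNT:
		/* freeing blocks won't help; push inodegc instead */
		xfs_inodegc_push(mp);
		break;
	}
}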

--D

> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 



