On Wed, Mar 23, 2022 at 10:24:29PM -0700, Darrick J. Wong wrote: > On Thu, Mar 24, 2022 at 08:11:47AM +1100, Dave Chinner wrote: > > On Sun, Mar 20, 2022 at 09:43:49AM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > Don't spin in an infinite loop trying to reserve blocks -- if we can't > > > do it after 30 tries, we're racing with a nearly full filesystem, so > > > just give up. > > > > > > Cc: Brian Foster <bfoster@xxxxxxxxxx> > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > > > --- > > > fs/xfs/xfs_fsops.c | 12 ++++++++++-- > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > index 710e857bb825..615334e4f689 100644 > > > --- a/fs/xfs/xfs_fsops.c > > > +++ b/fs/xfs/xfs_fsops.c > > > @@ -379,6 +379,7 @@ xfs_reserve_blocks( > > > int64_t fdblks_delta = 0; > > > uint64_t request; > > > int64_t free; > > > + unsigned int tries; > > > int error = 0; > > > > > > /* If inval is null, report current values and return */ > > > @@ -430,9 +431,16 @@ xfs_reserve_blocks( > > > * If the request is larger than the current reservation, reserve the > > > * blocks before we update the reserve counters. Sample m_fdblocks and > > > * perform a partial reservation if the request exceeds free space. > > > + * > > > + * The loop body estimates how many blocks it can request from fdblocks > > > + * to stash in the reserve pool. This is a classic TOCTOU race since > > > + * fdblocks updates are not always coordinated via m_sb_lock. We also > > > + * cannot tell if @free remaining unchanged between iterations is due > > > + * to an idle system or freed blocks being consumed immediately, so > > > + * we'll try a finite number of times to satisfy the request. > > > */ > > > error = -ENOSPC; > > > - do { > > > + for (tries = 0; tries < 30 && error == -ENOSPC; tries++) { > > > free = percpu_counter_sum(&mp->m_fdblocks) - > > > xfs_fdblocks_unavailable(mp); > > > if (free <= 0) > > > @@ -459,7 +467,7 @@ xfs_reserve_blocks( > > > spin_unlock(&mp->m_sb_lock); > > > error = xfs_mod_fdblocks(mp, -fdblks_delta, 0); > > > spin_lock(&mp->m_sb_lock); > > > - } while (error == -ENOSPC); > > > + } > > > > So the problem here is that if we don't get all of the reservation, > > we get none of it, so we try again. This seems like we should simply > > punch the error back out to userspace and let it retry rather than > > try a few times and then fail anyway. Either way, userspace has to > > handle failure to reserve enough blocks. > > > > The other alternative here is that we just change the reserve block > > limit when ENOSPC occurs and allow the xfs_mod_fdblocks() refill > > code just fill up the pool as space is freed. That would greatly > > simplify the code - we do a single attempt to reserve the new space, > > and if it fails we don't care because the reserve space gets > > refilled before free space is made available again. I think that's > > preferable to having an arbitrary number of retries like this that's > > going to end up failing anyway. > > How about I try the following tomorrow? > > spin_lock(...); > mp->m_resblks = request; > free = percpu_counter_sum(&mp->m_fdblocks) - xfs_fdblocks_unavailable(mp); > if (request < mp->m_resblks_avail && free > 0) { > > ask = min(request - mp->m_resblks_avail, free); > > spin_unlock(...); > error = xfs_mod_fdblocks(mp, -ask, 0); > spin_lock(...); > > if (!error) > mp->m_resblks_avail += ask; > } > spin_unlock(...); > return error; Yeah, I think that's fine, though I think that to ensure things end up as correct as possible: spin_unlock(...); error = xfs_mod_fdblocks(mp, -ask, 0); if (!error) xfs_mod_fdblocks(mp, ask, 0); because if we are racing with other operations, the second xfs_mod_fdblocks() call will put as much of @ask as needed back on the resblks counter, and put any left over back on the percpu counter.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx