Re: [PATCH 4/6] xfs: fix infinite loop when reserving free block pool

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 24 Mar 2022 08:11:47 +1100

On Sun, Mar 20, 2022 at 09:43:49AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Don't spin in an infinite loop trying to reserve blocks -- if we can't
> do it after 30 tries, we're racing with a nearly full filesystem, so
> just give up.
> 
> Cc: Brian Foster <bfoster@xxxxxxxxxx>
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_fsops.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 710e857bb825..615334e4f689 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -379,6 +379,7 @@ xfs_reserve_blocks(
>  	int64_t			fdblks_delta = 0;
>  	uint64_t		request;
>  	int64_t			free;
> +	unsigned int		tries;
>  	int			error = 0;
>  
>  	/* If inval is null, report current values and return */
> @@ -430,9 +431,16 @@ xfs_reserve_blocks(
>  	 * If the request is larger than the current reservation, reserve the
>  	 * blocks before we update the reserve counters. Sample m_fdblocks and
>  	 * perform a partial reservation if the request exceeds free space.
> +	 *
> +	 * The loop body estimates how many blocks it can request from fdblocks
> +	 * to stash in the reserve pool.  This is a classic TOCTOU race since
> +	 * fdblocks updates are not always coordinated via m_sb_lock.  We also
> +	 * cannot tell if @free remaining unchanged between iterations is due
> +	 * to an idle system or freed blocks being consumed immediately, so
> +	 * we'll try a finite number of times to satisfy the request.
>  	 */
>  	error = -ENOSPC;
> -	do {
> +	for (tries = 0; tries < 30 && error == -ENOSPC; tries++) {
>  		free = percpu_counter_sum(&mp->m_fdblocks) -
>  						xfs_fdblocks_unavailable(mp);
>  		if (free <= 0)
> @@ -459,7 +467,7 @@ xfs_reserve_blocks(
>  		spin_unlock(&mp->m_sb_lock);
>  		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
>  		spin_lock(&mp->m_sb_lock);
> -	} while (error == -ENOSPC);
> +	}

So the problem here is that if we don't get all of the reservation,
we get none of it, so we try again. This seems like we should simply
punch the error back out to userspace and let it retry rather than
try a few times and then fail anyway. Either way, userspace has to
handle failure to reserve enough blocks.

The other alternative here is that we just change the reserve block
limit when ENOSPC occurs and allow the xfs_mod_fdblocks() refill
code just fill up the pool as space is freed. That would greatly
simplify the code - we do a single attempt to reserve the new space,
and if it fails we don't care because the reserve space gets
refilled before free space is made available again.  I think that's
preferable to having an arbitrary number of retries like this that's
going to end up failing anyway.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx