Re: [PATCH] btrfs: workaround the over-confident over-commit available space calculation

Qu Wenruo <wqu@xxxxxxxx> · Mon, 5 Oct 2020 17:22:37 +0800

Hi David,

Would you please consider merge this patch as a hotfix?

We have more and more reports about deadly ENOSPC trap for multi-device
setup.

Considering the worst consequence, user can't even delete anything due
to exhausted metadata, I really hope we can at least workaround it.

The side effect of the patch is, smaller metadata over-commit, which may
decrease the performance, but I see it worthy to avoid the worst case
scenario.

And buy enough time for us to review the per-profile available space patch.

Thanks,
Qu

On 2020/9/30 下午8:01, Qu Wenruo wrote:
> [BUG]
> There are quite some bug reports of btrfs falling into a ENOSPC trap,
> where btrfs can't even start a transaction to add new devices.
> 
> [CAUSE]
> Most of the reports are utilize multi-device profiles, like
> RAID1/RAID10/RAID5/RAID6, and the involved disks have very unbalanced
> sizes.
> 
> It turns out that, the overcommit calculation in btrfs_can_overcommit()
> is just a factor based calculation, which can't check if devices can
> really fulfill the requirement for the desired profile.
> 
> This makes btrfs_can_overcommit() to be always over-confident about
> usable space, and when we can't allocate any new metadata chunk but
> still allow new metadata operations, we fall into the ENOSPC trap and
> have no way to exit it.
> 
> [WORKAROUND]
> The root fix needs a device layout aware, chunk allocator like available
> space calculation.
> 
> There used to be such patchset submitted to the mail list, but the extra
> failure mode is tricky to handle for chunk allocation, thus that
> patchset needs more time to mature.
> 
> Meanwhile to prevent such problems reaching more users, workaround the
> problem by:
> - Half the over-commit available space reported
>   So that we won't always be that over-confident.
>   But this won't really help if we have extremely unbalanced disk size.
> 
> - Don't over-commit if the space info is already full
>   This may already be too late, but still better than doing nothing and
>   believe the over-commit values.
> 
> CC: stable@xxxxxxxxxxxxxxx # 4.4+
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  fs/btrfs/space-info.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 475968ccbd1d..e8133ec7e34a 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -339,6 +339,18 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info,
>  		avail >>= 3;
>  	else
>  		avail >>= 1;
> +	/*
> +	 * Since current over-commit calculation is doomed already for
> +	 * RAID0/RADI1/RAID10/RAID5/6, we half the availabe space to reduce
> +	 * over-commit amount.
> +	 *
> +	 * This is just a workaround before the device layout aware
> +	 * available space calculation arrives.
> +	 */
> +	if ((BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1_MASK |
> +	     BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID56_MASK) &
> +	     profile)
> +		avail >>= 1;
>  	return avail;
>  }
>  
> @@ -353,6 +365,14 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info,
>  	if (space_info->flags & BTRFS_BLOCK_GROUP_DATA)
>  		return 0;
>  
> +	/*
> +	 * If we can't allocate new space already, no overcommit is allowed.
> +	 *
> +	 * This check may be already late, but still better than nothing.
> +	 */
> +	if (space_info->full)
> +		return 0;
> +
>  	used = btrfs_space_info_used(space_info, true);
>  	avail = calc_available_free_space(fs_info, space_info, flush);
>  
> 

Attachment:
signature.asc

Description: OpenPGP digital signature