Hi David, Would you please consider merge this patch as a hotfix? We have more and more reports about deadly ENOSPC trap for multi-device setup. Considering the worst consequence, user can't even delete anything due to exhausted metadata, I really hope we can at least workaround it. The side effect of the patch is, smaller metadata over-commit, which may decrease the performance, but I see it worthy to avoid the worst case scenario. And buy enough time for us to review the per-profile available space patch. Thanks, Qu On 2020/9/30 下午8:01, Qu Wenruo wrote: > [BUG] > There are quite some bug reports of btrfs falling into a ENOSPC trap, > where btrfs can't even start a transaction to add new devices. > > [CAUSE] > Most of the reports are utilize multi-device profiles, like > RAID1/RAID10/RAID5/RAID6, and the involved disks have very unbalanced > sizes. > > It turns out that, the overcommit calculation in btrfs_can_overcommit() > is just a factor based calculation, which can't check if devices can > really fulfill the requirement for the desired profile. > > This makes btrfs_can_overcommit() to be always over-confident about > usable space, and when we can't allocate any new metadata chunk but > still allow new metadata operations, we fall into the ENOSPC trap and > have no way to exit it. > > [WORKAROUND] > The root fix needs a device layout aware, chunk allocator like available > space calculation. > > There used to be such patchset submitted to the mail list, but the extra > failure mode is tricky to handle for chunk allocation, thus that > patchset needs more time to mature. > > Meanwhile to prevent such problems reaching more users, workaround the > problem by: > - Half the over-commit available space reported > So that we won't always be that over-confident. > But this won't really help if we have extremely unbalanced disk size. > > - Don't over-commit if the space info is already full > This may already be too late, but still better than doing nothing and > believe the over-commit values. > > CC: stable@xxxxxxxxxxxxxxx # 4.4+ > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > --- > fs/btrfs/space-info.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c > index 475968ccbd1d..e8133ec7e34a 100644 > --- a/fs/btrfs/space-info.c > +++ b/fs/btrfs/space-info.c > @@ -339,6 +339,18 @@ static u64 calc_available_free_space(struct btrfs_fs_info *fs_info, > avail >>= 3; > else > avail >>= 1; > + /* > + * Since current over-commit calculation is doomed already for > + * RAID0/RADI1/RAID10/RAID5/6, we half the availabe space to reduce > + * over-commit amount. > + * > + * This is just a workaround before the device layout aware > + * available space calculation arrives. > + */ > + if ((BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1_MASK | > + BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID56_MASK) & > + profile) > + avail >>= 1; > return avail; > } > > @@ -353,6 +365,14 @@ int btrfs_can_overcommit(struct btrfs_fs_info *fs_info, > if (space_info->flags & BTRFS_BLOCK_GROUP_DATA) > return 0; > > + /* > + * If we can't allocate new space already, no overcommit is allowed. > + * > + * This check may be already late, but still better than nothing. > + */ > + if (space_info->full) > + return 0; > + > used = btrfs_space_info_used(space_info, true); > avail = calc_available_free_space(fs_info, space_info, flush); > >
Attachment:
signature.asc
Description: OpenPGP digital signature