On 2020/10/5 9:05 PM, Josef Bacik wrote:
> On 9/30/20 8:01 AM, Qu Wenruo wrote:
>> [BUG]
>> There are quite some bug reports of btrfs falling into an ENOSPC trap,
>> where btrfs can't even start a transaction to add new devices.
>>
>> [CAUSE]
>> Most of the reports involve multi-device profiles, like
>> RAID1/RAID10/RAID5/RAID6, where the involved disks have very unbalanced
>> sizes.
>>
>> It turns out that the overcommit calculation in btrfs_can_overcommit()
>> is just a factor-based calculation, which can't check whether the devices
>> can really fulfill the requirement of the desired profile.
>>
>> This makes btrfs_can_overcommit() always over-confident about the
>> usable space, and when we can't allocate any new metadata chunk but
>> still allow new metadata operations, we fall into the ENOSPC trap and
>> have no way to exit it.
>>
>> [WORKAROUND]
>> The proper fix needs a device-layout-aware available space calculation,
>> one that mirrors what the chunk allocator can actually do.
>>
>> Such a patchset was submitted to the mailing list before, but the extra
>> failure mode is tricky to handle for chunk allocation, thus that
>> patchset needs more time to mature.
>>
>> Meanwhile, to prevent such problems from reaching more users, work around
>> the problem by:
>> - Halving the reported over-commit available space
>>   So that we won't always be that over-confident.
>>   But this won't really help if we have extremely unbalanced disk sizes.
>>
>> - Not over-committing if the space info is already full
>>   This may already be too late, but it is still better than doing nothing
>>   and believing the over-commit values.
>>
>
> I just had a thought, what if we simply cap the free_chunk_space to the
> min of the free space of all the devices.

Sure, reducing the number will never be a problem.

> Simply walk through all the
> devices on mount, and we do the initial set of whatever the smallest one
> is.  The rest of the math would work out fine, and the rest of the
> modifications would work fine.

But I still prefer to do the minimal device size update as part of my
per-profile available space calculation, so we never get a chance to
over-estimate.

> The only "tricky" part would be when we
> do a shrink or grow, we'd have to re-calculate the sizes for everybody,
> but that's not a big deal.  Thanks,

As long as we don't over-estimate, everything will be fine; the only
question is how much extra metadata flushing is needed (and thus how
much extra overhead).

The rest is just a spectrum between "I don't really like over-commit at
all, so let's make it really hard to do any over-commit" and "I'm a super
smart guy and here is the best algorithm to estimate how much space we
really have for over-commit".

Thanks,
Qu

>
> Josef
>
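To make the over-estimation concrete, here is a tiny stand-alone sketch
(user-space C, not kernel code; the 1 TiB + 10 GiB layout and all names
below are made up for illustration) comparing the factor-based estimate,
the halved workaround value, and Josef's min-device cap for a RAID1
metadata profile:

	/*
	 * Illustrative sketch only: shows why a factor-based over-commit
	 * estimate over-shoots on unbalanced devices, and why capping the
	 * budget at the smallest device's free space is much closer to
	 * what the chunk allocator can really do.
	 */
	#include <stdio.h>
	#include <stdint.h>

	#define NUM_DEVICES 2
	#define GiB (1024ULL * 1024 * 1024)

	int main(void)
	{
		/* Hypothetical unbalanced array: 1 TiB + 10 GiB free. */
		uint64_t dev_free[NUM_DEVICES] = { 1024 * GiB, 10 * GiB };
		uint64_t total_free = 0, min_free = UINT64_MAX;
		int raid1_factor = 2;	/* RAID1 stores two copies */

		for (int i = 0; i < NUM_DEVICES; i++) {
			total_free += dev_free[i];
			if (dev_free[i] < min_free)
				min_free = dev_free[i];
		}

		/* Factor-based estimate: total free space / profile factor. */
		uint64_t factor_based = total_free / raid1_factor;

		/*
		 * A RAID1 chunk needs space on two different devices, so the
		 * smallest device is the real limit.
		 */
		printf("factor-based estimate : %llu GiB\n",
		       (unsigned long long)(factor_based / GiB));
		printf("halved estimate       : %llu GiB (workaround)\n",
		       (unsigned long long)(factor_based / 2 / GiB));
		printf("really allocatable    : %llu GiB (min device free)\n",
		       (unsigned long long)(min_free / GiB));
		return 0;
	}

With these numbers the factor-based estimate reports ~517 GiB and the
halved workaround still ~258 GiB, while only ~10 GiB of RAID1 chunks can
actually be allocated, which is why the min-device cap (or the per-profile
calculation) is the direction being discussed here.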