On 12/12/19 11:08 PM, Naohiro Aota wrote:
On HMZONED drives, writes must always be sequential and directed at a block group zone write pointer position. Thus, block allocation in a block group must also be done sequentially using an allocation pointer equal to the block group zone write pointer plus the number of blocks allocated but not yet written.

The sequential allocation function find_free_extent_zoned() bypasses the checks in find_free_extent() and increases the reserved byte counter by itself. It is impossible to revert a region once it has been allocated sequentially, since the revert might race with other allocations and leave an allocation hole, which breaks the sequential write rule.

Furthermore, this commit introduces two new variables to struct btrfs_block_group. "wp_broken" indicates that the write pointer is broken (e.g. not synced on a RAID1 block group) and marks that block group read-only. "zone_unusable" keeps track of the size of regions in a block group that were allocated once and then freed. Such a region is not usable again until the underlying zones are reset.

This commit also introduces "bytes_zone_unusable" to track such unusable bytes in a space_info. Pinned bytes are always reclaimed to "bytes_zone_unusable". They are not usable until the zones are reset.
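For readers following the thread, the allocation rule described above can be sketched in a few lines of userspace C. This is only an illustration, not the code from the patch: struct zoned_bg and the zoned_*() helpers are hypothetical names, the struct is a simplified stand-in for struct btrfs_block_group, and locking, space_info accounting and the pinned-bytes path are left out. The sketch also assumes a reset returns the whole block group to an empty state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a zoned block group (not the btrfs struct). */
struct zoned_bg {
	uint64_t start;         /* logical start address of the block group */
	uint64_t length;        /* total bytes in the block group */
	uint64_t alloc_offset;  /* write pointer offset plus bytes allocated but not yet written */
	uint64_t zone_unusable; /* bytes allocated once and then freed; dead until a zone reset */
	bool wp_broken;         /* write pointer is unreliable: treat the group as read-only */
};

/* Bytes that can still be handed out before the zones are reset. */
static uint64_t zoned_free_bytes(const struct zoned_bg *bg)
{
	return bg->length - bg->alloc_offset;
}

/*
 * Allocate num_bytes at the allocation pointer. Returns 0 and stores the
 * logical address in *logical, or -1 if the request cannot be satisfied.
 * There is deliberately no "undo": giving a region back would leave a hole
 * behind later allocations and break the sequential write rule.
 */
static int zoned_alloc(struct zoned_bg *bg, uint64_t num_bytes, uint64_t *logical)
{
	assert(bg->alloc_offset <= bg->length);
	if (bg->wp_broken || num_bytes == 0 || num_bytes > zoned_free_bytes(bg))
		return -1;
	*logical = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return 0;
}

/* Freed space is only counted as unusable; the allocation pointer never moves back. */
static void zoned_free(struct zoned_bg *bg, uint64_t num_bytes)
{
	bg->zone_unusable += num_bytes;
	assert(bg->zone_unusable <= bg->alloc_offset);
}

/* Assumption: resetting the underlying zones makes the whole group allocatable again. */
static void zoned_reset(struct zoned_bg *bg)
{
	bg->alloc_offset = 0;
	bg->zone_unusable = 0;
}
```

In this sketch, freeing a region only raises zone_unusable, so the space left for new allocations stays the same until zoned_reset() is called.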
Please separate this out into its own patch. These things are a bear to review as it is, and it doesn't help that I need to keep track of two different things per patch. Thanks,
Josef