Re: [PATCH v9 38/41] btrfs: extend zoned allocator to use dedicated tree-log block group

On Tue, Nov 03, 2020 at 03:47:33PM -0500, Josef Bacik wrote:
On 10/30/20 9:51 AM, Naohiro Aota wrote:
This is the first of three patches to enable the tree-log in ZONED mode.

The tree-log feature does not work in ZONED mode as is. Blocks for a
tree-log tree are allocated interleaved with other metadata blocks, and
btrfs writes and syncs the tree-log blocks to the device at fsync() time,
which is independent of the global transaction commit that writes the
other metadata. As a result, both the tree-log writes and the other
metadata writes become non-sequential, which ZONED mode must avoid.
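To make the ordering problem concrete, here is a toy model of a single sequential-write-required zone, where the device accepts a write only at the current write pointer. It is not btrfs code: struct zone, zone_alloc(), zone_write() and the two scenario functions are hypothetical, and the model ignores everything except allocation order versus write order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one sequential-write-required zone (not btrfs code). */
struct zone {
	unsigned long capacity;  /* blocks in the zone */
	unsigned long alloc_ptr; /* next offset the allocator hands out */
	unsigned long write_ptr; /* next offset the device will accept */
};

/* Reserve the next block in the zone and return its offset. */
static unsigned long zone_alloc(struct zone *z)
{
	assert(z->alloc_ptr < z->capacity);
	return z->alloc_ptr++;
}

/* The device accepts a write only at the current write pointer. */
static bool zone_write(struct zone *z, unsigned long off)
{
	if (off != z->write_ptr)
		return false;
	z->write_ptr++;
	return true;
}

/* Tree-log and ordinary metadata blocks share one zone. */
static bool mixed_streams(void)
{
	struct zone z = { .capacity = 8 };
	unsigned long meta_a = zone_alloc(&z);  /* offset 0 */
	unsigned long log = zone_alloc(&z);     /* offset 1 */
	unsigned long meta_b = zone_alloc(&z);  /* offset 2 */
	bool ok = true;

	ok &= zone_write(&z, log);     /* fsync(): tree-log block written first */
	ok &= zone_write(&z, meta_a);  /* later transaction commit */
	ok &= zone_write(&z, meta_b);
	return ok;
}

/* Tree-log blocks get their own zone, i.e. their own write stream. */
static bool separated_streams(void)
{
	struct zone meta = { .capacity = 8 };
	struct zone tlog = { .capacity = 8 };
	unsigned long meta_a = zone_alloc(&meta);  /* offset 0 in meta */
	unsigned long log = zone_alloc(&tlog);     /* offset 0 in tlog */
	unsigned long meta_b = zone_alloc(&meta);  /* offset 1 in meta */
	bool ok = true;

	ok &= zone_write(&tlog, log);     /* fsync() */
	ok &= zone_write(&meta, meta_a);  /* transaction commit */
	ok &= zone_write(&meta, meta_b);
	return ok;
}
```

In this model mixed_streams() returns false: the fsync-time write at offset 1 is rejected because the write pointer is still at 0, and the commit's write at offset 2 fails for the same reason. separated_streams() returns true because each zone only ever sees in-order writes.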

We can introduce a dedicated block group for tree-log blocks so that
tree-log blocks and other metadata blocks form separate write streams.
Each write stream can then be written to the device independently.
"fs_info->treelog_bg" tracks the dedicated block group, and btrfs assigns
"treelog_bg" on demand at tree-log block allocation time.

This commit extends the zoned block allocator to use the block group.
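A minimal sketch of on-demand assignment follows, assuming a heavily simplified model. struct fs_model, struct block_group, alloc_block() and the constants are hypothetical and are not the kernel's data structures; only the idea of a treelog_bg value that is 0 until the first tree-log allocation comes from the description above. The sketch also assumes that only an empty block group may be promoted, so that its zone holds nothing but tree-log blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BG 4            /* block groups in the model (hypothetical) */
#define BG_BLOCKS 4        /* blocks per block group, one zone each */
#define BLOCK_SIZE 16384u  /* assumed metadata node size */

/* Hypothetical, simplified metadata block group backed by one zone. */
struct block_group {
	uint64_t start;        /* logical start address, never 0 */
	unsigned int used;     /* blocks handed out so far (append-only) */
};

/* Hypothetical stand-in for the parts of fs_info this sketch needs. */
struct fs_model {
	struct block_group bgs[NR_BG];
	uint64_t treelog_bg;   /* start of the dedicated tree-log bg, 0 = unassigned */
};

static void fs_model_init(struct fs_model *fs)
{
	for (size_t i = 0; i < NR_BG; i++) {
		fs->bgs[i].start = (uint64_t)(i + 1) * BG_BLOCKS * BLOCK_SIZE;
		fs->bgs[i].used = 0;
	}
	fs->treelog_bg = 0;
}

/* Allocate one metadata block; returns its logical address, or 0 if no space. */
static uint64_t alloc_block(struct fs_model *fs, bool for_treelog)
{
	/* A full tree-log bg is retired so that a new one can be assigned. */
	for (size_t i = 0; i < NR_BG; i++)
		if (fs->bgs[i].start == fs->treelog_bg && fs->bgs[i].used == BG_BLOCKS)
			fs->treelog_bg = 0;

	for (size_t i = 0; i < NR_BG; i++) {
		struct block_group *bg = &fs->bgs[i];
		bool is_treelog_bg = (bg->start == fs->treelog_bg);

		assert(bg->used <= BG_BLOCKS);
		if (bg->used == BG_BLOCKS)
			continue;                 /* zone is full */
		if (!for_treelog && is_treelog_bg)
			continue;                 /* ordinary metadata stays out of the tree-log bg */
		if (for_treelog && !is_treelog_bg) {
			if (fs->treelog_bg != 0 || bg->used != 0)
				continue;             /* only an empty bg may become the tree-log bg */
			fs->treelog_bg = bg->start;   /* assign on demand */
		}
		return bg->start + (uint64_t)bg->used++ * BLOCK_SIZE;
	}
	return 0;
}
```

After fs_model_init(), a first ordinary allocation lands in the first block group, and the first tree-log allocation then skips it (it is no longer empty) and claims the second one as treelog_bg; later allocations of each kind stay in their own group. A real allocator would additionally have to serialize updates to treelog_bg against concurrent allocations; this sketch is single-threaded.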

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>

If you're going to exclude an entire block group from ordinary metadata
allocation, you need to account for it in the space_info; otherwise we're
going to end up with nasty ENOSPC corner cases here.

Indeed. I'll add a dedicated space_info for the tree-log or, at least,
separate the block group from the other metadata space_info. I'll address
this later, in v11.
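To illustrate the ENOSPC concern, the counters below are a hypothetical simplification (struct bg_usage and metadata_free() are not the layout or API of btrfs's space_info). They show how free space inside the dedicated tree-log block group inflates the metadata free-space figure when it is not excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block-group counters (not btrfs's space_info layout). */
struct bg_usage {
	uint64_t size;   /* bytes in the block group */
	uint64_t used;   /* bytes already allocated */
	bool treelog;    /* true for the dedicated tree-log block group */
};

/*
 * Free metadata bytes as a space_info-style counter would report them.
 * With exclude_treelog == false, the tree-log block group's free space is
 * (wrongly) counted as available to ordinary metadata.
 */
static uint64_t metadata_free(const struct bg_usage *bgs, size_t n,
			      bool exclude_treelog)
{
	uint64_t free_bytes = 0;

	for (size_t i = 0; i < n; i++) {
		assert(bgs[i].used <= bgs[i].size);
		if (exclude_treelog && bgs[i].treelog)
			continue;
		free_bytes += bgs[i].size - bgs[i].used;
	}
	return free_bytes;
}
```

With one full 256 MiB ordinary metadata block group and a 256 MiB tree-log block group that has 16 MiB used, the naive count reports 240 MiB free while the corrected count reports 0. A metadata reservation that passes the naive check can therefore still fail at allocation time, which is the kind of ENOSPC corner case raised above.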


But this raises the question: do we want the tree log for zoned at all?
We could just commit the transaction on fsync and call it good enough.
We'd lose performance, but zoned isn't necessarily about performance.

We see a large performance drop without the tree-log (-o notreelog). Here
are the dbench (32 clients) results on an SMR HDD:

With treelog:    153.509  MB/s
Without treelog:  21.9651 MB/s

That is a drop of more than 85% in throughput. I think this degradation
is too large.
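For reference, the arithmetic behind that figure, using only the two dbench numbers above:

```latex
1 - \frac{21.9651}{153.509} \approx 1 - 0.1431 = 0.8569 \approx 85.7\,\%,
\qquad
\frac{153.509}{21.9651} \approx 6.99
```

In other words, with the tree-log enabled dbench ran roughly 7x faster on this setup.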


If we do, then at a minimum we're going to need to remove this block group
from the metadata space_info counters. Thanks,

Josef
