Re: [PATCH RESEND 4/8] ext4: add the gdt block of meta_bg to system_zone


Theodore Y. Ts'o wrote on 2020/12/10 3:39:
> On Wed, Dec 09, 2020 at 07:48:09PM +0800, brookxu wrote:
>>
>> Maybe I missed something. If i % meta_bg_size is used instead and
>> flex_size < 64, then we will miss some flex_bgs. There seems to be
>> a contradiction here: in a scenario where only flex_bg is
>> enabled, it may not be appropriate to use meta_bg_size, and in a
>> scenario where only meta_bg is enabled, it may not be appropriate
>> to use flex_size.
>>
>> As you said before, it maybe better to remove
>>
	if ((i < 5) || ((i % flex_size) == 0))
>>
>> and do it for all groups.
> 
> I don't think the original (i % flex_size) made any sense in the first
> place.
> 
> What flex_bg does is that it collects the allocation bitmaps and inode
> tables for each block group and locates them within the first block
> group in a flex_bg.  It doesn't have anything to do with whether or
> not a particular block group has a backup copy of the superblock and
> block group descriptor table --- in non-meta_bg file systems and the
> meta_bg file systems where the block group is less than
> s_first_meta_bg * EXT4_DESC_PER_BLOCK(sb).  And the condition in
> question is only about whether or not to add the backup superblock and
> backup block group descriptors.  So checking for i % flex_size made no
> sense, and I'm not sure why that check was there in the first place.

I think we should add the backup sb and gdt blocks to the
system_zone, because these blocks should not be used by
applications. In fact, I think we already do part of this work.
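As a side note, the sparse_super placement rule Ted describes can be sketched in a few lines of standalone C. This is a simplified model of the kernel's ext4_bg_has_super() (the helper names here are mine, and it assumes the sparse_super feature is enabled):

```c
#include <stdbool.h>

/* Simplified model of ext4_bg_has_super() for a file system with
 * sparse_super enabled: only groups 0 and 1, and groups whose number
 * is a power of 3, 5, or 7, carry a backup superblock and backup
 * group descriptor table. */
static bool is_power_of(unsigned long group, unsigned long base)
{
	while (group > 1 && group % base == 0)
		group /= base;
	return group == 1;
}

static bool bg_has_backup_sb(unsigned long group)
{
	if (group <= 1)
		return true;
	return is_power_of(group, 3) || is_power_of(group, 5) ||
	       is_power_of(group, 7);
}
```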

>> In this way we won't miss any flex_bg, meta_bg, or sparse_bg.
>> I tested it on an 80T disk and found that the performance loss
>> was small:
>>
>>  unpatched kernel:
>>  ext4_setup_system_zone() takes 524ms
>>
>>  patched kernel:
>>  ext4_setup_system_zone() takes 552ms
> 
> I don't really care that much about the time it takes to execute
> ext4_setup_system_zone().
> 
> The really interesting question is how large is the rb_tree
> constructed by that function, and what is the percentage increase of
> time that the ext4_inode_block_valid() function takes.  (e.g., how
> much additional memory is the system_blks tree taking, and how deep is
> that tree, since ext4_inode_block_valid() gets called every time we
> allocate or free a block, and every time we need to validate an extent
> tree node.)

During detailed analysis, I found that at the time the current
code calls ext4_setup_system_zone(), s_log_groups_per_flex has
not yet been initialized, so flex_size is always 1, which seems
to be a mistake. Therefore

	if (ext4_bg_has_super(sb, i) &&
	    ((i < 5) || ((i % flex_size) == 0)))

degenerates to

	if (ext4_bg_has_super(sb, i))

So the existing implementation only adds the backup superblocks
of the sparse groups to the system zone. Due to this mistake,
the behavior in the flex_bg scenario happens to be correct?
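To make the degeneration concrete, here is a tiny standalone model of the check (the function name and parameters are mine, standing in for sbi->s_log_groups_per_flex and ext4_bg_has_super(sb, i)). With s_log_groups_per_flex still 0, flex_size = 1 << 0 = 1, so (i % flex_size) is always 0 and the whole condition collapses to the bg_has_super test:

```c
/* Toy model of the condition in ext4_setup_system_zone(). */
static int zone_check(unsigned long i, int bg_has_super,
		      unsigned int log_groups_per_flex)
{
	unsigned long flex_size = 1UL << log_groups_per_flex;

	return bg_has_super && ((i < 5) || ((i % flex_size) == 0));
}
```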

I tested three scenarios: only meta_bg enabled, only flex_bg
enabled, and both flex_bg and meta_bg enabled. The test results
are as follows:

Meta_bg only
 unpatched kernel:
 ext4_setup_system_zone time 866 count 1309087

 patched kernel:
 ext4_setup_system_zone time 841 count 1309087

Since the backup gdt blocks of meta_bg are adjacent to the BB,
the ranges can be merged, so no additional nodes are added.
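The merge behavior can be illustrated with a toy version of the range insertion (a flat array instead of the kernel's rbtree, and my own names; add_system_zone() does the real merging):

```c
struct range {
	unsigned long start;
	unsigned long count;
};

/* Insert [start, start + count) into an array of ranges built in
 * ascending order; a range contiguous with the previous one is merged
 * in place instead of occupying a new slot, mirroring how adjacent
 * extents share one rbtree node. Returns the new range count. */
static int add_range(struct range *v, int n,
		     unsigned long start, unsigned long count)
{
	if (n > 0 && v[n - 1].start + v[n - 1].count == start) {
		v[n - 1].count += count;	/* contiguous: merge */
		return n;
	}
	v[n].start = start;			/* gap: new node */
	v[n].count = count;
	return n + 1;
}
```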

Flex_bg only
 unpatched kernel:
 ext4_setup_system_zone time 529 count 41016

 patched kernel:
 ext4_setup_system_zone time 553 count 41016

The system behavior has not changed: all backup sb and gdt
blocks of the sparse groups are still added, so no additional
nodes are created.

Meta_bg & Flex_bg
 unpatched kernel:
 ext4_setup_system_zone time 535 count 41016

 patched kernel:
 ext4_setup_system_zone time 571 count 61508

In addition to the sparse groups, we need to add the backup
gdt blocks of meta_bg to the system zone. Let

	N = max(flex_bg_size / meta_bg_size, 1)

then every N meta_bgs have a gdt block that can be merged into
the node corresponding to a flex_bg; if flex_bg_size < meta_bg_size,
the number of new nodes is 2 * nr_meta_bg. On this 80T disk,
the maximum depth of the rbtree is 2 * log2(n + 1). By this
calculation, the depth of the rbtree does not increase in this
test case, so there is no major performance overhead.
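The arithmetic above can be checked with a couple of standalone helpers (my own names; an integer ceiling is used in place of 2 * log2(n + 1)):

```c
/* N = max(flex_bg_size / meta_bg_size, 1) from the text above. */
static unsigned long merge_period(unsigned long flex_bg_size,
				  unsigned long meta_bg_size)
{
	unsigned long n = flex_bg_size / meta_bg_size;

	return n ? n : 1;
}

static unsigned int floor_log2(unsigned long x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Integer upper bound on the worst-case rbtree depth 2 * log2(n + 1). */
static unsigned int depth_bound(unsigned long nodes)
{
	return 2 * (floor_log2(nodes + 1) + 1);
}
```

With the node counts from the test above, depth_bound(41016) and depth_bound(61508) come out the same, which is the "depth does not increase" claim.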

Maybe we can deal with it in the same way as discussed before?

> Cheers,
> 
> 						- Ted
> 


