Theodore Y. Ts'o wrote on 2020/12/9 12:34:
> On Fri, Dec 04, 2020 at 09:26:49AM +0800, brookxu wrote:
>>
>> Theodore Y. Ts'o wrote on 2020/12/3 23:08:
>>> On Sat, Nov 07, 2020 at 11:58:14PM +0800, Chunguang Xu wrote:
>>>> From: Chunguang Xu <brookxu@xxxxxxxxxxx>
>>>>
>>>> In order to avoid poor search efficiency of system_zone, the
>>>> system only adds metadata of some sparse group to system_zone.
>>>> In the meta_bg scenario, the non-sparse group may contain gdt
>>>> blocks. Perhaps we should add these blocks to system_zone to
>>>> improve fault tolerance without significantly reducing system
>>>> performance.
>>
>> Thanks, in the large-market scenario, if we deal with all groups,
>> the system_zone will be very large, which may reduce performance.
>> I think the previous method is good, but it needs to be changed
>> slightly, so that the fault tolerance in the meta_bg scenario
>> can be improved without the risk of performance degradation.
>
> OK, I see. But this is not actually reliable:
>
>>>> +	if ((i < 5) || ((i % flex_size) == 0)) {
>
> This only works if the flex_size is less than or equal to 64 (assuming
> a 4k blocksize). That's because on 64-bit file systems, we can fit 64
> block group descriptors in a 4k block group descriptor block, so
> that's the size of the meta_bg. The default flex_bg size is 16, but
> it's quite possible to create a file system via "mke2fs -t ext4 -G
> 256". In that case, the flex_size will be 256, and we would not be
> including all of the meta_bg groups. So i % flex_size needs to be
> replaced by "i % meta_bg_size", where meta_bg_size would be
> initialized to EXT4_DESC_PER_BLOCK(sb).
>
> Does that make sense?

Maybe I missed something. If i % meta_bg_size is used instead and
flex_size < 64, then we will miss some flex_bg groups. There seems to
be a contradiction here: in the scenario where only flex_bg is enabled,
it may not be appropriate to use meta_bg_size, and in the scenario
where only meta_bg is enabled, it may not be appropriate to use
flex_size.

As you said before, it may be better to remove

	if ((i < 5) || ((i % flex_size) == 0))

and do it for all groups. That way we won't miss any flex_bg, meta_bg,
or sparse_bg groups. I tested it on an 80T disk and found that the
performance loss was small:

unpatched kernel:
ext4_setup_system_zone() takes 524ms,

mount-3137 [006] .... 89.548026: ext4_setup_system_zone: (ext4_setup_system_zone+0x0/0x3f0)
mount-3137 [006] d... 90.072895: ext4_setup_system_zone_1: (ext4_fill_super+0x2057/0x39b0 <- ext4_setup_system_zone)

patched kernel:
ext4_setup_system_zone() takes 552ms,

mount-4425 [006] .... 402.555793: ext4_setup_system_zone: (ext4_setup_system_zone+0x0/0x3d0)
mount-4425 [006] d... 403.107307: ext4_setup_system_zone_1: (ext4_fill_super+0x2057/0x39b0 <- ext4_setup_system_zone)

>
> - Ted
>
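
To make the stride mismatch discussed above concrete, here is a small userspace sketch (not kernel code). It counts how many groups that start a unit of a given size are skipped by a filter of the form (i < 5) || (i % stride == 0). The helper names group_selected() and count_missed() are made up for this illustration, and treating "groups of interest" as exact multiples of the unit size is a simplifying assumption, not the real meta_bg layout.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the filter in the patch: returns 1 if
 * group i would be processed for the given stride, 0 otherwise.
 */
static int group_selected(unsigned int i, unsigned int stride)
{
	return (i < 5) || ((i % stride) == 0);
}

/*
 * Count the groups in [0, ngroups) that begin a unit of `unit` groups
 * (i % unit == 0) but are skipped by the filter with stride `stride`.
 * Assumption: only exact multiples of `unit` are of interest.
 */
static unsigned int count_missed(unsigned int ngroups, unsigned int unit,
				 unsigned int stride)
{
	unsigned int i, missed = 0;

	for (i = 0; i < ngroups; i += unit)
		if (!group_selected(i, stride))
			missed++;
	return missed;
}

/* Print one line of the comparison for a given unit/stride pair. */
static void report(unsigned int ngroups, unsigned int unit,
		   unsigned int stride)
{
	printf("unit=%u stride=%u missed=%u\n", unit, stride,
	       count_missed(ngroups, unit, stride));
}
```

Calling report(1024, 64, 256) models "mke2fs -G 256" with a meta_bg of 64 groups and the i % flex_size check, while report(1024, 16, 64) models the default flex_size of 16 with an i % meta_bg_size check. Either stride skips some units of the other kind, which is the contradiction described above, and processing every group avoids it.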