On Fri, Aug 18, 2023 at 11:16:35AM +0800, Kemeng Shi wrote:
> Ah, I guess here is the thing I missed that make this confusing:
> sbi->s_group_desc contains only primary block of each group. i.e.
> sbi->s_group_desc['i'] is the primary gdb block of group 'i'.

Correct.  In fact, when we need to modify a block group descriptor for
a group, we call ext4_get_group_desc(), and it references
sbi->s_group_desc to find the appropriate buffer_head for the bg
descriptor that we want to modify.

I'm not sure "only" is the right adjective to use here, since the
whole *point* of s_group_desc[] is to keep the buffer_heads for the
block group descriptor blocks in memory, so we can modify them when we
allocate or free blocks, inodes, etc.  And we only modify the primary
block group descriptors.

The secondary, or backup, block group descriptors are only used by
e2fsck when the primary block group descriptor has been overwritten,
so we can find the inode table, allocation bitmaps, and so on.  So we
do *not* modify them in the course of normal operations, and that's by
design.  The only time the kernel will update those block group
descriptors is when we are doing an online resize, and we need to make
sure the backup descriptors are updated, so that if the primary
descriptors get completely smashed, we can still get critical
information such as the location of that block group's inode table.

> From add_new_gdb and add_new_gdb_meta_bg, we can find that we always
> add primary gdb block of new group to s_group_desc. To be more specific:
> add_new_gdb
> 	gdblock = EXT4_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> 	n_group_desc[gdb_num] = gdb_bh;
>
> add_new_gdb_meta_bg
> 	gdblock = ext4_meta_bg_first_block_no(sb, group) +
> 		  ext4_bg_has_super(sb, group);
> 	gdb_bh = ext4_sb_bread(sb, gdblock, 0);
> 	n_group_desc[gdb_num] = gdb_bh;

Put another way, there are EXT4_DESC_PER_BLOCK(sb) bg descriptors in a
block.
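
To make that indexing concrete, here is a minimal user-space sketch.
It is not the kernel's code.  The helper names (desc_per_block,
gdb_index, gdb_offset, needs_new_gdb) are hypothetical and exist only
for illustration, and the block and descriptor sizes are passed in as
plain parameters rather than read from a superblock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_DESC_PER_BLOCK(sb): how many block
 * group descriptors fit in one descriptor block. */
static inline uint32_t desc_per_block(uint32_t block_size, uint32_t desc_size)
{
	assert(desc_size != 0 && desc_size <= block_size);
	return block_size / desc_size;
}

/* Which slot of an s_group_desc[]-style array holds the descriptor
 * block that contains the descriptor for 'group'. */
static inline uint32_t gdb_index(uint32_t group, uint32_t block_size,
				 uint32_t desc_size)
{
	return group / desc_per_block(block_size, desc_size);
}

/* Byte offset of the descriptor for 'group' inside that block. */
static inline uint32_t gdb_offset(uint32_t group, uint32_t block_size,
				  uint32_t desc_size)
{
	return (group % desc_per_block(block_size, desc_size)) * desc_size;
}

/* Nonzero when the descriptor for a newly added 'group' is the first
 * one in a descriptor block, i.e. the array has to grow by one entry. */
static inline int needs_new_gdb(uint32_t group, uint32_t block_size,
				uint32_t desc_size)
{
	return group % desc_per_block(block_size, desc_size) == 0;
}
```

With example values of a 4096-byte block and a 128-byte descriptor,
groups 0 through 31 share slot 0, and group 32 is the first one whose
descriptor lands in a new block.
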
For a file system with the 64-bit feature enabled, the size of the
block group descriptor is 128 bytes.  If the block size is 4096, then
we can fit 32 block group descriptors in a block.  When we add a new
block group such that its block group descriptor will spill into a new
block, then we need to expand the s_group_desc[] array, and initialize
the new block group descriptor block.  And that's the job of
add_new_gdb() and add_new_gdb_meta_bg().

> > More generally, this whole patch series is making it clear that the
> > online resize code is hard to understand, because it's super
> > complicated, leading to potential bugs, and very clearly code which is
> > very hard to maintain.  So this may mean we need better comments to
> > make it clear *when* the backup block groups are going to be
> > accomplished, given the various different cases (e.g., no flex_bg or
> > meta_bg, with flex_bg, or with meta_bg).
> >
> Yes, I agree with this. I wonder if a series to add comments of some
> common rules is good to you. Like the information mentioned above
> that s_group_desc contains primary gdb block of each group.

Well, the meaning of s_group_desc[] was obvious to me, but that's why
it's useful to have someone with "new eyes" take a look at the code,
since what may be obvious to old hands might not be obvious to someone
looking at the code for the first time.  So yes, it's probably worth
documenting.

The question is where is the best place to put it, since the primary
place where s_group_desc[] is used is *not* online resize.
s_group_desc[] is initialized in ext4_group_desc_init() in
fs/ext4/super.c, and it is used in fs/ext4/balloc.c, and of course, it
is defined in fs/ext4/ext4.h.

					- Ted