On Wed 16-11-22 21:14:16, Baokun Li wrote: > On 2022/11/16 19:49, Jan Kara wrote: > > On Wed 16-11-22 15:28:01, Baokun Li wrote: > > > In commit 9a8c5b0d0615 ("ext4: update the backup superblock's at the end > > > of the online resize"), it is assumed that update_backups() only updates > > > backup superblocks, so each b_data is treated as a backupsuper block to > > > update its s_block_group_nr and s_checksum. However, update_backups() > > > also updates the backup group descriptors, which causes the backup group > > > descriptors to be corrupted. > > > > > > The above commit fixes the problem of invalid checksum of the backup > > > superblock. The root cause of this problem is that the checksum of > > > ext4_update_super() is not set correctly. This problem has been fixed > > > in the previous patch ("ext4: fix bad checksum after online resize"). > > > Therefore, roll back some modifications in the above commit. > > > > > > Fixes: 9a8c5b0d0615 ("ext4: update the backup superblock's at the end of the online resize") > > > Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx> > > So I agree commit 9a8c5b0d0615 is broken and does corrupt group > > descriptors. However I don't see how PATCH 1/3 in this series would fix all > > the problems commit 9a8c5b0d0615 is trying to fix. In particular checksums > > on backup superblocks will not be properly set by the resize code AFAICT. > > > > Honza > I didn't find these two issues to be the same until I researched the problem > in > PATCH 3/3 and found that commit 9a8c5b0d0615 introduced a similar problem. > Then, it is found that the backup superblock is directly copied from the > primary > superblock. If the backup superblock is faulty, the primary superblock must > be > faulty. In this case, patch 1 that fixes the primary superblock problem is > thought > of. So by rolling back commit 9a8c5b0d0615 to verify, I found that patch 1 > did > fix the problem. > > Only ext4_flex_group_add() and ext4_group_extend_no_check() call > update_backups() to update the backup superblock. Both of these functions > correctly set the checksum of the primary superblock. The backup superblocks > that are copied from them are also correct. > > In ext4_flex_group_add(), we only update the backup superblock if there are > no > previous errors, indicating that we must have updated the checksum in > ext4_update_super() before executing update_backups(). The previous problem > was that after we updated the checksum in ext4_update_super(), we modified > s_overhead_clusters, so the checksums for both the primary and backup > superblocks > were incorrect. This problem has been fixed in PATCH 1/3, so checksum is set > correctly in ext4_flex_group_add(). > > The same is true in ext4_group_extend_no_check(), we only update the backup > superblock if there are no errors, and we execute ext4_superblock_csum_set() > to update the checksum before updating the backup superblock. Therefore, > checksum is correctly set in ext4_group_extend_no_check(). > > I think we only need to ensure that the checksum is set correctly when the > buffer > lock of sbi->s_sbh is unlocked. Therefore, the checksum should be correct > before > update_backups() holds the buffer lock. Also, in update_backups() we copy > the > entire superblock completely, and the checksum is unchanged, so we don't > need > to reset it. So I agree the checksum should be matching but the backup superblock should have also s_block_group_nr set properly and after updating that we need to recalculate the checksum as well. Honza > > > --- > > > fs/ext4/resize.c | 5 ----- > > > 1 file changed, 5 deletions(-) > > > > > > diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c > > > index cb99b410c9fa..32fbfc173571 100644 > > > --- a/fs/ext4/resize.c > > > +++ b/fs/ext4/resize.c > > > @@ -1158,7 +1158,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data, > > > while (group < sbi->s_groups_count) { > > > struct buffer_head *bh; > > > ext4_fsblk_t backup_block; > > > - struct ext4_super_block *es; > > > /* Out of journal space, and can't get more - abort - so sad */ > > > err = ext4_resize_ensure_credits_batch(handle, 1); > > > @@ -1187,10 +1186,6 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data, > > > memcpy(bh->b_data, data, size); > > > if (rest) > > > memset(bh->b_data + size, 0, rest); > > > - es = (struct ext4_super_block *) bh->b_data; > > > - es->s_block_group_nr = cpu_to_le16(group); > > > - if (ext4_has_metadata_csum(sb)) > > > - es->s_checksum = ext4_superblock_csum(sb, es); > > > set_buffer_uptodate(bh); > > > unlock_buffer(bh); > > > err = ext4_handle_dirty_metadata(handle, NULL, bh); > > > -- > > > 2.31.1 > > > > Thank you for your review! > -- > With Best Regards, > Baokun Li > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR