On Thu 24-08-23 20:56:14, Baokun Li wrote: > On 2023/8/24 18:08, Jan Kara wrote: > > On Thu 24-08-23 10:27:46, Baokun Li wrote: > > > Hello, Jan! > > > > > > On 2023/8/24 1:05, Jan Kara wrote: > > > > On Thu 17-08-23 16:18:28, Baokun Li wrote: > > > > > After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned > > > > > inodes"), we load all the quotas before we process the orphaned inodes, > > > > > and when we load the quotas, we check the checsum of the bbitmap for each > > > > > group. If one of the bbitmap checksums is wrong, the following error will > > > > > be reported: > > > > > > > > > > “Error initializing quota context in support library: > > > > > Block bitmap checksum does not match bitmap” > > > > > > > > > > But loading quotas comes before checking the current superblock for the > > > > > EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any > > > > > image that contains orphan inodes and has the wrong bbitmap checksum. > > > > > So delaying quota loading until after the EXT2_ERROR_FS judgment avoids > > > > > the above problem. > > > > > > > > > > Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx> > > > > This certainly looks better but I wonder if there still isn't a problem if > > > > the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we > > > > rather move the initialization of the quota files after the call to > > > > e2fsck_read_bitmaps()? > > > > > > > > Honza > > > When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must > > > have lost some data (error flag or group descriptor or bitmap), so there > > > is something wrong with the kernel at this time, so I don't think we > > > should fix the image directly, but rather let the user realize that > > > something is wrong with the filesystem logic. > > I agree it means there is a problem somewhere (the storage, the kernel, or > > similar). But just ignoring bitmap checksums in release_orphan_inodes() is > > exactly how e2fsck behaves on filesystems without quota feature so I see no > > reason for quota feature to change that because the inconsistency has > > nothing to do with quotas... > > > > > Moreover, if we don't care how this happened, but just want to fix the > > > image, we only need to run "e2fsck -a" twice. After merging in the > > > current patch, we always empty the orphan list before loading the quotas, > > > and EXT2_ERROR_FS is set when loading the quotas fails, so this will be > > > fixed the second time you run e2fsck. It will not happen that every > > > e2fsck will fail like it did before. > > I see, you're right so it isn't as bad as I originally thought but still my > > argument above holds - IMO e2fsck should treat wrong bitmap checksums the > > same way with and without the quota feature. > > > > Honza > The original flow that went wrong here is as follows: > e2fsck > e2fsck_run_ext3_journal > check_super_block > release_orphan_inodes > e2fsck_read_all_quotas > quota_read_all_dquots > quota_file_open > ext2fs_read_bitmaps > ext2fs_rw_bitmaps > read_bitmaps_range > read_bitmaps_range_start > ext2fs_block_bitmap_csum_verify > !!! error > e2fsck_run > > Yes, the inconsistency has nothing to do with quota, but quota is loaded > here to keep track of space changes during the normal processing of > orphan list. If quota was not loaded, we would not have read and check > bitmaps until Pass5, and we had already done a lot of checking and > tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps > inconsistency may have been fixed during that time. This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which loads all the bitmaps while ignoring checksum failures. This is needed so that blocks released during orphan cleanup are properly tracked as free. All I want to do is to move the call to e2fsck_read_all_quotas() a bit further than you moved it to a place after the e2fsck_read_bitmaps() call... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR