Re: [PATCH] e2fsck: delay quotas loading in release_orphan_inodes()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/8/25 20:50, Jan Kara wrote:
On Fri 25-08-23 10:08:17, Baokun Li wrote:
On 2023/8/25 3:37, Jan Kara wrote:
On Thu 24-08-23 20:56:14, Baokun Li wrote:
On 2023/8/24 18:08, Jan Kara wrote:
On Thu 24-08-23 10:27:46, Baokun Li wrote:
Hello, Jan!

On 2023/8/24 1:05, Jan Kara wrote:
On Thu 17-08-23 16:18:28, Baokun Li wrote:
After 7d79b40b ("e2fsck: adjust quota counters when clearing orphaned
inodes"), we load all the quotas before we process the orphaned inodes,
and when we load the quotas, we check the checsum of the bbitmap for each
group. If one of the bbitmap checksums is wrong, the following error will
be reported:

“Error initializing quota context in support library:
     Block bitmap checksum does not match bitmap”

But loading quotas comes before checking the current superblock for the
EXT2_ERROR_FS flag, which makes it impossible to use e2fsck to repair any
image that contains orphan inodes and has the wrong bbitmap checksum.
So delaying quota loading until after the EXT2_ERROR_FS judgment avoids
the above problem.

Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
This certainly looks better but I wonder if there still isn't a problem if
the bitmap checksums are wrong but EXT2_ERROR_FS is not set. Shouldn't we
rather move the initialization of the quota files after the call to
e2fsck_read_bitmaps()?

								Honza
When the bitmap checksums are wrong but EXT2_ERROR_FS is not set, we must
have lost some data (error flag or group descriptor or bitmap), so there
is something wrong with the kernel at this time, so I don't think we
should fix the image directly, but rather let the user realize that
something is wrong with the filesystem logic.
I agree it means there is a problem somewhere (the storage, the kernel, or
similar). But just ignoring bitmap checksums in release_orphan_inodes() is
exactly how e2fsck behaves on filesystems without quota feature so I see no
reason for quota feature to change that because the inconsistency has
nothing to do with quotas...

Moreover, if we don't care how this happened, but just want to fix the
image, we only need to run "e2fsck -a" twice. After merging in the
current patch, we always empty the orphan list before loading the quotas,
and EXT2_ERROR_FS is set when loading the quotas fails, so this will be
fixed the second time you run e2fsck. It will not happen that every
e2fsck will fail like it did before.
I see, you're right so it isn't as bad as I originally thought but still my
argument above holds - IMO e2fsck should treat wrong bitmap checksums the
same way with and without the quota feature.

								Honza
The original flow that went wrong here is as follows:
e2fsck
   e2fsck_run_ext3_journal
   check_super_block
    release_orphan_inodes
     e2fsck_read_all_quotas
      quota_read_all_dquots
       quota_file_open
        ext2fs_read_bitmaps
         ext2fs_rw_bitmaps
          read_bitmaps_range
           read_bitmaps_range_start
            ext2fs_block_bitmap_csum_verify
             !!! error
   e2fsck_run

Yes, the inconsistency has nothing to do with quota, but quota is loaded
here to keep track of space changes during the normal processing of
orphan list. If quota was not loaded, we would not have read and check
bitmaps until Pass5, and we had already done a lot of checking and
tweaking of inodes, blocks, and dirs before Pass5, and the bitmaps
inconsistency may have been fixed during that time.
This is not true. release_orphan_inodes() calls e2fsck_read_bitmaps() which
loads all the bitmaps while ignoring checksum failures. This is needed so
that blocks released during orphan cleanup are properly tracked as free.
All I want to do is to move the call to e2fsck_read_all_quotas() a bit
further than you moved it to a place after the e2fsck_read_bitmaps()
call...
Yes, e2fsck_read_bitmaps() ignores checksum errors for reading bitmaps,
which prevents us from exiting e2fsck due to checksum error in
release_orphan_inodes(), but in the case of the previously mentioned
checksum error but EXT2_ERROR_FS is not set, when we execute "e2fsck -a",
since checksum is ignored, the filesystem is considered clean, so it
exits e2fsck without performing a force check, but the error is still
there.
Yes, and I believe that is a correct behavior because "e2fsck -a" means
"don't check the filesystem unless it is required" - i.e., too long since the
last check, too many mounts, or errors detected state. And if the
filesystem doesn't have the quota feature, this is indeed what is going to
happen. We'll happily skip the filesystem with bitmap checksum errors. So
why should we complain about it when quota feature is enabled?

If you think bitmap checksums should be checked by e2fsck -a, then we can
have that discussion (separate patch from your quota fixup) but then it
should happen regardless of the quota feature. Because doing it only with
quota feature enabled is really unexpected and is going to confuse users.

								Honza

Got it!

We didn't care about bitmap checksum errors before Pass5, so it's as expected

to ignore them!  Thank you for your patient explanation!

I'll update the patch with your suggestion in the next version!


Thanks!
--
With Best Regards,
Baokun Li
.



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux