On Wed 25-05-22 20:46:12, Ritesh Harjani wrote: > On 22/05/25 01:54PM, Jan Kara wrote: > > On Wed 25-05-22 13:21:23, Ritesh Harjani wrote: > > > On 22/05/25 09:29AM, Ye Bin wrote: > > > > We got issue as follows: > > > > [home]# mount /dev/sda test > > > > EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended > > > > [home]# dmesg > > > > EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended > > > > EXT4-fs (sda): Errors on filesystem, clearing orphan list. > > > > EXT4-fs (sda): recovery complete > > > > EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none. > > > > [home]# debugfs /dev/sda > > > > debugfs 1.46.5 (30-Dec-2021) > > > > Checksum errors in superblock! Retrying... > > > > > > > > Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update > > > > super block checksum. > > > > To solve above issue, defer update super block checksum after ext4_orphan_cleanup. > > > > > > I agree with the analysis. However after [1], I think all updates to superblock > > > (including checksum computation) should be done within buffer lock. > > > (lock_buffer(), unlock_buffer()). > > > > > > [1]: https://lore.kernel.org/all/20201216101844.22917-4-jack@xxxxxxx/ > > > > So technically you're right that we should hold buffer lock all the time > > from before we modify superblock buffer until we recompute the checksum (so > > that we avoid writing superblock with mismatched checksum). To do this we'd > > have to put checksum recomputations and superblock buffer locking into > > ext4_orphan_cleanup() around setting of es->s_last_orphan (in three places > > there AFAICS). A bit tedious but it would actually also fix a (theoretical) > > race that someone decides to write out superblock after we set > > s_last_orphan but before we set the checksum. > > Ok. Although (I think) it can still be done at just one place before returning > from ext4_orphan_cleanup(). > But I agree it is mostly a theoretical race (in fact since this is happening > during mount, I am not sure if it is even possible?) and there might not > be any value addition in doing so by complicating it too much. Well, what could presumably happen is that if someone dirtied superblock (say while processing orphan list), then flush worker could come just after we set s_last_orphan and before we update the checksum. Now I don't think we currently dirty superblock during orphan cleanup but it is certainly slightly fragile to rely on this. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR