On Fri, Nov 27, 2020 at 12:34:00PM +0100, Jan Kara wrote:
> When filesystem inconsistency is detected with group locked, we
> currently try to modify superblock to store error there without
> blocking. However this can cause superblock checksum failures (or
> DIF/DIX failure) when the superblock is just being written out.
>
> Make error handling code just store error information in ext4_sb_info
> structure and copy it to on-disk superblock only in ext4_commit_super().
> In case of error happening with group locked, we just postpone the
> superblock flushing to a workqueue.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>

So this patch does make a behavioral change: if a file system already
contains errors when it is mounted, and the kernel then trips over
more file system problems, the first error is overwritten. Before,
s_first_error_* meant the first error found since the file system was
last checked. With this patch, s_first_error_* now means the first
error found since the file system was mounted.

This distinction is critical, because there are some buggy distros
(Ubuntu and Debian) whose cloud images do *not* run fsck on boot. So
if a file system is corrupted, it does not get fixed up, and the file
system can get more and more damaged. In that case, it's good to know
when the file system was first damaged, even if that was six months
earlier and several reboots and remounts later. :-/

We should be able to fix up this patch by making commit_super only
update the on-disk s_first_error* fields if s_first_error_time on disk
is 0.

- Ted

P.S. Bugs have been filed with both distros. Ubuntu has accepted that
it is a bug, but we're still working on convincing the Debian cloud
image devs.... And it's not just the buggy cloud images; it has also
happened on occasion that sloppy embedded Linux developers or
sysadmins have misconfigured their systems so that fsck never gets
run, and it's nice to have the forensic data preserved.
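A minimal sketch of the suggested commit_super rule, written as standalone C rather than kernel code. The names disk_sb, mem_err_info and copy_errors_to_disk are hypothetical simplifications; they are not the real ext4_super_block / ext4_sb_info layout (which uses little-endian types and more fields) or the actual ext4_commit_super() code. It only shows the rule: the last-error fields are always refreshed from the in-memory copy, while the first-error fields are written only when the on-disk s_first_error_time is 0, so an error recorded before this mount survives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the on-disk superblock. */
struct disk_sb {
    uint32_t s_first_error_time;   /* 0 means "no error recorded since last check" */
    uint32_t s_first_error_ino;
    uint32_t s_first_error_line;
    char     s_first_error_func[32];
    uint32_t s_last_error_time;
    uint32_t s_last_error_ino;
    uint32_t s_last_error_line;
    char     s_last_error_func[32];
};

/* Hypothetical stand-in for error info buffered in memory since mount
 * (the role ext4_sb_info plays in the patch under review). */
struct mem_err_info {
    int      valid;                /* nonzero once an error was seen since mount */
    uint32_t first_time, first_ino, first_line;
    char     first_func[32];
    uint32_t last_time, last_ino, last_line;
    char     last_func[32];
};

/* Copy buffered error info into the on-disk superblock at commit time. */
static void copy_errors_to_disk(struct disk_sb *es, const struct mem_err_info *ei)
{
    if (!ei->valid)
        return;                    /* nothing new to record */

    /* Preserve an older first error that fsck has not cleared yet. */
    if (es->s_first_error_time == 0) {
        es->s_first_error_time = ei->first_time;
        es->s_first_error_ino  = ei->first_ino;
        es->s_first_error_line = ei->first_line;
        memcpy(es->s_first_error_func, ei->first_func,
               sizeof(es->s_first_error_func));
    }

    /* The last-error fields always reflect the most recent error. */
    es->s_last_error_time = ei->last_time;
    es->s_last_error_ino  = ei->last_ino;
    es->s_last_error_line = ei->last_line;
    memcpy(es->s_last_error_func, ei->last_func,
           sizeof(es->s_last_error_func));
}
```

This assumes, consistent with the semantics described above, that fsck resets s_first_error_time to 0 after checking the file system, so the next error after a successful check is again recorded as the first one.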