On Fri, Jun 23, 2023 at 09:08:37PM +0530, Ritesh Harjani wrote: > It seems in the original code what we were trying to do was to preseve > the error information area of superblock across journal load (which I am > not sure why though?) > > In the new code we see if the journal load changed that area and if yes > we change that back to original log but we also marked changed = true. Why? That's a good question; thanks for asking it. The first part of this code was introduced by commit 1c13d5c08728 ("ext4: Save error information to the superblock for analysis") in 2010. So that part of the code is not "new", but very, very, old. The basic idea here was that back then, when a file system error was detected, it was always written directly to the superblock, by passing the journal. So that's why the original code saved the error information, replayed the journal and then restored it. Of course, this changed with commit 2d01ddc86606 ("ext4: save error info to sb through journal if available") in 2020. But the problem is "if available". If the jbd2 layer has shut down, then we can't route the superblock error updates through the journal, at which point ext4 will do a direct update of the superlbock. This was the rational behind commit eee00237fa5e ("ext4: commit super block if fs record error when journal record without error"). Sometimes the error bit EXT4_ERROR_FS is set via a direct write to the superblock; but other times the error bit might be set via the journal. In the former case, after we do a journal replay, the error bit will be cleared. However, since the kernel never clears the EXT4_ERROR_FS bit, it's pretty easy for commit eee00237fa5e to handle things. So what we need to for that first part of the code, introduced in commit 1c13d5c08728 and made invalid in commit 2d01ddc86606 is we need to add code to examine s_last_error_time. If the version that was originally in the superblock is newer than the version found after the journal replay, then presumably an error happened but ext4 wasn't able to write the error information out through the journal, and we need to replace it after the the call to jbd2_journal_load(). Cheers, - Ted