Re: RO mount of ext4 filesystem causes writes

"Theodore Ts'o" <tytso@xxxxxxx> · Sat, 24 Jun 2023 15:39:24 -0400

On Fri, Jun 23, 2023 at 09:08:37PM +0530, Ritesh Harjani wrote:
> It seems in the original code what we were trying to do was to preseve
> the error information area of superblock across journal load (which I am
> not sure why though?)
> 
> In the new code we see if the journal load changed that area and if yes
> we change that back to original log but we also marked changed = true. Why?

That's a good question; thanks for asking it.  The first part of this
code was introduced by commit 1c13d5c08728 ("ext4: Save error
information to the superblock for analysis") in 2010.  So that part of
the code is not "new", but very, very, old. 

The basic idea here was that back then, when a file system error was
detected, it was always written directly to the superblock, by passing
the journal.  So that's why the original code saved the error
information, replayed the journal and then restored it.

Of course, this changed with commit 2d01ddc86606 ("ext4: save error
info to sb through journal if available") in 2020.  But the problem is
"if available".  If the jbd2 layer has shut down, then we can't route
the superblock error updates through the journal, at which point ext4
will do a direct update of the superlbock.

This was the rational behind commit eee00237fa5e ("ext4: commit super
block if fs record error when journal record without error").
Sometimes the error bit EXT4_ERROR_FS is set via a direct write to the
superblock; but other times the error bit might be set via the
journal.  In the former case, after we do a journal replay, the error
bit will be cleared.  However, since the kernel never clears the
EXT4_ERROR_FS bit, it's pretty easy for commit eee00237fa5e to handle
things.

So what we need to for that first part of the code, introduced in
commit 1c13d5c08728 and made invalid in commit 2d01ddc86606 is we need
to add code to examine s_last_error_time.  If the version that was
originally in the superblock is newer than the version found after the
journal replay, then presumably an error happened but ext4 wasn't able
to write the error information out through the journal, and we need to
replace it after the the call to jbd2_journal_load().

Cheers,

						- Ted