Re: [RFC] Filesystem error notifications proposal

Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx> · Mon, 08 Feb 2021 13:49:41 -0500

"Theodore Ts'o" <tytso@xxxxxxx> writes:

> On Tue, Feb 02, 2021 at 03:26:35PM -0500, Gabriel Krisman Bertazi wrote:
>> 
>> Thanks for the explanation.  That makes sense to me.  For corruptions
>> where it is impossible to map to a mountpoint, I thought they could be
>> considered global filesystem errors, being exposed only to someone
>> watching the entire filesystem (like FAN_MARK_FILESYSTEM).
>
> At least for ext4, there are only 3 ext4_error_*() calls that we could map
> to a subtree without having to make changes to the call sites:
>
> % grep -i ext4_error_file\( fs/ext4/*.c  | wc -l
> 3
> % grep -i ext4_error_inode\( fs/ext4/*.c  | wc -l
> 79
> % grep -i ext4_error\( fs/ext4/*.c  | wc -l
> 42
>
> So in practice, unless we want to make a lot of changes to ext4, most
> of them will be global file system errors....
>
>> But, as you mentioned regarding the google use case, the entire idea of
>> watching a subtree is a bit beyond the scope of my use-case, and was
>> only added given the feedback on the previous proposal of this feature.
>> While nice to have, I don't need to watch different mountpoints
>> for errors, only the entire filesystem.
>
> I suspect that for most use cases, the most interesting thing is the
> first error.  We already record this in the ext4 superblock because,
> unfortunately, I can't guarantee that system administrators have
> correctly configured their system logs.  So when handling upstream bug
> reports, I can just ask them to run dumpe2fs -h on the file system:
>
> FS Error count:           2
> First error time:         Tue Feb  2 16:27:42 2021
> First error function:     ext4_lookup
> First error line #:       1704
> First error inode #:      12
> First error err:          EFSCORRUPTED
> Last error time:          Tue Feb  2 16:27:59 2021
> Last error function:      ext4_lookup
> Last error line #:        1704
> Last error inode #:       12
> Last error err:           EFSCORRUPTED
>
> So it's not just the Google case.  I'd argue that for most system
> administrators, one of the most useful things is knowing when the file
> system was first found to be corrupted, so they can try correlating
> file system corruption with, say, reports of I/O errors, OOM kills,
> etc.  This can also be useful for correlating the start of file system
> problems with problems at the application layer --- say, MongoDB,
> MySQL, etc.
>
> A notification system is useful because if you are using some kind of
> database high-availability replication system, and problems are
> detected in the file system of the primary MySQL server, you'd want the
> system to fail over to the secondary MySQL server.  Sure, you *could*
> do this by polling the superblock, but that's not the most efficient
> way to do things.

Hi Ted,

I think this brings us full circle back to my original proposal.  It
avoids the complexity of notifying on objects other than the superblock,
and it doesn't require allocations.  I sent an RFC for that a while ago
[1], which led to this discussion and the current implementation.
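For comparison, a rough sketch of the polling alternative Ted mentions
might look like the following.  It assumes the error fields are read back
through `dumpe2fs -h`, as in the output quoted above, and that a missing
"FS Error count" line means no errors have been recorded (treated as 0
here).  The function names and the failover hook are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch: detect new ext4 errors by polling the superblock
# through `dumpe2fs -h`, instead of waiting for a notification.

# fs_error_count: read `dumpe2fs -h` output on stdin and print the
# "FS Error count" value; print 0 if that line is absent.
fs_error_count() {
    awk -F': *' '/^FS Error count/ { print $2; found = 1; exit }
                 END { if (!found) print 0 }'
}

# poll_fs_errors DEVICE [INTERVAL]: return 1 as soon as the recorded
# error count on DEVICE grows, so the caller can trigger a failover.
poll_fs_errors() {
    dev=$1
    interval=${2:-10}
    last=$(dumpe2fs -h "$dev" 2>/dev/null | fs_error_count)
    while sleep "$interval"; do
        cur=$(dumpe2fs -h "$dev" 2>/dev/null | fs_error_count)
        if [ "$cur" -gt "$last" ]; then
            echo "new errors on $dev: count $last -> $cur" >&2
            return 1
        fi
    done
}
```

Something like `poll_fs_errors /dev/sdb1 30 || start_failover` (with
start_failover being whatever hypothetical hook the HA setup provides)
rereads the superblock on every interval, which is exactly the overhead a
superblock error notification would avoid.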

For the sake of having a proposal and a way to move forward, I'm not
sure what the next step should be.  I could revive the previous
implementation, addressing some issues such as avoiding the superblock
name, the way we refer to blocks, and using CAP_SYS_ADMIN.  I think that
implementation solves the use case you described more simply.  But I'm
not sure Darrick and Dave (both in cc) will be convinced by this approach
of a global pipe that carries messages for the entire filesystem, as Dave
described it in the discussion of the previous implementation.

Are you familiar with that implementation?

[1] https://www.spinics.net/lists/linux-ext4/msg75742.html

-- 
Gabriel Krisman Bertazi