Re: [syzbot] [ext4?] BUG: sleeping function called from invalid context in ext4_update_super

"Theodore Ts'o" <tytso@xxxxxxx> · Sat, 10 Jun 2023 23:20:32 -0400

(Dropping linux-fsdevel and linux-kernel from the cc list.)

On Sat, Jun 10, 2023 at 10:41:18PM +0200, Fabio M. De Francesco wrote:
> Well, I'm a new to filesystems. However, I'd like to test a change in 
> ext4_handle_error().
> 
> Currently I see that errors are handled according to the next snippet of code 
> from the above-mentioned function (please note that we are in atomic context):
> 
> 	if (continue_fs && journal)
> 		schedule_work(&EXT4_SB(sb)->s_error_work);
> 	else
> 		ext4_commit_super(sb);
> 
> If evaluates false, we directly call ext4_commit_super(), forgetting that, 
> AFAICS we are in atomic context.
> 
> As I said I have only little experience with filesystems, so my question is: 
> despite the overhead, can we delete the check and do the following?
>
> [ Unconditionally call schedule_work(&EXT4_SB(sb)->s_error_work) ]

That doesn't work, for the simple reason that it's possible that file
system might be configured to immediately panic on an error.  (See
later in the ext4_handle_error() function after the check for
test_opt(sb, ERRORS_PANIC).  If that happens, the workqueue will never
have a chance to run.  In that case, we have to call
ext4_commit_super().

The real answer here is that ext4_error() must never be called from an
atomic context, and a recent commit 5354b2af3406 ("ext4: allow
ext4_get_group_info() to fail") added a call to ext4_error() which is
problematic since some callers of the ext4_get_group_info() function
may be holding a spinlock.  And so the best solution is to just simply
to drop the call to ext4_error(), since it's not strictly necessary.
If there is an antagonist process which is actively corrupting the
superblock, some other code path will report the fact that the file
system is corrupted soon enough.

						- Ted

P.S.  There is an exception to what I've described above, and that's
special ext4_grp_locked_error() which is used in fs/ext4/mballoc.c.
But that's a special case which requires very careful handling, In
general, you simply must not be in atomic context when you want to
report a problem.