Re: [syzbot] [ext4?] BUG: sleeping function called from invalid context in ext4_update_super

"Fabio M. De Francesco" <fmdefrancesco@xxxxxxxxx> · Sun, 11 Jun 2023 21:15:56 +0200

On domenica 11 giugno 2023 15:18:29 CEST Theodore Ts'o wrote:
> On Sun, Jun 11, 2023 at 09:05:31AM +0200, Fabio M. De Francesco wrote:
> > Are you okay with me submitting a patch with your "Suggested by:" tag? Or
> > maybe you prefer to take care of it yourself? For now I await your kind
> > reply.
> Sure, feel free to create such a patch.

Thanks!

Let me summarize, just to be sure we don't misunderstand each other... 

To start off, I'll send out _only_ the patch for the bug reported by Syzbot, 
the one about dropping the call to ext_error() in ext4_get_group_info(). 

I'll do this by Tuesday. (Sorry, I cannot do it by Monday because I must pass 
an exam and an interview for a job).

However, on the other problems with ext4_grp_locked_error() that you noticed 
in the final part of your first message in this thread I'll need some days 
more to better understand the context I'm working in.

> I would strongly recommend that you use gce-xfstests or kvm-xfstests
> before submitting ext4 patches.  In this particular case, it's a
> relatively simple patch, but it's a good habit to get into.  See [1]
> for more details.
> 
> [1] https://thunk.org/gce-xfstests

Thanks also for these information. 

In the last year I have been sending several patches patches for filesystems, 
several little things that comprise conversions (sometimes with additional 
related work, like a series of 5 patches to fs/sysv). 

The only tools I've ever needed were running (x)fstests on a QEMU/KVM x86_32 
VM, 6GB RAM, booting a Kernel with HIGHMEM64G enabled (for PAE).

Never heard about {kvm,gce}-xfstests. I'll have a look at the link and try 
these tests. I can afford $2, but at the moment I don't want to assemble a 
dedicated hardware for tests. 

What's the problem with running those tests in QEMU/KVM VMs??? 

Ah, I just recalled that in December 2022 I sent you three conversion from the 
old kmap{,_atomic} that went reviewed by Jan K. and Ira W.. It was tested with 
fstests as I explained above. That patch got lost, in fact I still see at 
least a kmap_atomic() call site in ext4. One call site is no more in ext4. The 
third today uses kmap_local_folio(). 

Well, everybody dislike to see their patches completely ignored (https://
lore.kernel.org/lkml/20221231174439.8557-1-fmdefrancesco@xxxxxxxxx/). If you 
wanted you could at least apply the part regarding the kmap_atomic() 
conversion in ext4_journalled_write_inline_data(). 

Or I can convert the call to kmap_atomic() to kmap_local_folio(), if you 
wanted. It's up to you, please let me know.

> At the bare minimum, it would be useful for you to run
> "{kvm,gce}-xfstests smoke" or more preferably, "{kvm,gce}-xfstests -c
> ext4/4k -g auto".  If it's a particular complex patch series, running
> "gce-xfstests -c ext4/all -g auto" is nice, since it can save me a lot
> of time when y ou've introduced a subtle corner case bug that doesn't
> show up with lighter testing, and I have to track it down myself.  The
> latter only costs about $2 --- you could do "kvm-xfstests -c ext4/all
> -g auto", but it will take a day or so, and it's not much fun unless
> you have dedicated test hardware.  So if you can afford the $2, I
> strongly recommend using gce-xfstests for the full set of tests.  (The
> smoke test using gce-xfstests costs a penny or two, last time I priced
> it out.  But it's not too bad to run it using kvm-xfstests.)

OK, again thanks for these additional information. I'll surely use all this.

> 
> > Can we "reliably" test !in_atomic() and act accordingly? I remember that 
the
> > in_atomic() helper cannot always detect atomic contexts.
> 
> No; we can do something like BUG_ON(in_atomic(), 

OK, it's the same from the other side of the ways to look at it.

> but it will only work
> if LOCKDEP is enabled, 

I don't know what others do. I always enable LOCKDEP, also in production 
systems.

> and that's generally is not enabled on
> production systems.

Ah, OK.

> On Sun, Jun 11, 2023 at 11:38:07AM +0200, Fabio M. De Francesco wrote:
> > In the meantime, I have had time to think of a different solution that
> > allows
> > the workqueue the chance to run even if the file system is configured to
> > immediately panic on error (sorry, no code - I'm in a hurry)...
> > 
> > This can let you leave that call to ext4_error() that commit 5354b2af3406
> > ("ext4: allow ext4_get_group_info()") had introduced (if it works - please
> > keep on reading).
> > 
> > 1) Init a global variable ("glob") and set it to 0.
> > 2) Modify the code of the error handling workqueue to set "glob" to 1, 
soon
> > before the task is done.
> > 3) Change the fragment that panics the system to call mdelay() in a loop 
> > (it
> > doesn't sleep - correct?) for an adequate amount of time and periodically
> > check READ_ONCE(global) == 1. If true, break and then panic, otherwise
> > reiterate the loop.
> 
> Well, it's more than a bit ugly,

I fully agree. I'm still in search of a reliable way to let atomic context run 
idle waiting for a status change. I am talking about atomic code in paths that 
we don't mind if they loop for few seconds (like the case at hand - we don't 
care to panic immediately or within a sec or two. Do we care???

> and it's not necessarily guaranteed
> to work.

Why not? AFAIK, mdelay() doesn't trigger SAC bugs because it doesn't sleep, 
very differently than msleep(). What's the problem to wait for panicking a 
little later without causing bugs? Any other means to signal in atomic context 
from a workqueue that is done with the assigned work?

> After all we're doing this to allow ext4_error() to be
> called in critical sections.  But that means that while we're doing
> this mdelay loop, we're holding on to a spinlock.  While lockdep isn't
> smart enough to catch this particular deadlock, it's still a real
> potential problem, which means such a solution is also fragile.
> 
> It's better off simply prohibiting ext4_error() to be called while
> holding a lock, and in most places this isn't all that hard. 

I'll do as you asked, but (out of curiosity and especially for the purpose of 
stealing knowledge from very experienced Kernel hackers like you, can you 
please set aside some more minutes to answer all my questions above?

Thanks in advance,

Fabio

> Most of
> the time, you don't want to hold spinlocks across a long period of
> time, because this can become a scalability bottleneck.  This means
> very often the pattern is something like this:
> 
>       spin_lock(..);
>          ...
>       ret = do_stuff();
>       spin_unlock(..);
> 
> So it's enough to check the error return after you've unlocked the
> spinlock.  And you can also just _not_ call ext4_error() inside
> do_stuff(), but have do_stuff return an error code, possibly with a
> call to ext4_warning() if you want to get some context for the problem
> into the kernel log, and then have the top-level function call
> ext4_error().

OK, yours is the right solution. Furthermore I didn't know about the 
importance to show the errors exactly where they currently are. As said, I'll 
follow your suggestion, but I'd like to know the answers at my questions, 
please :-)

> 
> Cheers,
> 
> 						- Ted

Again, thanks for all the time you are devoting to a newcomer like I am.

Fabio