Re: [BUG] possible deadlock in __schedule (with reproducer available)

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Fri, 29 Nov 2024 13:09:39 +0100

On Fri, Nov 29, 2024 at 05:35:54PM +0900, Masami Hiramatsu wrote:
> On Sat, 23 Nov 2024 03:39:45 +0000
> Ruan Bonan <bonan.ruan@xxxxxxxxx> wrote:
> 
> > 
> >        vprintk_emit+0x414/0xb90 kernel/printk/printk.c:2406
> >        _printk+0x7a/0xa0 kernel/printk/printk.c:2432
> >        fail_dump lib/fault-inject.c:46 [inline]
> >        should_fail_ex+0x3be/0x570 lib/fault-inject.c:154
> >        strncpy_from_user+0x36/0x230 lib/strncpy_from_user.c:118
> >        strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186
> >        bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:215 [inline]
> >        ____bpf_probe_read_user_str kernel/trace/bpf_trace.c:224 [inline]
> 
> Hmm, this is a combination issue of BPF and fault injection.
> 
> static void fail_dump(struct fault_attr *attr)
> {
>         if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) {
>                 printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
>                        "name %pd, interval %lu, probability %lu, "
>                        "space %d, times %d\n", attr->dname,
>                        attr->interval, attr->probability,
>                        atomic_read(&attr->space),
>                        atomic_read(&attr->times));
> 
> This printk() acquires console lock under rq->lock has been acquired.
> 
> This can happen if we use fault injection and trace event too because
> the fault injection caused printk warning.

Ah indeed. Same difference though, if you don't know the context, most
things are unsafe to do.

> I think this should be a bug of the fault injection, not tracing/BPF.
> And to solve this issue, we may be able to check the context and if
> it is tracing/NMI etc, fault injection should NOT make it failure.

Well, it should be okay to cause the failure, but it must be very
careful how it goes about doing that. Tripping printk() definitely is
out.

But there's a much bigger problem there, get_random*() is not wait-free,
in fact it takes a spinlock_t which makes that it is unusable from most
context, and it's definitely out for tracing.

Notably, this spinlock_t makes that it is unsafe to use from anything
that holds a raw_spinlock_t or is from hardirq context, or has
preempt_disable() -- which is a TON of code.

On this alone I would currently label the whole of fault-injection
broken. The should_fail() call itself is unsafe where many of its
callsites are otherwise perfectly fine -- eg. usercopy per the above.

Perhaps it should use a simple PRNG, a simple LFSR should be plenty good
enough to provide failure conditions.

And yeah, I would just completely rip out the printk. Trying to figure
out where and when it's safe to call printk() is non-trivial and just
not worth the effort imo.