On Fri, Nov 29, 2024 at 05:35:54PM +0900, Masami Hiramatsu wrote: > On Sat, 23 Nov 2024 03:39:45 +0000 > Ruan Bonan <bonan.ruan@xxxxxxxxx> wrote: > > > > > vprintk_emit+0x414/0xb90 kernel/printk/printk.c:2406 > > _printk+0x7a/0xa0 kernel/printk/printk.c:2432 > > fail_dump lib/fault-inject.c:46 [inline] > > should_fail_ex+0x3be/0x570 lib/fault-inject.c:154 > > strncpy_from_user+0x36/0x230 lib/strncpy_from_user.c:118 > > strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186 > > bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:215 [inline] > > ____bpf_probe_read_user_str kernel/trace/bpf_trace.c:224 [inline] > > Hmm, this is a combination issue of BPF and fault injection. > > static void fail_dump(struct fault_attr *attr) > { > if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) { > printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n" > "name %pd, interval %lu, probability %lu, " > "space %d, times %d\n", attr->dname, > attr->interval, attr->probability, > atomic_read(&attr->space), > atomic_read(&attr->times)); > > This printk() acquires console lock under rq->lock has been acquired. > > This can happen if we use fault injection and trace event too because > the fault injection caused printk warning. Ah indeed. Same difference though, if you don't know the context, most things are unsafe to do. > I think this should be a bug of the fault injection, not tracing/BPF. > And to solve this issue, we may be able to check the context and if > it is tracing/NMI etc, fault injection should NOT make it failure. Well, it should be okay to cause the failure, but it must be very careful how it goes about doing that. Tripping printk() definitely is out. But there's a much bigger problem there, get_random*() is not wait-free, in fact it takes a spinlock_t which makes that it is unusable from most context, and it's definitely out for tracing. Notably, this spinlock_t makes that it is unsafe to use from anything that holds a raw_spinlock_t or is from hardirq context, or has preempt_disable() -- which is a TON of code. On this alone I would currently label the whole of fault-injection broken. The should_fail() call itself is unsafe where many of its callsites are otherwise perfectly fine -- eg. usercopy per the above. Perhaps it should use a simple PRNG, a simple LFSR should be plenty good enough to provide failure conditions. And yeah, I would just completely rip out the printk. Trying to figure out where and when it's safe to call printk() is non-trivial and just not worth the effort imo.