2024年11月29日(金) 21:09 Peter Zijlstra <peterz@xxxxxxxxxxxxx>: > > On Fri, Nov 29, 2024 at 05:35:54PM +0900, Masami Hiramatsu wrote: > > On Sat, 23 Nov 2024 03:39:45 +0000 > > Ruan Bonan <bonan.ruan@xxxxxxxxx> wrote: > > > > > > > > vprintk_emit+0x414/0xb90 kernel/printk/printk.c:2406 > > > _printk+0x7a/0xa0 kernel/printk/printk.c:2432 > > > fail_dump lib/fault-inject.c:46 [inline] > > > should_fail_ex+0x3be/0x570 lib/fault-inject.c:154 > > > strncpy_from_user+0x36/0x230 lib/strncpy_from_user.c:118 > > > strncpy_from_user_nofault+0x71/0x140 mm/maccess.c:186 > > > bpf_probe_read_user_str_common kernel/trace/bpf_trace.c:215 [inline] > > > ____bpf_probe_read_user_str kernel/trace/bpf_trace.c:224 [inline] > > > > Hmm, this is a combination issue of BPF and fault injection. > > > > static void fail_dump(struct fault_attr *attr) > > { > > if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) { > > printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n" > > "name %pd, interval %lu, probability %lu, " > > "space %d, times %d\n", attr->dname, > > attr->interval, attr->probability, > > atomic_read(&attr->space), > > atomic_read(&attr->times)); > > > > This printk() acquires console lock under rq->lock has been acquired. > > > > This can happen if we use fault injection and trace event too because > > the fault injection caused printk warning. > > Ah indeed. Same difference though, if you don't know the context, most > things are unsafe to do. > > > I think this should be a bug of the fault injection, not tracing/BPF. > > And to solve this issue, we may be able to check the context and if > > it is tracing/NMI etc, fault injection should NOT make it failure. > > Well, it should be okay to cause the failure, but it must be very > careful how it goes about doing that. Tripping printk() definitely is > out. > > But there's a much bigger problem there, get_random*() is not wait-free, > in fact it takes a spinlock_t which makes that it is unusable from most > context, and it's definitely out for tracing. > > Notably, this spinlock_t makes that it is unsafe to use from anything > that holds a raw_spinlock_t or is from hardirq context, or has > preempt_disable() -- which is a TON of code. > > On this alone I would currently label the whole of fault-injection > broken. The should_fail() call itself is unsafe where many of its > callsites are otherwise perfectly fine -- eg. usercopy per the above. > > Perhaps it should use a simple PRNG, a simple LFSR should be plenty good > enough to provide failure conditions. Sounds good. > And yeah, I would just completely rip out the printk. Trying to figure > out where and when it's safe to call printk() is non-trivial and just > not worth the effort imo. Instead of removing the printk completely, How about setting the default value of the verbose option to zero so it doesn't call printk and gives a loud warning when changing the verbose option?