On Sat, Nov 19, 2022 at 12:00:49PM +0800, Zheng Yejian wrote: > Register kprobe on __rcu_irq_enter_check_tick() can cause kernel stack > overflow [1]. This issue is first found in v5.10 and can be reproduced > by enabling CONFIG_NO_HZ_FULL and doing like: > # cd /sys/kernel/debug/tracing/ > # echo 'p:mp1 __rcu_irq_enter_check_tick' >> kprobe_events > # echo 1 > events/kprobes/enable > > So __rcu_irq_enter_check_tick() should not be kprobed, mark it as noinstr. Good catch! I am inclined to queue this, but noticed that one of its callers need it to be noinstr but that the others do not. Need noinstr: o enter_from_kernel_mode() -> __enter_from_kernel_mode() -> rcu_irq_enter_check_tick() -> __rcu_irq_enter_check_tick() Doesn't need noinstr: o ct_nmi_enter() -> rcu_irq_enter_check_tick() -> __rcu_irq_enter_check_tick(), courtesy of the call to instrumentation_begin() in ct_nmi_enter() that precedes the call to rcu_irq_enter_check_tick(). o irqentry_enter() -> rcu_irq_enter_check_tick() -> __rcu_irq_enter_check_tick(), courtesy of the call to instrumentation_begin() in irqentry_enter() that precedes the call to rcu_irq_enter_check_tick(). Is tagging __rcu_irq_enter_check_tick() with noinstr as proposed in this patch the right thing to do, or should there be calls to instrumentation_begin() and instrumentation_end() in enter_from_kernel_mode()? Or something else entirely? Thoughts? Thanx, Paul > [1] > Insufficient stack space to handle exception! > Insufficient stack space to handle exception! > [...] > Kernel panic - not syncing: kernel stack overflow > CPU: 3 PID: 34 Comm: migration/3 Not tainted > 6.1.0-rc5-00884-g84368d882b96 #2 > Hardware name: linux,dummy-virt (DT) > Stopper: multi_cpu_stop+0x0/0x228 <- __stop_cpus.constprop.0+0xa4/0x100 > Call trace: > dump_backtrace+0xf8/0x108 > show_stack+0x20/0x48 > dump_stack_lvl+0x68/0x84 > dump_stack+0x1c/0x38 > panic+0x214/0x404 > add_taint+0x0/0xf0 > panic_bad_stack+0x144/0x160 > handle_bad_stack+0x38/0x58 > __bad_stack+0x78/0x7c > el1h_64_sync_handler+0x4/0xe8 > __rcu_irq_enter_check_tick+0x0/0x1b8 > arm64_enter_el1_dbg.isra.0+0x14/0x20 > el1_dbg+0x2c/0x90 > el1h_64_sync_handler+0xcc/0xe8 > el1h_64_sync+0x64/0x68 > __rcu_irq_enter_check_tick+0x0/0x1b8 > arm64_enter_el1_dbg.isra.0+0x14/0x20 > el1_dbg+0x2c/0x90 > el1h_64_sync_handler+0xcc/0xe8 > el1h_64_sync+0x64/0x68 > __rcu_irq_enter_check_tick+0x0/0x1b8 > arm64_enter_el1_dbg.isra.0+0x14/0x20 > el1_dbg+0x2c/0x90 > el1h_64_sync_handler+0xcc/0xe8 > el1h_64_sync+0x64/0x68 > __rcu_irq_enter_check_tick+0x0/0x1b8 > arm64_enter_el1_dbg.isra.0+0x14/0x20 > el1_dbg+0x2c/0x90 > el1h_64_sync_handler+0xcc/0xe8 > el1h_64_sync+0x64/0x68 > __rcu_irq_enter_check_tick+0x0/0x1b8 > arm64_enter_el1_dbg.isra.0+0x14/0x20 > el1_dbg+0x2c/0x90 > el1h_64_sync_handler+0xcc/0xe8 > el1h_64_sync+0x64/0x68 > __rcu_irq_enter_check_tick+0x0/0x1b8 > [...] > > Cc: stable@xxxxxxxxxxxxxxx > Fixes: aaf2bc50df1f ("rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()") > Signed-off-by: Zheng Yejian <zhengyejian1@xxxxxxxxxx> > --- > kernel/rcu/tree.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 93416afebd59..68230f02cfb7 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -631,7 +631,7 @@ void rcu_irq_exit_check_preempt(void) > * controlled environments, this function allows RCU to get what it > * needs without creating otherwise useless interruptions. > */ > -void __rcu_irq_enter_check_tick(void) > +noinstr void __rcu_irq_enter_check_tick(void) > { > struct rcu_data *rdp = this_cpu_ptr(&rcu_data); > > -- > 2.25.1 >