On Tue, Jan 08, 2019 at 01:43:55PM +0900, Masami Hiramatsu wrote:
> Hello,
>
> This is the v2 series of patches fixing the kretprobe incorrect stacking
> order. In this version, I fixed a missing kprobes.h include and added a new
> patch for the kretprobe trampoline recursion issue. (and add Cc:stable)
>
> (1) kprobe incorrect stacking order problem
>
> After a recent talk with Andrea, I started a more precise investigation of
> the kernel panic with kretprobes on notrace functions, which Francis
> reported last year ( https://lkml.org/lkml/2017/7/14/466 ).
>
> See the investigation details in
> https://lkml.kernel.org/r/154686789378.15479.2886543882215785247.stgit@devbox
>
> When we put a kretprobe on ftrace_ops_assist_func() and put another
> kretprobe on probed-function, the following happens:
>
> <caller>
>  -><probed-function>
>    ->fentry
>     ->ftrace_ops_assist_func()
>      ->int3
>       ->kprobe_int3_handler()
>       ...->pre_handler_kretprobe()
>            push the return address (*fentry*) of ftrace_ops_assist_func() to
>            top of the kretprobe list and replace it with kretprobe_trampoline.
>       <-kprobe_int3_handler()
>      <-(int3)
>      ->kprobe_ftrace_handler()
>       ...->pre_handler_kretprobe()
>            push the return address (caller) of probed-function to top of the
>            kretprobe list and replace it with kretprobe_trampoline.
>      <-(kprobe_ftrace_handler())
>     <-(ftrace_ops_assist_func())
>     [kretprobe_trampoline]
>      ->trampoline_handler()
>        pop the return address (caller) from top of the kretprobe list
>      <-(trampoline_handler())
>     <caller>
>     [run caller with incorrect stack information]
>    <-(<caller>)
>   !!KERNEL PANIC!!
>
> Therefore, this kernel panic happens only when we put 2 k*ret*probes on
> ftrace_ops_assist_func() and other functions. If we put kprobes, it
> doesn't cause any issue, since a kprobe doesn't change the return address.
>
> To fix (or just avoid) this issue, we can introduce a frame pointer
> verification to skip wrong order entries. I would also like to
> blacklist those functions because they are part of the ftrace-based
> kprobe handling routine.
>
> (2) kretprobe trampoline recursion problem
>
> This was found by Andrea in the previous thread
> https://lkml.kernel.org/r/20190107183444.GA5966@xps-13
>
> ----
>  echo "r:event_1 __fdget" >> kprobe_events
>  echo "r:event_2 _raw_spin_lock_irqsave" >> kprobe_events
>  echo 1 > events/kprobes/enable
>  [DEADLOCK]
> ----
>
> Because the kretprobe trampoline_handler uses a spinlock to protect the
> hash table, probing the spinlock itself causes a deadlock.
> Thank you Andrea and Steve for discovering this root cause!!
>
> This bug was introduced with the asm-coded trampoline
> code, since previously it used another kprobe for hooking
> the function return placeholder (which only has a nop) and
> the trampoline handler was called from that kprobe.
>
> To fix this bug, I introduced a dummy kprobe and set it in
> current_kprobe as we did in the old days.
>
> Thank you,

It all looks good to me; with this patch set I couldn't break the kernel in
any way.

Tested-by: Andrea Righi <righi.andrea@xxxxxxxxx>

Thanks,
-Andrea
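
To make the stacking order problem in (1) concrete, here is a toy Python model of the per-task kretprobe list. It is only a sketch, not the kernel code: the Instance class, the push/pop helpers and the stack-slot addresses are all made up for illustration. It also assumes that the frame pointer verification works by comparing the stack slot recorded at push time with the slot the trampoline is returning through.

```python
from dataclasses import dataclass

# Hypothetical stack-slot addresses (illustration only). The stack grows
# down, so the deeper ftrace_ops_assist_func() frame has the lower address.
PROBED_SLOT = 0x1000   # slot holding probed-function's return address
ASSIST_SLOT = 0x0F00   # slot holding ftrace_ops_assist_func()'s return address


@dataclass
class Instance:
    ret_addr: str  # original return address saved by the pre-handler
    fp: int        # stack slot that return address was taken from


def push(instances, ret_addr, fp):
    """Model of the pre-handler: record the entry on top of the list."""
    instances.insert(0, Instance(ret_addr, fp))


def pop_naive(instances, fp):
    """Old behaviour in this model: trust the top entry, ignore fp."""
    return instances.pop(0).ret_addr


def pop_verified(instances, fp):
    """Assumed fix: skip entries whose recorded slot does not match fp."""
    for i, inst in enumerate(instances):
        if inst.fp == fp:
            return instances.pop(i).ret_addr
    raise LookupError("no kretprobe instance for this stack frame")


def run(pop):
    instances = []
    # int3 on ftrace_ops_assist_func() fires first and saves *fentry*.
    push(instances, "fentry", ASSIST_SLOT)
    # kprobe_ftrace_handler() then saves *caller* for probed-function.
    push(instances, "caller", PROBED_SLOT)
    # ftrace_ops_assist_func() returns first, probed-function second.
    first = pop(instances, ASSIST_SLOT)
    second = pop(instances, PROBED_SLOT)
    return first, second


if __name__ == "__main__":
    print("naive:   ", run(pop_naive))
    print("verified:", run(pop_verified))
```

In this model the naive pop hands "caller" to ftrace_ops_assist_func() on its return, which corresponds to the wrong-stack return in the trace above, while the verified pop returns "fentry" first and "caller" second.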