On Thu, Feb 2, 2023 at 8:03 PM Mark Rutland <mark.rutland@xxxxxxx> wrote: > > On Wed, Feb 01, 2023 at 05:34:18PM +0100, Florent Revest wrote: > > From: Xu Kuohai <xukuohai@xxxxxxxxxx> > > > > After direct call is enabled for arm64, ftrace selftest enters a > > dead loop: > > > > <trace_selftest_dynamic_test_func>: > > 00 bti c > > 01 mov x9, x30 <trace_direct_tramp>: > > 02 bl <trace_direct_tramp> ----------> ret > > | > > lr/x30 is 03, return to 03 > > | > > 03 mov w0, #0x0 <-----------------------------| > > | | > > | dead loop! | > > | | > > 04 ret ---- lr/x30 is still 03, go back to 03 ----| > > > > The reason is that when the direct caller trace_direct_tramp() returns > > to the patched function trace_selftest_dynamic_test_func(), lr is still > > the address after the instrumented instruction in the patched function, > > so when the patched function exits, it returns to itself! > > > > To fix this issue, we need to restore lr before trace_direct_tramp() > > exits, so use a dedicated trace_direct_tramp() for arm64. > > > > Reported-by: Li Huafei <lihuafei1@xxxxxxxxxx> > > Signed-off-by: Xu Kuohai <xukuohai@xxxxxxxxxx> > > Acked-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx> > > Signed-off-by: Florent Revest <revest@xxxxxxxxxxxx> > > Looking at this, I don't think that the existing trace_direct_tramp makes > sense -- in general a C function doesn't follow the direct call trampoline > calling convention, and cannot be used as a direct call trampoline. Agreed. > Looking further, there's a distinct latent issue on s390, where the mismatch > between their regular calling convention and their direct call trampoline > calling convention means that trace_direct_tramp() returns into the *caller* of > the instrumented function, skipping that entirely (verified locally with QEMU > and printk()s added to DYN_FTRACE_TEST_NAME() / DYN_FTRACE_TEST_NAME2() > functions). Good catch! I didn't dare to adventure that far into s390 :) > I think it'd be much better to do something like the below as a preparatory > cleanup (tested on s390 under QEMU). Thanks, that looks great to me. I'll make it a part of the series in v2 then. Unless it's preferred that this gets merged separately? > Thanks, > Mark. > > ---->8---- > From 3b3abc89fe014ea49282622c4312521b710d1463 Mon Sep 17 00:00:00 2001 > From: Mark Rutland <mark.rutland@xxxxxxx> > Date: Thu, 2 Feb 2023 18:37:40 +0000 > Subject: [PATCH] ftrace: selftest: remove broken trace_direct_tramp > > The ftrace selftest code has a trace_direct_tramp() function which it > uses as a direct call trampoline. This happens to work on x86, since the > direct call's return address is in the usual place, and can be returned > to via a RET, but in general the calling convention for direct calls is > different from regular function calls, and requires a trampoline written > in assembly. > > On s390, regular function calls place the return address in %r14, and an > ftrace patch-site in an instrumented function places the trampoline's > return address (which is within the instrumented function) in %r0, > preserving the original %r14 value in-place. As a regular C function > will return to the address in %r14, using a C function as the trampoline > results in the trampoline returning to the caller of the instrumented > function, skipping the body of the instrumented function. > > Note that the s390 issue is not detcted by the ftrace selftest code, as > the instrumented function is trivial, and returning back into the caller > happens to be equivalent. > > On arm64, regular function calls place the return address in x30, and > an ftrace patch-site in an instrumented function saves this into r9 > and places the trampoline's return address (within the instrumented > function) in x30. A regular C function will return to the address in > x30, but will not restore x9 into x30. Consequently, using a C function > as the trampoline results in returning to the trampoline's return > address having corrupted x30, such that when the instrumented function > returns, it will return back into itself. > > To avoid future issues in this area, remove the trace_direct_tramp() > function, and require that each architecture with direct calls provides > a stub trampoline, named ftrace_stub_direct_tramp. This can be written > to handle the architecture's trampoline calling convention, and in > future could be used elsewhere (e.g. in the ftrace ops sample, to > measure the overhead of direct calls), so we may as well always build it > in. > > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx> > Cc: Li Huafei <lihuafei1@xxxxxxxxxx> > Cc: Xu Kuohai <xukuohai@xxxxxxxxxx> > Cc: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx> > Cc: Florent Revest <revest@xxxxxxxxxxxx> > --- > arch/s390/kernel/mcount.S | 5 +++++ > arch/x86/kernel/ftrace_32.S | 5 +++++ > arch/x86/kernel/ftrace_64.S | 4 ++++ > include/linux/ftrace.h | 2 ++ > kernel/trace/trace_selftest.c | 15 ++------------- > 5 files changed, 18 insertions(+), 13 deletions(-) > > diff --git a/arch/s390/kernel/mcount.S b/arch/s390/kernel/mcount.S > index 4786bfe02144..ad13a0e2c307 100644 > --- a/arch/s390/kernel/mcount.S > +++ b/arch/s390/kernel/mcount.S > @@ -32,6 +32,11 @@ ENTRY(ftrace_stub) > BR_EX %r14 > ENDPROC(ftrace_stub) > > +SYM_CODE_START(ftrace_stub_direct_tramp) > + lgr %r1, %r0 > + BR_EX %r1 > +SYM_CODE_END(ftrace_stub_direct_tramp) > + > .macro ftrace_regs_entry, allregs=0 > stg %r14,(__SF_GPRS+8*8)(%r15) # save traced function caller > > diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S > index a0ed0e4a2c0c..0d9a14528176 100644 > --- a/arch/x86/kernel/ftrace_32.S > +++ b/arch/x86/kernel/ftrace_32.S > @@ -163,6 +163,11 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL) > jmp .Lftrace_ret > SYM_CODE_END(ftrace_regs_caller) > > +SYM_FUNC_START(ftrace_stub_direct_tramp) > + CALL_DEPTH_ACCOUNT > + RET > +SYM_FUNC_END(ftrace_stub_direct_tramp) > + > #ifdef CONFIG_FUNCTION_GRAPH_TRACER > SYM_CODE_START(ftrace_graph_caller) > pushl %eax > diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S > index 1265ad519249..8fc77e3e039c 100644 > --- a/arch/x86/kernel/ftrace_64.S > +++ b/arch/x86/kernel/ftrace_64.S > @@ -307,6 +307,10 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL) > SYM_FUNC_END(ftrace_regs_caller) > STACK_FRAME_NON_STANDARD_FP(ftrace_regs_caller) > > +SYM_FUNC_START(ftrace_stub_direct_tramp) > + CALL_DEPTH_ACCOUNT > + RET > +SYM_FUNC_END(ftrace_stub_direct_tramp) > > #else /* ! CONFIG_DYNAMIC_FTRACE */ > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h > index 3d2156e335d7..a9836b40630e 100644 > --- a/include/linux/ftrace.h > +++ b/include/linux/ftrace.h > @@ -412,6 +412,8 @@ int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr); > int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr); > int modify_ftrace_direct_nolock(struct ftrace_ops *ops, unsigned long addr); > > +void ftrace_stub_direct_tramp(void *); > + > #else > struct ftrace_ops; > # define ftrace_direct_func_count 0 > diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c > index 06218fc9374b..e6530b7b42e4 100644 > --- a/kernel/trace/trace_selftest.c > +++ b/kernel/trace/trace_selftest.c > @@ -784,17 +784,6 @@ static struct fgraph_ops fgraph_ops __initdata = { > .retfunc = &trace_graph_return, > }; > > -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS > -#ifndef CALL_DEPTH_ACCOUNT > -#define CALL_DEPTH_ACCOUNT "" > -#endif > - > -noinline __noclone static void trace_direct_tramp(void) > -{ > - asm(CALL_DEPTH_ACCOUNT); > -} > -#endif > - > /* > * Pretty much the same than for the function tracer from which the selftest > * has been borrowed. > @@ -875,7 +864,7 @@ trace_selftest_startup_function_graph(struct tracer *trace, > */ > ftrace_set_filter_ip(&direct, (unsigned long)DYN_FTRACE_TEST_NAME, 0, 0); > ret = register_ftrace_direct(&direct, > - (unsigned long)trace_direct_tramp); > + (unsigned long)ftrace_stub_direct_tramp); > if (ret) > goto out; > > @@ -896,7 +885,7 @@ trace_selftest_startup_function_graph(struct tracer *trace, > unregister_ftrace_graph(&fgraph_ops); > > ret = unregister_ftrace_direct(&direct, > - (unsigned long)trace_direct_tramp); > + (unsigned long)ftrace_stub_direct_tramp); > if (ret) > goto out; > > -- > 2.30.2 >