On Sat, Oct 19, 2024 at 2:38 AM Hari Bathini <hbathini@xxxxxxxxxxxxx> wrote: > > From: Naveen N Rao <naveen@xxxxxxxxxx> > > Function profile sequence on powerpc includes two instructions at the > beginning of each function: > mflr r0 > bl ftrace_caller > > The call to ftrace_caller() gets nop'ed out during kernel boot and is > patched in when ftrace is enabled. > > Given the sequence, we cannot return from ftrace_caller with 'blr' as we > need to keep LR and r0 intact. This results in link stack (return > address predictor) imbalance when ftrace is enabled. To address that, we > would like to use a three instruction sequence: > mflr r0 > bl ftrace_caller > mtlr r0 > > Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to > reserve two instruction slots before the function. This results in a > total of five instruction slots to be reserved for ftrace use on each > function that is traced. > > Move the function profile sequence out-of-line to minimize its impact. > To do this, we reserve a single nop at function entry using > -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine > the total number of functions that can be traced. This is then used to > generate a .S file reserving the appropriate amount of space for use as > ftrace stubs, which is built and linked into vmlinux. > > On bootup, the stub space is split into separate stubs per function and > populated with the proper instruction sequence. A pointer to the > associated stub is maintained in dyn_arch_ftrace. > > For modules, space for ftrace stubs is reserved from the generic module > stub space. > > This is restricted to and enabled by default only on 64-bit powerpc, > though there are some changes to accommodate 32-bit powerpc. This is > done so that 32-bit powerpc could choose to opt into this based on > further tests and benchmarks. > > As an example, after this patch, kernel functions will have a single nop > at function entry: > <kernel_clone>: > addis r2,r12,467 > addi r2,r2,-16028 > nop > mfocrf r11,8 > ... > > When ftrace is enabled, the nop is converted to an unconditional branch > to the stub associated with that function: > <kernel_clone>: > addis r2,r12,467 > addi r2,r2,-16028 > b ftrace_ool_stub_text_end+0x11b28 > mfocrf r11,8 > ... > > The associated stub: > <ftrace_ool_stub_text_end+0x11b28>: > mflr r0 > bl ftrace_caller > mtlr r0 > b kernel_clone+0xc > ... > > This change showed an improvement of ~10% in null_syscall benchmark on a > Power 10 system with ftrace enabled. > > Signed-off-by: Naveen N Rao <naveen@xxxxxxxxxx> > Signed-off-by: Hari Bathini <hbathini@xxxxxxxxxxxxx> > --- > diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile > new file mode 100644 > index 000000000000..d2e7ecd5f46f > --- /dev/null > +++ b/arch/powerpc/tools/Makefile > @@ -0,0 +1,9 @@ > +# SPDX-License-Identifier: GPL-2.0-or-later > + > +quiet_cmd_gen_ftrace_ool_stubs = GEN $@ > + cmd_gen_ftrace_ool_stubs = $< "$(CONFIG_64BIT)" "$(OBJDUMP)" vmlinux.o $@ > + > +$(obj)/vmlinux.arch.S: $(src)/ftrace-gen-ool-stubs.sh vmlinux.o FORCE > + $(call if_changed,gen_ftrace_ool_stubs) > + > +targets += vmlinux.arch.S Makefile looks good to me. > diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh > new file mode 100755 > index 000000000000..96e1ca5803e4 > --- /dev/null > +++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh > @@ -0,0 +1,41 @@ > +#!/bin/sh > +# SPDX-License-Identifier: GPL-2.0-or-later > + > +# Error out on error > +set -e > + > +is_64bit="$1" > +objdump="$2" > +vmlinux_o="$3" > +arch_vmlinux_S="$4" > + > +RELOCATION=R_PPC64_ADDR64 > +if [ -z "$is_64bit" ]; then > + RELOCATION=R_PPC_ADDR32 > +fi > + > +num_ool_stubs_text=$($objdump -r -j __patchable_function_entries "$vmlinux_o" | > + grep -v ".init.text" | grep -c "$RELOCATION") > +num_ool_stubs_inittext=$($objdump -r -j __patchable_function_entries "$vmlinux_o" | > + grep ".init.text" | grep -c "$RELOCATION") > + > +cat > "$arch_vmlinux_S" <<EOF > +#include <asm/asm-offsets.h> > +#include <linux/linkage.h> > + > +.pushsection .tramp.ftrace.text,"aw" > +SYM_DATA(ftrace_ool_stub_text_end_count, .long $num_ool_stubs_text) > + > +SYM_CODE_START(ftrace_ool_stub_text_end) > + .space $num_ool_stubs_text * FTRACE_OOL_STUB_SIZE > +SYM_CODE_END(ftrace_ool_stub_text_end) > +.popsection > + > +.pushsection .tramp.ftrace.init,"aw" > +SYM_DATA(ftrace_ool_stub_inittext_count, .long $num_ool_stubs_inittext) > + > +SYM_CODE_START(ftrace_ool_stub_inittext) > + .space $num_ool_stubs_inittext * FTRACE_OOL_STUB_SIZE To avoid the warning mention in another thread, it is better to avoid zero .space. > +SYM_CODE_END(ftrace_ool_stub_inittext) > +.popsection > +EOF > -- > 2.47.0 > -- Best Regards Masahiro Yamada