On Sun, Jul 14, 2024 at 5:29 PM Naveen N Rao <naveen@xxxxxxxxxx> wrote: > > Function profile sequence on powerpc includes two instructions at the > beginning of each function: > mflr r0 > bl ftrace_caller > > The call to ftrace_caller() gets nop'ed out during kernel boot and is > patched in when ftrace is enabled. > > Given the sequence, we cannot return from ftrace_caller with 'blr' as we > need to keep LR and r0 intact. This results in link stack (return > address predictor) imbalance when ftrace is enabled. To address that, we > would like to use a three instruction sequence: > mflr r0 > bl ftrace_caller > mtlr r0 > > Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to > reserve two instruction slots before the function. This results in a > total of five instruction slots to be reserved for ftrace use on each > function that is traced. > > Move the function profile sequence out-of-line to minimize its impact. > To do this, we reserve a single nop at function entry using > -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine > the total number of functions that can be traced. This is then used to > generate a .S file reserving the appropriate amount of space for use as > ftrace stubs, which is built and linked into vmlinux. > > On bootup, the stub space is split into separate stubs per function and > populated with the proper instruction sequence. A pointer to the > associated stub is maintained in dyn_arch_ftrace. > > For modules, space for ftrace stubs is reserved from the generic module > stub space. > > This is restricted to and enabled by default only on 64-bit powerpc, > though there are some changes to accommodate 32-bit powerpc. This is > done so that 32-bit powerpc could choose to opt into this based on > further tests and benchmarks. > > As an example, after this patch, kernel functions will have a single nop > at function entry: > <kernel_clone>: > addis r2,r12,467 > addi r2,r2,-16028 > nop > mfocrf r11,8 > ... > > When ftrace is enabled, the nop is converted to an unconditional branch > to the stub associated with that function: > <kernel_clone>: > addis r2,r12,467 > addi r2,r2,-16028 > b ftrace_ool_stub_text_end+0x11b28 > mfocrf r11,8 > ... > > The associated stub: > <ftrace_ool_stub_text_end+0x11b28>: > mflr r0 > bl ftrace_caller > mtlr r0 > b kernel_clone+0xc > ... > > Signed-off-by: Naveen N Rao <naveen@xxxxxxxxxx> > diff --git a/arch/powerpc/tools/Makefile b/arch/powerpc/tools/Makefile > new file mode 100644 > index 000000000000..31dd3151c272 > --- /dev/null > +++ b/arch/powerpc/tools/Makefile > @@ -0,0 +1,10 @@ > +# SPDX-License-Identifier: GPL-2.0-or-later > + > +quiet_cmd_gen_ftrace_ool_stubs = FTRACE $@ This is not "FTRACE". "GEN" or something like that. > + cmd_gen_ftrace_ool_stubs = $< $(objtree)/vmlinux.o $@ > + > +targets += .arch.vmlinux.o > +.arch.vmlinux.o: $(srctree)/arch/powerpc/tools/ftrace-gen-ool-stubs.sh $(objtree)/vmlinux.o FORCE > + $(call if_changed,gen_ftrace_ool_stubs) > + > +clean-files += $(objtree)/.arch.vmlinux.S $(objtree)/.arch.vmlinux.o This is wrong. $(objtree) is always '.' It will attempt to clean up: arch/powerpc/tools/.arch.vmlinux.S arch/powerpc/tools/.arch.vmlinux.o You must not create the intermediate file, .arch.vmlinux.S at the top directory because this build step is pretty much PowerPC-specific. Rather, I'd recommend to create *.S and *.o in arch/powerpc/tools/: arch/powerpc/tools/vmlinux.S arch/powerpc/tools/vmlinux.o When you submit the next version, please run 'make clean' and confirm that any powerpc-specific build artifacts not being left-over. > diff --git a/arch/powerpc/tools/ftrace-gen-ool-stubs.sh b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh > new file mode 100755 > index 000000000000..0b85cd5262ff > --- /dev/null > +++ b/arch/powerpc/tools/ftrace-gen-ool-stubs.sh > @@ -0,0 +1,48 @@ > +#!/bin/sh > +# SPDX-License-Identifier: GPL-2.0-or-later > + > +# Error out on error > +set -e > + > +is_enabled() { > + grep -q "^$1=y" include/config/auto.conf > +} > + > +vmlinux_o=${1} > +arch_vmlinux_o=${2} > +arch_vmlinux_S=$(dirname ${arch_vmlinux_o})/$(basename ${arch_vmlinux_o} .o).S > + > +RELOCATION=R_PPC64_ADDR64 > +if is_enabled CONFIG_PPC32; then > + RELOCATION=R_PPC_ADDR32 > +fi > + > +num_ool_stubs_text=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} | > + grep -v ".init.text" | grep "${RELOCATION}" | wc -l) > +num_ool_stubs_inittext=$(${CROSS_COMPILE}objdump -r -j __patchable_function_entries ${vmlinux_o} | > + grep ".init.text" | grep "${RELOCATION}" | wc -l) > + > +cat > ${arch_vmlinux_S} <<EOF > +#include <asm/asm-offsets.h> > +#include <linux/linkage.h> > + > +.pushsection .tramp.ftrace.text,"aw" > +SYM_DATA(ftrace_ool_stub_text_end_count, .long ${num_ool_stubs_text}) > + > +SYM_CODE_START(ftrace_ool_stub_text_end) > + .space ${num_ool_stubs_text} * FTRACE_OOL_STUB_SIZE > +SYM_CODE_END(ftrace_ool_stub_text_end) > +.popsection > + > +.pushsection .tramp.ftrace.init,"aw" > +SYM_DATA(ftrace_ool_stub_inittext_count, .long ${num_ool_stubs_inittext}) > + > +SYM_CODE_START(ftrace_ool_stub_inittext) > + .space ${num_ool_stubs_inittext} * FTRACE_OOL_STUB_SIZE > +SYM_CODE_END(ftrace_ool_stub_inittext) > +.popsection > +EOF > + > +${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ > + ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ > + -c -o ${arch_vmlinux_o} ${arch_vmlinux_S} Please do not compile this within a shell script. scripts/Makefile.build provides rule_as_o_S to do this. [1] vmlinux.o --> arch/powerpc/tools/vmlinux.S [2] arch/powerpc/tools/vmlinux.S --> arch/powerpc/tools/vmlinux.o Please split these in separate build rules. > -- > 2.45.2 > -- Best Regards Masahiro Yamada