On 16/01/2019 09:57, Julien Thierry wrote:
> Hi Torsten,
>
> On 04/01/2019 14:10, Torsten Duwe wrote:
>> Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
>> of each function. Replace the first NOP thus generated with a quick LR
>> saver (move it to scratch reg x9), so the 2nd replacement insn, the call
>> to ftrace, does not clobber the value. Ftrace will then generate the
>> standard stack frames.
>>
>> Note that patchable-function-entry in GCC disables IPA-RA, which means
>> ABI register calling conventions are obeyed *and* scratch registers
>> such as x9 are available.
>>
>> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
>> after ftrace_trampoline, and double the size of this special section.
>>
>> Signed-off-by: Torsten Duwe <duwe@xxxxxxx>
>>
>
> I wanted to test this patch (and try to benchmark having the "mov x9,
> x30" always present in function prelude vs having two nops), but I
> cannot get this patch to apply (despite having a version including both
> commits below).
>
> Could you provide a git branch from which I could try to rebase the
> patch? (Or a new version of the series)
>
>> ---
>>
>> This patch applies on 4.20 with the additional changes
>> bdb85cd1d20669dfae813555dddb745ad09323ba
>> (arm64/module: switch to ADRP/ADD sequences for PLT entries)
>> and
>> 7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680
>> (arm64: ftrace: always pass instrumented pc in x0)
>> along with their respective series, or alternatively on Linus' master,
>> which already has these.
>>
>> changes since v5:
>>
>> * fix mentioned pc in x0 to hold the start address of the call site,
>>   not the return address or the branch address.
>>   This resolves the problem found by Amit.
>>
>> ---
>>  arch/arm64/Kconfig                    |    2
>>  arch/arm64/Makefile                   |    4 +
>>  arch/arm64/include/asm/assembler.h    |    1
>>  arch/arm64/include/asm/ftrace.h       |   13 +++
>>  arch/arm64/include/asm/module.h       |    3
>>  arch/arm64/kernel/Makefile            |    6 -
>>  arch/arm64/kernel/entry-ftrace.S      |  131 ++++++++++++++++++++++++++++++++++
>>  arch/arm64/kernel/ftrace.c            |  125 ++++++++++++++++++++++++--------
>>  arch/arm64/kernel/module-plts.c       |    3
>>  arch/arm64/kernel/module.c            |    2
>>  drivers/firmware/efi/libstub/Makefile |    3
>>  include/asm-generic/vmlinux.lds.h     |    1
>>  include/linux/compiler_types.h        |    4 +
>>  13 files changed, 262 insertions(+), 36 deletions(-)
>
> [...]
>
>> --- a/arch/arm64/kernel/entry-ftrace.S
>> +++ b/arch/arm64/kernel/entry-ftrace.S
>
> [...]
>
>> @@ -122,6 +124,7 @@ skip_ftrace_call: // }
>>  ENDPROC(_mcount)
>>
>>  #else /* CONFIG_DYNAMIC_FTRACE */
>> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>>  /*
>>   * _mcount() is used to build the kernel with -pg option, but all the branch
>>   * instructions to _mcount() are replaced to NOP initially at kernel start up,
>> @@ -159,6 +162,124 @@ GLOBAL(ftrace_graph_call) // ftrace_gra
>>
>>  	mcount_exit
>>  ENDPROC(ftrace_caller)
>> +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
>> +
>> +/*
>> + * Since no -pg or similar compiler flag is used, there should really be
>> + * no reference to _mcount; so do not define one. Only some value for
>> + * MCOUNT_ADDR is needed for comparison. Let it point here to have some
>> + * sort of magic value that can be recognised when debugging.
>> + */
>> +GLOBAL(_mcount)
>> +	ret /* make it differ from regs caller */
>
> There's something I can't figure out. Since there are no callers to
> _mcount, how does the ftrace core builds up its record of patchable
> functions?
>
> I don't understand fully the core ftrace code but I've got the
> impression that without this record of struct dyn_ftrace, ftrace cannot
> patch in calls to tracers in the future.
>
> Am I missing something?
>

Forget that second part, I just saw the vmlinux.lds.h change.

-- 
Julien Thierry
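
For anyone else wondering how the record of patchable functions can exist without any _mcount callers: GCC documents that, with patchable function entries, the start address of each NOP area is also collected in a section named __patchable_function_entries. The vmlinux.lds.h hunk presumably gathers that section for ftrace. The user-space sketch below only illustrates the compiler and linker side, not the kernel code. The function names (traced_add, traced_mul, patchable_entry_count, patchable_entry_addr) are made up for illustration, and it assumes GCC 8 or later (or a recent Clang) on an ELF target with a GNU-compatible linker that provides __start_/__stop_ symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example functions: ask the compiler for 2 NOPs at entry. */
__attribute__((patchable_function_entry(2)))
int traced_add(int a, int b) { return a + b; }

__attribute__((patchable_function_entry(2)))
int traced_mul(int a, int b) { return a * b; }

/*
 * Bounds of the __patchable_function_entries section. GNU-compatible
 * linkers define these symbols automatically for sections whose names
 * are valid C identifiers (an assumption about the toolchain).
 */
extern const uintptr_t __start___patchable_function_entries[];
extern const uintptr_t __stop___patchable_function_entries[];

/* Number of recorded NOP areas, one pointer-sized entry per function. */
size_t patchable_entry_count(void)
{
	uintptr_t start = (uintptr_t)__start___patchable_function_entries;
	uintptr_t stop = (uintptr_t)__stop___patchable_function_entries;

	assert(start <= stop);
	return (size_t)((stop - start) / sizeof(uintptr_t));
}

/* Address of the i-th recorded NOP area (the spot a tracer would patch). */
uintptr_t patchable_entry_addr(size_t i)
{
	assert(i < patchable_entry_count());
	return __start___patchable_function_entries[i];
}
```

Walking the entries from 0 to patchable_entry_count() - 1 gives one address per annotated function, which is the kind of list a tracer needs in order to know where it may patch in calls later.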