I just found this in my Inbox, and this looks to be a legit issue. On Thu, 26 Jul 2018 12:40:29 +0200 Nicolai Stange <nstange@xxxxxxx> wrote: Nicolai, You still working on this? > With dynamic ftrace, ftrace patches call sites in a three steps: > 1. Put a breakpoint at the to be patched location, > 2. update call site and > 3. finally remove the breakpoint again. > > Note that the breakpoint ftrace_int3_handler() simply makes execution > to skip over the to be patched function. > > This patching happens in the following circumstances: > - the global ftrace_trace_function changes and the call sites at > ftrace_call and ftrace_regs_call get patched, > - an ftrace_ops' ->func changes and the call site in its trampoline gets > patched, > - graph tracing gets enabled/disabled and the jump site at > ftrace_graph_call gets patched > - a mcount site gets converted from nop -> call, call -> nop > or call -> call. > > The latter case, i.e. a mcount call getting redirected, happens for example > in a transition from trampolined to not trampolined upon a user enabling > function tracing on a live patched function. > > The ftrace_int3_handler() simply skipping over the mcount callsite is > problematic here, because it means that a live patched function gets > temporarily reverted to its unpatched original and this breaks live > patching consistency. > > Make ftrace_int3_handler not to skip over the fops invocation, but modify > the interrupted control flow to issue a call as needed. > > For determining what the proper action actually is, modify > update_ftrace_func() to take an extra argument, func, to be called if the > corresponding breakpoint gets hit. Introduce a new global variable, > trace_update_func_dest and let update_ftrace_func() set it. For the special > case of patching the jmp insn at ftrace_graph_call, set it to zero and make > ftrace_int3_handler() just skip over the insn in this case as before. > > Because there's no space left above the int3 handler's iret frame to stash > an extra call's return address in, this can't be mimicked from the > ftrace_int3_handler() itself. > > Instead, make ftrace_int3_handler() change the iret frame's %rip slot to > point to the new ftrace_int3_call_trampoline to be executed immediately > after iret. > > The original %rip gets communicated to ftrace_int3_call_trampoline which > can then take proper action. Note that ftrace_int3_call_trampoline can > nest, because of NMIs, for example. In order to avoid introducing another > stack, abuse %r11 for passing the original %rip. This is safe, because the > interrupted context is always at a call insn and according to the x86_64 > ELF ABI allows %r11 is callee-clobbered. According to the glibc sources, > this is also true for the somewhat special mcount/fentry ABI. > > OTOH, a spare register doesn't exist on i686 and thus, this is approach is > limited to x86_64. > > For determining the call target address, let ftrace_int3_call_trampoline > compare ftrace_update_func against the interrupted %rip. I don't think we need to do the compare. > > If they match, an update_ftrace_func() is currently pending and the > call site is either of ftrace_call, ftrace_regs_call or the call insn > within a fops' trampoline. Jump to the ftrace_update_func_dest location as > set by update_ftrace_func(). > > If OTOH the interrupted %rip doesn't equal ftrace_update_func, then > it points to an mcount call site. In this case, redirect control flow to > the most generic handler, ftrace_regs_caller, which will then do the right > thing. > > Finally, reading ftrace_update_func and ftrace_update_func_dest from > outside of the int3 handler requires some care: inbetween adding and > removing the breakpoints, ftrace invokes run_sync() which basically > issues a couple of IPIs. Because the int3 handler has interrupts disabled, > it orders with run_sync(). > > To extend that ordering to also include ftrace_int3_call_trampoline, make > ftrace_int3_handler() disable interrupts within the iret frame returning to > it. > > Store the original EFLAGS.IF into %r11's MSB which, because it represents > a kernel address, is redundant. Make ftrace_int3_call_trampoline restore > it when done. This can be made much simpler by making a hardcoded ftrace_int3_tramp that does the following: ftrace_int3_tramp push %r11 jmp ftrace_caller The ftrace_caller will either call a single ftrace callback, if there's only a single ops registered, or it will call the loop function that iterates over all the ftrace_ops registered and will call each function that matches the ftrace_ops hashes. In either case, it's what we want. The ftrace_int3_tramp will simply simulate the call ftrace_caller that would be there as the default caller if more than one function is set to it. For 32 bit, we could add 4 variables on the thread_info and make 4 trampolines, one for each context (normal, softirq, irq and NMI), and have them use the variable stored in the thread_info for that location: ftrace_int3_tramp_normal push %eax # just for space push %eax mov whatever_to_get_thread_info, %eax mov normal(%eax), %eax mov %eax, 4(%esp) pop %eax jmp ftrace_caller ftrace_int3_tramp_softiqr push %eax # just for space push %eax mov whatever_to_get_thread_info, %eax mov softirq(%eax), %eax mov %eax, 4(%esp) pop %eax jmp ftrace_caller etc.. A bit crazier for 32 bit but it can still be done. ;-) -- Steve > > Signed-off-by: Nicolai Stange <nstange@xxxxxxx> > --- > arch/x86/kernel/ftrace.c | 48 ++++++++++++++++++++++++++++++++------ > arch/x86/kernel/ftrace_64.S | 56 +++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 97 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c > index 01ebcb6f263e..3908a73370a8 100644 > --- a/arch/x86/kernel/ftrace.c > +++ b/arch/x86/kernel/ftrace.c > @@ -227,9 +227,11 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, > return -EINVAL; > } > > -static unsigned long ftrace_update_func; > +unsigned long ftrace_update_func; > +unsigned long ftrace_update_func_dest; > > -static int update_ftrace_func(unsigned long ip, void *new) > +static int update_ftrace_func(unsigned long ip, void *new, > + unsigned long func) > { > unsigned char old[MCOUNT_INSN_SIZE]; > int ret; > @@ -237,6 +239,8 @@ static int update_ftrace_func(unsigned long ip, void *new) > memcpy(old, (void *)ip, MCOUNT_INSN_SIZE); > > ftrace_update_func = ip; > + ftrace_update_func_dest = func; > + > /* Make sure the breakpoints see the ftrace_update_func update */ > smp_wmb(); > > @@ -257,13 +261,13 @@ int ftrace_update_ftrace_func(ftrace_func_t func) > int ret; > > new = ftrace_call_replace(ip, (unsigned long)func); > - ret = update_ftrace_func(ip, new); > + ret = update_ftrace_func(ip, new, (unsigned long)func); > > /* Also update the regs callback function */ > if (!ret) { > ip = (unsigned long)(&ftrace_regs_call); > new = ftrace_call_replace(ip, (unsigned long)func); > - ret = update_ftrace_func(ip, new); > + ret = update_ftrace_func(ip, new, (unsigned long)func); > } > > return ret; > @@ -277,6 +281,10 @@ static int is_ftrace_caller(unsigned long ip) > return 0; > } > > +#if IS_ENABLED(CONFIG_X86_64) > +extern char ftrace_int3_call_trampoline[]; > +#endif > + > /* > * A breakpoint was added to the code address we are about to > * modify, and this is the handle that will just skip over it. > @@ -287,6 +295,7 @@ static int is_ftrace_caller(unsigned long ip) > int ftrace_int3_handler(struct pt_regs *regs) > { > unsigned long ip; > + unsigned long __maybe_unused eflags_if; > > if (WARN_ON_ONCE(!regs)) > return 0; > @@ -295,8 +304,33 @@ int ftrace_int3_handler(struct pt_regs *regs) > if (!ftrace_location(ip) && !is_ftrace_caller(ip)) > return 0; > > - regs->ip += MCOUNT_INSN_SIZE - 1; > +#if IS_ENABLED(CONFIG_X86_64) > + if (is_ftrace_caller(ip) && !ftrace_update_func_dest) { > + /* > + * There's an update on ftrace_graph_call pending. > + * Just skip over it. > + */ > + regs->ip += MCOUNT_INSN_SIZE - 1; > + return 1; > + } > > + /* > + * Return to ftrace_int3_call_trampoline with interrupts > + * disabled in order to block ftrace's run_sync() IPIs > + * and keep ftrace_update_func_dest valid. > + */ > + eflags_if = regs->flags & X86_EFLAGS_IF; > + regs->flags ^= eflags_if; > + /* > + * The MSB of ip, which is a kernel address, is always one. > + * Abuse it to store EFLAGS.IF there. > + */ > + ip &= eflags_if << (BITS_PER_LONG - 1 - X86_EFLAGS_IF_BIT); > + regs->r11 = ip; > + regs->ip = (unsigned long)ftrace_int3_call_trampoline; > +#else /* !IS_ENABLED(CONFIG_X86_64) */ > + regs->ip += MCOUNT_INSN_SIZE - 1; > +#endif > return 1; > } > > @@ -870,7 +904,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops) > > /* Do a safe modify in case the trampoline is executing */ > new = ftrace_call_replace(ip, (unsigned long)func); > - ret = update_ftrace_func(ip, new); > + ret = update_ftrace_func(ip, new, (unsigned long)func); > set_memory_ro(ops->trampoline, npages); > > /* The update should never fail */ > @@ -966,7 +1000,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func) > > new = ftrace_jmp_replace(ip, (unsigned long)func); > > - return update_ftrace_func(ip, new); > + return update_ftrace_func(ip, new, 0); > } > > int ftrace_enable_ftrace_graph_caller(void) > diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S > index 91b2cff4b79a..b51d4fb9af79 100644 > --- a/arch/x86/kernel/ftrace_64.S > +++ b/arch/x86/kernel/ftrace_64.S > @@ -9,6 +9,7 @@ > #include <asm/export.h> > #include <asm/nospec-branch.h> > #include <asm/unwind_hints.h> > +#include <asm/irqflags.h> > > .code64 > .section .entry.text, "ax" > @@ -262,6 +263,61 @@ GLOBAL(ftrace_regs_caller_end) > ENDPROC(ftrace_regs_caller) > > > +ENTRY(ftrace_int3_call_trampoline) > + /* > + * A breakpoint was hit on what either had been or will become > + * a call insn. Mimic the function call here: push the desired > + * return address on the stack and jmp to the target function. > + * The inverted value of EFLAGS.IF is stored in %r11's high bit. > + * The remaining bits of %r11 store the break point address > + * (whose MSB is known to always be set, because it's a kernel > + * address). > + */ > + UNWIND_HINT_EMPTY > + subq $8, %rsp > + pushq %rax > + movq $1, %rax > + shlq $63, %rax > + orq %r11, %rax /* ip is in %rax now */ > + > + /* Prepare the on-stack return address. */ > + pushq %rax > + addq $5, %rax /* ip + MCOUNT_INSN_SIZE */ > + movq %rax, 16(%rsp) > + popq %rax > + > + /* > + * Determine where to redirect control flow to. There are > + * two cases: > + * 1.) The breakpoint is at either of ftrace_call, > + * ftrace_regs_call or the callsite within a fops' trampoline. > + * 2.) The breakpoint is at an mcount callsite. > + * > + * For the first case, update_ftrace_func has setup > + * ftrace_update_func_dest to the target address. > + * For the second case, just branch to the most generic > + * ftrace_regs_caller which will then do the right thing. > + */ > + cmpq %rax, ftrace_update_func(%rip) > + jne 1f > + movq ftrace_update_func_dest(%rip), %rax > + jmp 2f > +1: > + leaq ftrace_regs_caller(%rip), %rax > +2: > + /* Restore EFLAGS.IF */ > + btq $63, %r11 > + jnc 3f > + sti > + TRACE_IRQS_ON > +3: > + mov %rax, %r11 > + popq %rax > + > + JMP_NOSPEC %r11 > +END(ftrace_int3_call_trampoline) > + > + > #else /* ! CONFIG_DYNAMIC_FTRACE */ > > ENTRY(function_hook)