On Thu, Feb 27, 2014 at 12:00:14PM -0500, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 17:37:32 +0100
> Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
> > On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > > [Request for Ack]
> > >
> > > From: Petr Mladek <pmladek@xxxxxxx>
> > >
> > > If a failure occurs while modifying ftrace function, it bails out and will
> > > remove the tracepoints to be back to what the code originally was.
> > >
> > > There is missing the final sync run across the CPUs after the fix up is done
> > > and before the ftrace int3 handler flag is reset.
> >
> > So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
> > other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?
>
> Actually, the bug is that they will not ignore the ftrace traps after
> we decrement modifying_ftrace_code counter. Here's the race:
>
>       CPU0                            CPU1
>       ----                            ----
>   remove_breakpoint();
>   modifying_ftrace_code = 0;
>
>                               [still sees breakpoint]
>                               <takes trap>
>                               [sees modifying_ftrace_code as zero]
>                               [no breakpoint handler]
>                               [goto failed case]
>                               [trap exception - kernel breakpoint, no
>                                handler]
>                               BUG()
>
>
> Even if we had a smp_wmb() after removing the breakpoint and clearing
> the modifying_ftrace_code, we still need the smp_rmb() on the other
> CPUS. The run_sync() does a IPI on all CPUs doing the smp_rmb().

Ah ok. My understanding was indeed that it doesn't ignore the ftrace trap, but I thought
the consequence was that we return immediately from the trap handler.

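To make the interleaving in the diagram above concrete, here is a small
user-space model of its worst case. It is only a sketch, not kernel code:
cpu1_outcome(), the enum names and the "stale view" boolean are made up for
illustration, and run_sync() is modelled as nothing more than refreshing
CPU1's view of the patched instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for CPU1; these names are not kernel identifiers. */
enum outcome { NO_TRAP, TRAP_HANDLED, TRAP_UNHANDLED };

/*
 * Worst-case model of the diagram: CPU0 removes the breakpoint and then
 * clears modifying_ftrace_code, while CPU1 may still execute the old int3.
 * sync_before_clear says whether a run_sync()-like step runs between the
 * breakpoint removal and the clearing of the flag.
 */
static enum outcome cpu1_outcome(bool sync_before_clear)
{
	bool breakpoint_in_memory = true;
	bool cpu1_sees_breakpoint = true;	/* CPU1's possibly stale view */
	int modifying_ftrace_code = 1;

	/* CPU0: remove_breakpoint() */
	breakpoint_in_memory = false;

	/* Optional sync: CPU1 catches up with memory before the flag changes. */
	if (sync_before_clear)
		cpu1_sees_breakpoint = breakpoint_in_memory;

	/* CPU0: modifying_ftrace_code = 0 */
	modifying_ftrace_code = 0;
	assert(!breakpoint_in_memory);

	/* CPU1: runs the patched location with whatever view it has. */
	if (!cpu1_sees_breakpoint)
		return NO_TRAP;

	/* Trap taken: ftrace only claims it while the flag is non-zero. */
	return modifying_ftrace_code ? TRAP_HANDLED : TRAP_UNHANDLED;
}
```

In this model cpu1_outcome(false) ends in TRAP_UNHANDLED, which is the BUG()
case in the diagram, while cpu1_outcome(true) never takes the trap at all.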
> > >
> > > Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmladek@xxxxxxx
> > >
> > > Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> > > Cc: stable@xxxxxxxxxxxxxxx # 3.5+
> > > Signed-off-by: Petr Mladek <pmladek@xxxxxxx>
> > > Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > ---
> > >  arch/x86/kernel/ftrace.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 6b566c8..69885e2 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> > >  		ret = -EPERM;
> > >  		goto out;
> > >  	}
> > > -	run_sync();
> > >   out:
> > > +	run_sync();
> > >  	return ret;
> > >
> > >   fail_update:
> >
> > This could be further optimized by rather calling run_sync() in the end of the
> > fail_update block (after the probe_kernel_write revert) otherwise even failure on
> > setting the break will result in run_sync(), which doesn't appear to be needed. But
> > that's really just nitpicking as it's a rare failure codepath and shouldn't hurt.
>
> No, the run_sync() must be done after removing the breakpoint. Again,
> we don't want one of these breakpoints to be called on another CPU and
> then see modifying_ftrace_code as zero. That is bad. The final
> run_sync() is required.

Ok, but what I meant is to do this instead:

 fail_update:
	probe_kernel_write((void *)ip, &old_code[0], 1);
+	run_sync();
	goto out;

Because with the current patch we also call run_sync() on add_break() failure.

>
> I think I'll update the change log to include my race flow graph from
> above.
>
> -- Steve
>
> >
> > In any case, the fix looks correct.
> >
> > Acked-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> >
> > > --
> > > 1.8.5.3
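
The run_sync() placement question discussed above can also be sketched as a
user-space model that just counts sync calls per failure point. This is an
illustration under assumptions, not the kernel function: the three-step
shape (add breakpoint, update the rest, final write) outside the quoted hunk
is assumed, and sync_calls_for(), run_sync_stub() and the enums are
hypothetical names.

```c
#include <assert.h>

/* Hypothetical failure points and placements; not kernel identifiers. */
enum fail_at { FAIL_NONE, FAIL_ADD_BREAK, FAIL_UPDATE, FAIL_FINAL_WRITE };
enum placement {
	SYNC_AFTER_OUT_LABEL,	/* as in the patch: run_sync() after "out:" */
	SYNC_IN_FAIL_UPDATE	/* the alternative: run_sync() in fail_update */
};

static int sync_calls;

/* Stand-in for run_sync(); it only counts how often it was called. */
static void run_sync_stub(void)
{
	sync_calls++;
}

/* Walks the assumed control flow and returns the number of sync calls. */
static int sync_calls_for(enum fail_at fail, enum placement where)
{
	sync_calls = 0;

	if (fail == FAIL_ADD_BREAK)	/* add_break() failed */
		goto out;
	run_sync_stub();

	if (fail == FAIL_UPDATE)	/* updating the remaining bytes failed */
		goto fail_update;
	run_sync_stub();

	if (fail == FAIL_FINAL_WRITE)	/* final write failed (-EPERM path) */
		goto out;
	if (where == SYNC_IN_FAIL_UPDATE)
		run_sync_stub();	/* success-path sync stays above "out:" */
out:
	if (where == SYNC_AFTER_OUT_LABEL)
		run_sync_stub();
	assert(sync_calls >= 0);
	return sync_calls;

fail_update:
	/* the probe_kernel_write() revert of the first byte would go here */
	if (where == SYNC_IN_FAIL_UPDATE)
		run_sync_stub();
	goto out;
}
```

In this model both placements sync after the revert in fail_update. They
differ on add_break() failure, where the patch placement syncs once and the
alternative does not sync, and on the final-write failure path, where only
the patch placement adds a sync.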