Re: Ftrace for MIPS may hang on SMP system

Steven Rostedt <rostedt@xxxxxxxxxxx> · Mon, 30 Aug 2010 16:47:38 -0400

Sorry for the late reply, just got back from vacation.

On Sun, 2010-08-22 at 22:27 +0800, wu zhangjin wrote:
> On 8/22/10, wu zhangjin <wuzhangjin@xxxxxxxxx> wrote:
> > (Add 'another' Steven in this loop)
> >
> > On 8/22/10, wu zhangjin <wuzhangjin@xxxxxxxxx> wrote:
> >> Hi, all
> >>
> >> For I didn't have a SMP machine, I haven't used Ftrace(in 2.6.34) for
> >> MIPS on SMP system before, Yesterday,  I got a RMI XLS machine and
> >> found Ftrace for MIPS hanged on it after I issued:
> >>
> >> $ echo function > /debug/tracing/current_tracer
> >>
> >> I have gotten the root cause, that is:
> >>
> >> in kernel/trace/ftrace.c:
> >>
> >> stop_machine() disables the irqs of the other cpus and then modify the
> >> codes via calling the arch specific ftrace_modify_code() in
> >> __ftrace_modify_code().
> >>
> >> As the description about stop_machine() in arch/x86/kernel/ftrace.c
> >> shows:
> >>
> >> /*
> >>  * Modifying code must take extra care. On an SMP machine, if
> >>  * the code being modified is also being executed on another CPU
> >>  * that CPU will have undefined results and possibly take a GPF.
> >>  * We use kstop_machine to stop other CPUS from exectuing code.
> >> [snip]
> >>
> >> Then, it is reasonable to use stop_machine() here.
> >>
> >> And in arch/mips/kernel/ftrace.c:
> >>
> >> flush_icache_range() is called in ftrace_modify_code() to ensure the
> >> intructions will be executed are what we want.
> >>
> >> In UP system, there is no problem for flush_icache_range() simply
> >> flush the instruction cache, but In SMP system, this may be different,
> >> for flush_icache_range() may also need to ask the other cpus (via
> >> sending ipi interrupt) to flush their icaches and will wait for them
> >> till the other cpus finish their flushing.
> >>
> >> But as we know above, the irqs of the other cpus are disabled by
> >> stop_machine(), they have no opportunity to flush their icache and
> >> will let the current cpu wait for them all the time, then soft lock
> >> --> hang.
> >>
> >> To fix it, there are two potential solutions:
> >>
> >> 1. replace flush_icache_range() by something else, maybe we can use
> >> the similar method in arch/x86/kernel/ftrace.c, x86 uses sync_core()
> >> defined in arch/x86/include/asm/processor.h to flush the icache on all
> >> processors:
> >>
> >> /* Stop speculative execution and prefetching of modified code. */
> >> static inline void sync_core(void)
> >> {
> >>         int tmp;
> >>
> >> #if defined(CONFIG_M386) || defined(CONFIG_M486)
> >>         if (boot_cpu_data.x86 < 5)
> >>                 /* There is no speculative execution.
> >>                  * jmp is a barrier to prefetching. */
> >>                 asm volatile("jmp 1f\n1:\n" ::: "memory");
> >>         else
> >> #endif
> >>                 /* cpuid is a barrier to speculative execution.
> >>                  * Prefetched instructions are automatically
> >>                  * invalidated when modified. */
> >>                 asm volatile("cpuid" : "=a" (tmp) : "0" (1)
> >>                              : "ebx", "ecx", "edx", "memory");
> >> }
> >>
> >> But is there a cpuid like hardware instruction in MIPS SMP? As I know,
> >> in UP, we may be possible to use prefetch instruction to push the
> >> instruction to the cache, but in SMP, is there a instruction to force
> >> the other cpus to flush their cache too?
> >>
> >> 2. Replace the stop_machine() by something else
> >>
> >> I have written such a patch:
> >>
> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> >> index 2404b59..e4d058f 100644
> >> --- a/kernel/trace/ftrace.c
> >> +++ b/kernel/trace/ftrace.c
> >> @@ -1129,13 +1129,18 @@ static int __ftrace_modify_code(void *data)
> >>  static void ftrace_run_update_code(int command)
> >>  {
> >>         int ret;
> >> +       unsigned long flags;
> >>
> >>         ret = ftrace_arch_code_modify_prepare();
> >>         FTRACE_WARN_ON(ret);
> >>         if (ret)
> >>                 return;
> >>
> >> -       stop_machine(__ftrace_modify_code, &command, NULL);
> >> +       preempt_disable();
> >> +       local_irq_save(flags);
> >> +       __ftrace_modify_code(&command);
> >> +       local_irq_restore(flags);
> >> +       preempt_enable();
> >>
> >>         ret = ftrace_arch_code_modify_post_process();
> >>         FTRACE_WARN_ON(ret);
> >>
> 
> We may need to protect the __ftrace_modify_code() with raw spin lock.

Nope, this will still crash a x86 box. The problem is not with
synchronizing the modification of code, but with modifying code that may
be executing on another CPU.

If the code is being entered in the pipe on one CPU, and another CPU
happens to modify that code (especially in kernel mode), the executing
CPU may take a general protection fault.

I believe this is true with some PPC's as well, so it is not a x86 only
problem.

It may also be a problem with MIPS too.

-- Steve