On 8/22/10, wu zhangjin <wuzhangjin@xxxxxxxxx> wrote: > (Add 'another' Steven in this loop) > > On 8/22/10, wu zhangjin <wuzhangjin@xxxxxxxxx> wrote: >> Hi, all >> >> For I didn't have a SMP machine, I haven't used Ftrace(in 2.6.34) for >> MIPS on SMP system before, Yesterday, I got a RMI XLS machine and >> found Ftrace for MIPS hanged on it after I issued: >> >> $ echo function > /debug/tracing/current_tracer >> >> I have gotten the root cause, that is: >> >> in kernel/trace/ftrace.c: >> >> stop_machine() disables the irqs of the other cpus and then modify the >> codes via calling the arch specific ftrace_modify_code() in >> __ftrace_modify_code(). >> >> As the description about stop_machine() in arch/x86/kernel/ftrace.c >> shows: >> >> /* >> * Modifying code must take extra care. On an SMP machine, if >> * the code being modified is also being executed on another CPU >> * that CPU will have undefined results and possibly take a GPF. >> * We use kstop_machine to stop other CPUS from exectuing code. >> [snip] >> >> Then, it is reasonable to use stop_machine() here. >> >> And in arch/mips/kernel/ftrace.c: >> >> flush_icache_range() is called in ftrace_modify_code() to ensure the >> intructions will be executed are what we want. >> >> In UP system, there is no problem for flush_icache_range() simply >> flush the instruction cache, but In SMP system, this may be different, >> for flush_icache_range() may also need to ask the other cpus (via >> sending ipi interrupt) to flush their icaches and will wait for them >> till the other cpus finish their flushing. >> >> But as we know above, the irqs of the other cpus are disabled by >> stop_machine(), they have no opportunity to flush their icache and >> will let the current cpu wait for them all the time, then soft lock >> --> hang. >> >> To fix it, there are two potential solutions: >> >> 1. replace flush_icache_range() by something else, maybe we can use >> the similar method in arch/x86/kernel/ftrace.c, x86 uses sync_core() >> defined in arch/x86/include/asm/processor.h to flush the icache on all >> processors: >> >> /* Stop speculative execution and prefetching of modified code. */ >> static inline void sync_core(void) >> { >> int tmp; >> >> #if defined(CONFIG_M386) || defined(CONFIG_M486) >> if (boot_cpu_data.x86 < 5) >> /* There is no speculative execution. >> * jmp is a barrier to prefetching. */ >> asm volatile("jmp 1f\n1:\n" ::: "memory"); >> else >> #endif >> /* cpuid is a barrier to speculative execution. >> * Prefetched instructions are automatically >> * invalidated when modified. */ >> asm volatile("cpuid" : "=a" (tmp) : "0" (1) >> : "ebx", "ecx", "edx", "memory"); >> } >> >> But is there a cpuid like hardware instruction in MIPS SMP? As I know, >> in UP, we may be possible to use prefetch instruction to push the >> instruction to the cache, but in SMP, is there a instruction to force >> the other cpus to flush their cache too? >> >> 2. Replace the stop_machine() by something else >> >> I have written such a patch: >> >> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c >> index 2404b59..e4d058f 100644 >> --- a/kernel/trace/ftrace.c >> +++ b/kernel/trace/ftrace.c >> @@ -1129,13 +1129,18 @@ static int __ftrace_modify_code(void *data) >> static void ftrace_run_update_code(int command) >> { >> int ret; >> + unsigned long flags; >> >> ret = ftrace_arch_code_modify_prepare(); >> FTRACE_WARN_ON(ret); >> if (ret) >> return; >> >> - stop_machine(__ftrace_modify_code, &command, NULL); >> + preempt_disable(); >> + local_irq_save(flags); >> + __ftrace_modify_code(&command); >> + local_irq_restore(flags); >> + preempt_enable(); >> >> ret = ftrace_arch_code_modify_post_process(); >> FTRACE_WARN_ON(ret); >> We may need to protect the __ftrace_modify_code() with raw spin lock. >> It works without any hang but I'm not sure whether it will guarantee >> the "undefined results" problem mentioned above. Here we may need to >> prevent the other cpus from executing the source code for we are >> modifying the source code but also need to allow them to get the ipi >> interrupt and flush their icaches. >> >> And I have took a look at the part of code modification in kgdb >> system, seems it doesn't use stop_machine(). >> >> What's your ideas? >> >> Thanks & Regards, >> Wu Zhangjin >> > > > -- > MSN+Gtalk: wuzhangjin@xxxxxxxxx > Blog: http://falcon.oss.lzu.edu.cn > Tel:+86-18710032278 > -- MSN+Gtalk: wuzhangjin@xxxxxxxxx Blog: http://falcon.oss.lzu.edu.cn Tel:+86-18710032278