On Fri, 2013-08-02 at 14:41 -0400, Alan Stern wrote: > Steve: > > Here's another irqsoff trace collected by James. It's not as extreme > as before, but I don't know how to interpret the 875-us jump in the > time value. Could this be a result of the timer being adjusted? > > Also, what's the explanation for these two lines: > > URL Clas-3286 2d... 881us : jiffies_to_timeval <-__acct_update_integrals > URL Clas-3286 2d.h. 882us : exit_idle <-smp_apic_timer_interrupt > > They appear to say that the CPU got a hardirq while interrupts were > disabled. How can that be? If you have NMIs going off, that is probably what's causing issues. > > Alan Stern > > > # tracer: irqsoff > # > # irqsoff latency trace v1.1.5 on 3.10.0-ver5 > # -------------------------------------------------------------------- > # latency: 906 us, #93/93, CPU#2 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) > # ----------------- > # | task: URL Classifier-3286 (uid:1000 nice:0 policy:0 rt_prio:0) > # ----------------- > # => started at: perf_event_update_userpage I'm guessing that it enabled the tracing with perf in the NMI. But NMI wont tell the tracer it ended :-( > # => ended at: retint_swapgs > # > # > # _------=> CPU# > # / _-----=> irqs-off > # | / _----=> need-resched > # || / _---=> hardirq/softirq > # ||| / _--=> preempt-depth > # |||| / delay > # cmd pid ||||| time | caller > # \ / ||||| \ | / > URL Clas-3286 2d.h. 1us : local_clock <-perf_event_update_userpage > URL Clas-3286 2d.h. 2us : watchdog_overflow_callback <-__perf_event_overflow > URL Clas-3286 2d.h. 3us : arch_trigger_all_cpu_backtrace_handler <-nmi_handle.isra.0 > URL Clas-3286 2d.h. 3us : perf_ibs_nmi_handler <-nmi_handle.isra.0 > URL Clas-3286 2d.h. 3us : perf_ibs_handle_irq <-perf_ibs_nmi_handler > URL Clas-3286 2d.h. 4us : perf_ibs_handle_irq <-perf_ibs_nmi_handler > URL Clas-3286 2d.h. 4us!: rcu_nmi_exit <-do_nmi The tracer started within the NMI, but out of sheer (bad) luck. That's because the NMI code has no logic to handle tracing interrupts on or off, due to the problems they cause. We may at most be able to just ignore all NMIs by adding a 'in_nmi()' check. I can send you a patch if you want. The other thing you can do is not run perf while this is going on. > URL Clas-3286 2d... 879us : smp_apic_timer_interrupt <-apic_timer_interrupt Yes, we can have interrupts happen here. We may have even went into userspace, as the return from NMI never told the tracer that we are enabling interrupts again. Thus, the enable interrupt is what was missed. -- Steve > URL Clas-3286 2d... 879us : irq_enter <-smp_apic_timer_interrupt > URL Clas-3286 2d... 879us : rcu_irq_enter <-irq_enter > URL Clas-3286 2d... 879us : rcu_eqs_exit_common.isra.45 <-rcu_irq_enter > URL Clas-3286 2d... 880us : vtime_account_irq_enter <-irq_enter > URL Clas-3286 2d... 880us : vtime_account_user <-vtime_account_irq_enter > URL Clas-3286 2d... 880us : get_vtime_delta <-vtime_account_user > URL Clas-3286 2d... 880us : _raw_spin_lock <-vtime_account_user -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html