On Mon, Sep 26, 2016 at 07:55:13PM -0400, Thomas Gleixner wrote: > On Mon, 26 Sep 2016, Rich Felker wrote: > > On Mon, Sep 26, 2016 at 11:27:14PM +0200, Daniel Lezcano wrote: > > Based on use of ftrace, I was able to see situations where a second > > timer hardirq happened immediately after one occurred, before the > > timer softirq could run. My theory is that this is causing some kind > > of feedback loop where new timer expirations keep getting scheduled > > with a very short interval such that the softirq never gets to run > > (until other interrupt activity disrups the feedback loop). I tried > > reverting 4e85876a9d2a977b4a07389da8c07edf76d10825 which seemed > > relevant and it didn't help, but on further review (right now) there > > seem to be a few related commits just before it that might be > > responsible for the regression. I'll see if I can dig up anything else > > useful. > > Interesting theory, but lets look at the data from the other thread: > > <idle>-0 [001] d.h. 2646.702790: irq_handler_entry: irq=72 name=jcore_pit > <idle>-0 [001] d.h. 2646.703398: softirq_raise: vec=1 [action=TIMER] > <idle>-0 [001] d.h. 2646.703607: softirq_raise: vec=9 [action=RCU] > <idle>-0 [001] d.h. 2646.703884: softirq_raise: vec=7 [action=SCHED] > <idle>-0 [001] d.h. 2646.704191: irq_handler_exit: irq=72 ret=handled > > So it takes 1.4ms to finish the timer irq handler > > <idle>-0 [001] d.H. 2646.704509: irq_handler_entry: irq=72 name=jcore_pit > <idle>-0 [001] d.H. 2646.704990: softirq_raise: vec=1 [action=TIMER] > <idle>-0 [001] d.H. 2646.705182: softirq_raise: vec=9 [action=RCU] > <idle>-0 [001] d.H. 2646.705465: softirq_raise: vec=7 [action=SCHED] > <idle>-0 [001] d.H. 2646.705761: irq_handler_exit: irq=72 ret=handled > > And 1.2ms in this case > > > <idle>-0 [001] d.H. 2646.706071: irq_handler_entry: irq=72 name=jcore_pit > <idle>-0 [001] d.H. 2646.706328: irq_handler_exit: irq=72 ret=handled > > So this one is actually short enough that the soft interrupt can excute and > obviously no new timer is scheduled close enough. > > <idle>-0 [001] ..s. 2646.706530: softirq_entry: vec=1 [action=TIMER] > > Now the obvious question arises: Why is that handler taking that long? And I'm not sure. Even at our clock rate of 50 MHz, 1.2ms is 60000 cycles. It's possible much of that overhead is coming from ftrace. I can play around again with just logging the minimum intervals that are programmed, and not using ftrace. > sure the next event being programmed shortely after the other has to do > with various factors (HZ setting, High-resolution timers enabled?). W/o > having the corresponding hrtimer tracepoints available (start and expire) > it's hard to tell what's going on. I do have hrtimers enabled. If it would be helpful I could try again with traces for them enabled too. > But I can tell for sure that there is no feedback loop and its not > disrupted by some other interrupt. FYI no rcu_sched messages or noticable stalls were observed with tracing active. The cpu load from tracing is so high that I would not expect to see the same behavior. I probably failed to make that clear before; sorry. > The above trace is merily seing the PIT > irq and nothing else. It is showing another one happening before the softirq side of the first could run, right? Of course this is probably due to overall ftrace slowness. CONFIG_DYNAMIC_FTRACE does not work on big-endian sh (the objdump needed at build time is hard-coded to assume sh is little-endian and I haven't yet worked out what all needs to be fixed) so I suspect that may be the cause of ftrace being so slow when it's turned on. Oddly, the min_delta cutoff needed to prevent the stalls was on the same order of magnitude as these numbers (2.5ms almost entirely made them go away; 5ms completely). > Can you please provide a trace with hrtimer tracepoints enabled and tell us > the settings of CONFIG_HZ and whether high resolution timers are enabled or > not. Yes. I think these are the relevant config items: CONFIG_HZ=100 CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y Let me know if I missed any. I'll try getting a trace and follow up again. > That workaround of setting min_delta to some absurd big number looks fishy > at best. I agree. I'd like to find the right fix, but usability badly suffers without either a proper fix or a workaround. Rich -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html