On Wed, 16 Aug 2017 05:56:17 -0700 "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote: > On Wed, Aug 16, 2017 at 10:43:52PM +1000, Michael Ellerman wrote: > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes: > > ... > > > > > > commit 33103e7b1f89ef432dfe3337d2a6932cdf5c1312 > > > Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> > > > Date: Mon Aug 14 08:54:39 2017 -0700 > > > > > > EXP: Trace tick return from tick_nohz_stop_sched_tick > > > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> > > > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > > index c7a899c5ce64..7358a5073dfb 100644 > > > --- a/kernel/time/tick-sched.c > > > +++ b/kernel/time/tick-sched.c > > > @@ -817,6 +817,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, > > > * (not only the tick). > > > */ > > > ts->sleep_length = ktime_sub(dev->next_event, now); > > > + trace_printk("tick_nohz_stop_sched_tick: %lld\n", (tick - ktime_get()) / 1000); > > > return tick; > > > } > > > > Should I be seeing negative values? A small sample: > > Maybe due to hypervisor preemption delays, but I confess that I am > surprised to see them this large. 1,602,250,019 microseconds is something > like a half hour, which could result in stall warnings all by itself. > > > <idle>-0 [015] d... 1602.039695: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250019 > > <idle>-0 [009] d... 1602.039701: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250025 > > <idle>-0 [007] d... 1602.039702: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250025 > > <idle>-0 [048] d... 1602.039703: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: 9973 > > <idle>-0 [006] d... 1602.039704: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250027 > > <idle>-0 [001] d... 1602.039730: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250053 > > <idle>-0 [008] d... 1602.039732: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602250055 > > <idle>-0 [006] d... 1602.049695: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602260018 > > <idle>-0 [009] d... 1602.049695: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602260018 > > <idle>-0 [001] d... 1602.049695: __tick_nohz_idle_enter: tick_nohz_stop_sched_tick: -1602260018 > > > > > > I have a full trace, I'll send it to you off-list. > > I will take a look! I found this, I can't see that it would cause our symptoms, but it's worth someone who knows the code taking a look at it. -- cpuidle: fix broadcast control when broadcast can not be entered When failing to enter broadcast timer mode for an idle state that requires it, a new state is selected that does not require broadcast, but the broadcast variable remains set. This causes tick_broadcast_exit to be called despite not having entered broadcast mode. This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some cases, but otherwise does not appear to cause problems. Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx> --- drivers/cpuidle/cpuidle.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 60bb64f4329d..4453e27f855e 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -208,6 +208,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, return -EBUSY; } target_state = &drv->states[index]; + broadcast = false; } /* Take note of the planned idle state. */ -- 2.13.3 -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html