On Sun, Aug 20, 2017 at 11:00:40PM +1000, Nicholas Piggin wrote: > On Sun, 20 Aug 2017 14:45:53 +1000 > Nicholas Piggin <npiggin@xxxxxxxxx> wrote: > > > On Wed, 16 Aug 2017 09:27:31 -0700 > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote: > > > On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote: > > > > > > Thomas, John, am I misinterpreting the timer trace event messages? > > > > So I did some digging, and what you find is that rcu_sched seems to do a > > simple scheudle_timeout(1) and just goes out to lunch for many seconds. > > The process_timeout timer never fires (when it finally does wake after > > one of these events, it usually removes the timer with del_timer_sync). > > > > So this patch seems to fix it. Testing, comments welcome. > > Okay this had a problem of trying to forward the timer from a timer > callback function. > > This was my other approach which also fixes the RCU warnings, but it's > a little more complex. I reworked it a bit so the mod_timer fast path > hopefully doesn't have much more overhead (actually by reading jiffies > only when needed, it probably saves a load). Giving this one a whirl! Thanx, Paul > Thanks, > Nick > > -- > [PATCH] timers: Fix excessive granularity of new timers after a nohz idle > > When a timer base is idle, it is forwarded when a new timer is added to > ensure that granularity does not become excessive. When not idle, the > timer tick is expected to increment the base. > > However there is a window after a timer is restarted from nohz, when it > is marked not-idle, and before the timer tick on this CPU, where a timer > may be added on an ancient base that does not get forwarded (beacause > the timer appears not-idle). > > This results in excessive granularity. So much so that a 1 jiffy timeout > has blown out to 10s of seconds and triggered the RCU stall warning > detector. > > Fix this by keeping track of whether the timer has been idle since it was > last run or forwarded, and allow forwarding in the case that is true (even > if it is not currently idle). > > Also add a comment noting a case where we could get an unexpectedly > large granularity for a timer. I debugged this problem by adding > warnings for such cases, but it seems we can't add them in general due > to this corner case. > > Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx> > --- > kernel/time/timer.c | 32 +++++++++++++++++++++++++++----- > 1 file changed, 27 insertions(+), 5 deletions(-) > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index 8f5d1bf18854..ee7b8b688b48 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -203,6 +203,7 @@ struct timer_base { > bool migration_enabled; > bool nohz_active; > bool is_idle; > + bool was_idle; > DECLARE_BITMAP(pending_map, WHEEL_SIZE); > struct hlist_head vectors[WHEEL_SIZE]; > } ____cacheline_aligned; > @@ -856,13 +857,19 @@ get_target_base(struct timer_base *base, unsigned tflags) > > static inline void forward_timer_base(struct timer_base *base) > { > - unsigned long jnow = READ_ONCE(jiffies); > + unsigned long jnow; > > /* > - * We only forward the base when it's idle and we have a delta between > - * base clock and jiffies. > + * We only forward the base when we are idle or have just come out > + * of idle (was_idle logic), and have a delta between base clock > + * and jiffies. In the common case, run_timers will take care of it. > */ > - if (!base->is_idle || (long) (jnow - base->clk) < 2) > + if (likely(!base->was_idle)) > + return; > + > + jnow = READ_ONCE(jiffies); > + base->was_idle = base->is_idle; > + if ((long)(jnow - base->clk) < 2) > return; > > /* > @@ -938,6 +945,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) > * same array bucket then just return: > */ > if (timer_pending(timer)) { > + /* > + * The downside of this optimization is that it can result in > + * larger granularity than you would get from adding a new > + * timer with this expiry. Would a timer flag for networking > + * be appropriate, then we can try to keep expiry of general > + * timers within ~1/8th of their interval? > + */ > if (timer->expires == expires) > return 1; > > @@ -1499,8 +1513,10 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) > /* > * If we expect to sleep more than a tick, mark the base idle: > */ > - if ((expires - basem) > TICK_NSEC) > + if ((expires - basem) > TICK_NSEC) { > + base->was_idle = true; > base->is_idle = true; > + } > } > raw_spin_unlock(&base->lock); > > @@ -1587,6 +1603,12 @@ static inline void __run_timers(struct timer_base *base) > struct hlist_head heads[LVL_DEPTH]; > int levels; > > + /* > + * was_idle must be cleared before running timers so that any timer > + * functions that call mod_timer will not try to forward the base. > + */ > + base->was_idle = false; > + > if (!time_after_eq(jiffies, base->clk)) > return; > > -- > 2.13.3 > -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html