On Wed, 4 Mar 2020 13:39:41 -0800 Xi Wang <xii@xxxxxxxxxx> wrote:

> The main purpose of the kernel watchdog is to test whether the
> scheduler can still schedule tasks on a cpu. In order to reduce the
> latency from periodically invoking the watchdog reset in thread
> context, we can simply touch the watchdog from pick_next_task in the
> scheduler. Compared to actually resetting the watchdog from cpu stop /
> migration threads, we lose coverage on two steps: a migration thread
> actually getting picked, and the context switch to the migration
> thread actually happening. Both steps are heavily protected by kernel
> locks and unlikely to silently fail. Thus the change would provide the
> same level of protection with less overhead.

Have any measurements showing the drop in overhead?

> The new way vs the old way to touch the watchdogs is configurable
> from:
>
>   /proc/sys/kernel/watchdog_touch_in_thread_interval
>
> The value means:
>
>   0: Always touch watchdog from pick_next_task
>   1: Always touch watchdog from migration thread
>   N (N > 1): Touch watchdog from migration thread once in every N
>      invocations, and touch watchdog from pick_next_task for the
>      other invocations.
> Suggested-by: Paul Turner <pjt@xxxxxxxxxx>
> Signed-off-by: Xi Wang <xii@xxxxxxxxxx>
> ---
>  kernel/sched/core.c | 36 ++++++++++++++++++++++++++++++++++--
>  kernel/sysctl.c     | 11 ++++++++++-
>  kernel/watchdog.c   | 39 ++++++++++++++++++++++++++++++++++-----
>  3 files changed, 78 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1a9983da4408..9d8e00760d1c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3898,6 +3898,27 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  	schedstat_inc(this_rq()->sched_count);
>  }
>
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> +
> +DEFINE_PER_CPU(bool, sched_should_touch_watchdog);
> +
> +void touch_watchdog_from_sched(void);
> +
> +/* Helper called by watchdog code */
> +void resched_for_watchdog(void)
> +{
> +	unsigned long flags;
> +	struct rq *rq = this_rq();
> +
> +	this_cpu_write(sched_should_touch_watchdog, true);

Perhaps we should have a preempt_disable(), otherwise it is possible to
get preempted here.

-- Steve

> +	raw_spin_lock_irqsave(&rq->lock, flags);
> +	/* Trigger resched for code in pick_next_task to touch watchdog */
> +	resched_curr(rq);
> +	raw_spin_unlock_irqrestore(&rq->lock, flags);
> +}
> +
> +#endif /* CONFIG_SOFTLOCKUP_DETECTOR */
> +
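[Editor's note: Steve's concern is that between the `this_cpu_write()` and the lock acquisition the task can be preempted and migrated, so the flag would be set on one CPU while `resched_curr()` hits another (`this_rq()` was also sampled before the write). The `raw_spin_lock_irqsave()` disables preemption only once taken, leaving the window before it open. A sketch of the fix he suggests, kernel-context code, not buildable standalone and not a posted v2 of the patch:]

```c
/* Sketch only: resched_for_watchdog() with preemption disabled so that
 * the per-cpu flag write and this_rq() refer to the same CPU. */
void resched_for_watchdog(void)
{
	unsigned long flags;
	struct rq *rq;

	preempt_disable();
	rq = this_rq();
	this_cpu_write(sched_should_touch_watchdog, true);

	raw_spin_lock_irqsave(&rq->lock, flags);
	/* Trigger resched for code in pick_next_task to touch watchdog */
	resched_curr(rq);
	raw_spin_unlock_irqrestore(&rq->lock, flags);
	preempt_enable();
}
```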