On Wed, Mar 4, 2020 at 11:57 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Mar 04, 2020 at 01:39:41PM -0800, Xi Wang wrote:
> > The main purpose of the kernel watchdog is to test whether the
> > scheduler can still schedule tasks on a cpu. In order to reduce the
> > latency from periodically invoking the watchdog reset in thread
> > context, we can simply touch the watchdog from pick_next_task in the
> > scheduler. Compared to actually resetting the watchdog from the cpu
> > stop / migration threads, we lose coverage on two steps: a migration
> > thread actually getting picked, and the context switch to the
> > migration thread actually happening. Both steps are heavily protected
> > by kernel locks and unlikely to silently fail. Thus the change would
> > provide the same level of protection with less overhead.
> >
> > The new way vs the old way of touching the watchdog is configurable
> > via:
> >
> > /proc/sys/kernel/watchdog_touch_in_thread_interval
> >
> > The value means:
> > 0: Always touch the watchdog from pick_next_task
> > 1: Always touch the watchdog from the migration thread
> > N (N>1): Touch the watchdog from the migration thread once in every N
> >          invocations, and touch the watchdog from pick_next_task for
> >          the other invocations.
>
> This is configurable madness. What are we really trying to do here?

See the reply to Thomas; no config is actually required here.

Focusing on the intended outcome: the goal is to improve jitter, since
we are constantly, periodically preempting other scheduling classes to
run the watchdog thread. Even on a single CPU this is measurable as
jitter in the microsecond range. What increases the motivation is that
this disruption has recently been magnified by CPU "gifts", which
require evicting the whole core when one of the siblings schedules one
of these watchdog threads.

The main outcome being asserted here is that we could actually exercise
pick_next_task if required -- there are other potential failures this
will catch, but generally speaking they are much more braindead (e.g. a
bug in pick_next_task itself).
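
For concreteness, a minimal sketch of the no-config variant (this is not
the actual patch: touch_softlockup_watchdog() is the existing kernel API
from linux/nmi.h, but treating the pick logic as a __pick_next_task()
helper and the exact hook placement are illustrative assumptions):

#include <linux/nmi.h>          /* touch_softlockup_watchdog() */
#include "sched.h"              /* struct rq, struct rq_flags */

static struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
        struct task_struct *next;

        /* Existing class-iteration pick logic, unchanged. */
        next = __pick_next_task(rq, prev, rf);

        /*
         * Getting here proves the scheduler can still pick tasks on
         * this CPU, which is exactly what the softlockup watchdog is
         * testing, so reset its per-cpu timestamp directly instead of
         * waking the migration/stop thread to do it.
         */
        touch_softlockup_watchdog();

        return next;
}

The touch itself is just a per-cpu timestamp store, so doing it on every
pick is cheap; the point of the sketch is only the coverage argument
above: a successful pass through pick_next_task is itself the liveness
signal.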