On Mon, Oct 5, 2020 at 4:19 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Mar 06, 2020 at 02:34:20PM -0800, Xi Wang wrote:
> > On Fri, Mar 6, 2020 at 12:40 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 05, 2020 at 02:11:49PM -0800, Paul Turner wrote:
> > > > The goal is to improve jitter since we're constantly periodically
> > > > preempting other classes to run the watchdog. Even on a single CPU
> > > > this is measurable as jitter in the us range. But, what increases the
> > > > motivation is this disruption has been recently magnified by CPU
> > > > "gifts" which require evicting the whole core when one of the siblings
> > > > schedules one of these watchdog threads.
> > > >
> > > > The majority outcome being asserted here is that we could actually
> > > > exercise pick_next_task if required -- there are other potential
> > > > things this will catch, but they are much more braindead generally
> > > > speaking (e.g. a bug in pick_next_task itself).
> > >
> > > I still utterly hate what the patch does though; there is no way I'll
> > > have watchdog code hook in the scheduler like this. That's just asking
> > > for trouble.
> > >
> > > Why isn't it sufficient to sample the existing context switch counters
> > > from the watchdog? And why can't we fix that?
> >
> > We could go to pick next and repick the same task. There won't be a
> > context switch but we still want to hold the watchdog. I assume such a
> > counter also needs to be per cpu and inside the rq lock. There doesn't
> > seem to be an existing one that fits this purpose.
>
> Sorry, your reply got lost, but I just ran into something that reminded
> me of this.
>
> There's sched_count. That's currently schedstat, but if you can find a
> spot in a hot cacheline (from schedule()'s perspective) then it
> should be cheap to increment unconditionally.
>
> If only someone were to write a useful cacheline perf tool (and no that
> c2c trainwreck doesn't count).
>

Thanks, I'll try the alternative implementation.

-Xi
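
P.S. To make sure I read the suggestion right, here is a toy user-space
sketch of the sampling scheme (all names below are made up for the
example; in the kernel the counter would be sched_count, bumped
unconditionally on every pass through schedule() and sampled per CPU by
the watchdog):

/*
 * Toy sketch only, not the patch: one thread stands in for the
 * scheduler and bumps a counter on every "schedule()" pass; the
 * watchdog thread samples the counter and complains if it did not
 * advance between samples, i.e. schedule() never ran.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_ulong sched_count;	/* per-CPU in the real thing */
static unsigned long last_seen;		/* watchdog's previous sample */

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Stand-in for schedule(): increment unconditionally, even when the
 * same task would get repicked. */
static void *scheduler_loop(void *arg)
{
	(void)arg;
	for (;;) {
		atomic_fetch_add(&sched_count, 1);
		sleep_ms(1);		/* pretend work between reschedules */
	}
	return NULL;
}

/* Watchdog: only samples the counter, no hook into pick_next_task(). */
static void *watchdog_loop(void *arg)
{
	(void)arg;
	for (;;) {
		sleep_ms(1000);
		unsigned long now = atomic_load(&sched_count);
		if (now == last_seen)
			fprintf(stderr, "watchdog: no schedule() seen\n");
		last_seen = now;
	}
	return NULL;
}

int main(void)
{
	pthread_t s, w;

	pthread_create(&s, NULL, scheduler_loop, NULL);
	pthread_create(&w, NULL, watchdog_loop, NULL);
	pthread_join(s, NULL);		/* toy: never returns */
	return 0;
}

The point is that the watchdog side only reads the counter, so nothing
hooks into the scheduler; the only scheduler-side cost is the
unconditional increment once sched_count no longer sits behind
schedstats.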