On Wed, Sep 15, 2021 at 11:41:42AM -0700, Linus Torvalds wrote:
> On Wed, Sep 15, 2021 at 11:31 AM Frederic Weisbecker
> <frederic@xxxxxxxxxx> wrote:
> >
> > Right, this should fix the issue: https://lore.kernel.org/lkml/20210913145332.232023-1-frederic@xxxxxxxxxx/
>
> Hmm.
>
> Can you explain why the fix isn't just to revert that original commit?
>
> It looks like the only real difference is that now it does *extra
> work* with all that tick_nohz_dep_set_signal().
>
> Isn't it easier to just leave any old timer ticking, and not do the
> extra work until it expires and you notice "ok, it's not important"?
>
> IOW, that original commit explicitly broke the only case it changed -
> the timer being disabled. So why isn't it just reverted? What is it
> that keeps us wanting to do the extra work for the disabled timer
> case?
>
> As long as it's fixed, I'm all ok with this, but I'm looking at the
> commit message for that broken commit, and I'm looking at the commit
> message for the fix, and I'm not seeing an actual _explanation_ for
> this churn.

The commit indeed failed to correctly explain the actual issue. When a
process-wide posix cpu timer (e.g. itimer) is elapsing, all the threads
inside that process contend on their cputime updates
(account_group_user_time() and account_group_system_time()). The overhead
just consists of concurrent atomic64_add() calls on every tick, but still...
And this can persist for a very long while, until the previous value of the
timer expiry is reached.

The other symptom, more of a corner case for most, is that the CPUs running
any thread of that process won't be able to enter nohz_full mode, again
until the old timer expiry is reached.
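
To make the contention concrete, here is a rough userspace model of the per-tick accounting described above. It is only a sketch and not the kernel code: struct group_cputimer, model_init(), model_account_group_user_time() and model_run_ticks() are hypothetical names, C11 atomics stand in for atomic64_add(), and it assumes a single "timers active" flag gates the shared update. The threads are simulated sequentially so the result is deterministic; on real hardware every one of these adds would hit the same cache line from a different CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process (thread group) accounting state. */
struct group_cputimer {
    atomic_bool timers_active;   /* true while a process-wide timer counts as running */
    atomic_uint_fast64_t utime;  /* one counter shared by every thread of the process */
};

static void model_init(struct group_cputimer *g, bool active)
{
    assert(g != NULL);
    atomic_init(&g->timers_active, active);
    atomic_init(&g->utime, 0);
}

/*
 * Model of what one thread does on one tick. Returns true when the shared
 * counter was written, i.e. when this tick paid for a contended atomic add.
 */
static bool model_account_group_user_time(struct group_cputimer *g, uint64_t delta)
{
    if (!atomic_load_explicit(&g->timers_active, memory_order_relaxed))
        return false;            /* no timer armed: nothing shared is touched */

    atomic_fetch_add_explicit(&g->utime, delta, memory_order_relaxed);
    return true;
}

/*
 * Simulate nr_threads threads each taking nr_ticks ticks of tick_ns
 * nanoseconds. Returns how many shared atomic updates were performed.
 */
static uint64_t model_run_ticks(struct group_cputimer *g, unsigned nr_threads,
                                unsigned nr_ticks, uint64_t tick_ns)
{
    uint64_t shared_updates = 0;

    for (unsigned tick = 0; tick < nr_ticks; tick++)
        for (unsigned t = 0; t < nr_threads; t++)
            if (model_account_group_user_time(g, tick_ns))
                shared_updates++;

    return shared_updates;
}
```

In this model, as long as timers_active stays set after the timer has been disabled, each thread keeps paying one shared atomic add per tick until the old expiry passes; clearing the flag at disarm time drops the per-tick cost to a single relaxed load.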