On Sun, May 07, 2023 at 10:08:52AM -0400, Steven Rostedt wrote:
>
> [ Added Anna-Maria who is doing some timer work as well ]
>
> On Sun, 7 May 2023 11:07:00 +0200
> Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
>
> > Overview:
> >
> > nohz_full is a feature that allows to reduce the number of CPU tick
> > interrupts, thereby improving energy efficiency and reducing kernel
> > jitter.
>
> Hmm, I never thought of NOHZ_FULL used for energy efficiency, as the
> CPU is still running user space code, and there's really nothing
> inherently more power consuming with the tick.

The idea here was to reduce the tick on the timekeeping CPU as well, to
get more idle time (because at least 1 CPU is periodically ticking even
with nohz_full=all). But my patch was mostly a toy patch and the real
purpose was to get some advice/guidance on the tick/nohz topic.

>
> >
> > This works by stopping the tick interrupts on the CPUs that are either
> > idle or that have only one runnable task on them (there is no reason to
> > periodically interrupt the execution of a single running task if none
> > else is waiting to acquire the same CPU).
> >
> > It is not possible to configure all the available CPUs to work in the
> > nohz_full mode, at least one non-adaptive-tick CPU must be periodically
> > interrupted to properly handle timekeeping tasks in the system (such as
> > the gettimeofday() syscall returning accurate values).
>
> Do we really need nohz_full, instead, I think you want to look at what
> Anna-Maria is doing with moving the timer "manager" around to make sure
> that the tick stays on busy CPUs.
>
> Again, nohz_full is not for power consumption savings, but instead to
> reduce kernel interruption in user space.

Will definitely look at Anna-Maria's work.

>
> >
> > However, under certain conditions, we may want to relax this constraint,
> > accepting potential time inaccuracies in the system, in order to provide
> > additional benefits in terms of power consumption, performance and/or
> > reduce kernel jitter even more.
> >
> > For this reason introduce the new parameter nohz_full_aggressive.
> >
> > This option allows to enforce nohz_full across all the CPUs (even the
> > timekeeping CPU) at the cost of having potential timer inaccuracies in
> > the system.
> >
> > Test:
> >
> >  - Hardware: Dell XPS 13 7390 w/ 8 cores
> >
> >  - Kernel is using CONFIG_HZ=1000 (worst case scenario in terms of
> >    power consumption and kernel jitter) and nohz_full=all
> >
> >  - Measure interrupts and power consumption when the system is idle and
> >    with 2, 4 and 8 cpu hogs
> >
> > Result:
> >
> > The following numbers have been collected using turbostat and dstat
> > measuring the average over a 5min run for each test.
> >
> >   irqs/sec              idle       1 CPU hog  2 CPU hogs 4 CPU hogs 8 CPU hogs
> >                         ------------------------------------------------------
> >   nohz_full             1036.679   1047.522   1046.203   1048.590   1074.867
> >   nohz_full_aggressive  98.685     106.296    127.587    146.586    1062.277
> >
> >   Power (Watt)          idle       1 CPU hog  2 CPU hogs 4 CPU hogs 8 CPU hogs
> >                         ------------------------------------------------------
> >   nohz_full             0.502 W    3.436 W    3.755 W    6.187 W    6.019 W
> >   nohz_full_aggressive  0.301 W    2.372 W    2.372 W    6.005 W    6.016 W
> >
> >   % power reduction     40.04%     30.97%     36.83%     2.94%      0.05%
> >
>
> Nice.
>
> Now I doubt this is acceptable considering the side effects that the
> timer inaccuracy can cause. I think this breaks some basic assumptions
> in both the kernel and user space.

I've been running this nohz_full_aggressive patch for some days on my
laptop without any evident side effect, but I'm pretty sure it can break
something, considering that timing can potentially become totally
unreliable.
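
For reference, the "% power reduction" row in the table above is just
the relative difference between the two power rows. A minimal Python
sketch that reproduces it from the quoted numbers (the helper name
power_reduction_pct is only an illustrative choice, not something from
the patch):

```python
# Power in watts, copied from the table above.
# Columns: idle, 1 CPU hog, 2 CPU hogs, 4 CPU hogs, 8 CPU hogs.
labels = ["idle", "1 CPU hog", "2 CPU hogs", "4 CPU hogs", "8 CPU hogs"]
nohz_full = [0.502, 3.436, 3.755, 6.187, 6.019]
nohz_full_aggressive = [0.301, 2.372, 2.372, 6.005, 6.016]


def power_reduction_pct(baseline, reduced):
    """Relative saving of `reduced` versus `baseline`, in percent."""
    return (baseline - reduced) / baseline * 100.0


# One percentage per column, rounded to two decimals like the table.
reductions = [
    round(power_reduction_pct(base, aggr), 2)
    for base, aggr in zip(nohz_full, nohz_full_aggressive)
]

for label, pct in zip(labels, reductions):
    print(f"{label:<12} {pct:6.2f}%")
```

The saving is concentrated in the idle and lightly loaded columns and
all but disappears at 4 and 8 CPU hogs.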

I was also wondering if we could try to implement a kind of dynamic HZ
scaling (like scaling HZ up/down dynamically at runtime or even at boot
time), but it seems quite complicated (and scary, especially looking at
the code in jiffies / timers, i.e. all the constants in
./kernel/time/timeconst.bc). I remember there used to be a dynamic-hz
patch a long long time ago by Andrea Arcangeli, but I couldn't find any
recent work on this topic.

>
> Now, I think what is really happening here is that you are somewhat
> simulating the results that Anna-Maria has indirectly. That is, you
> just prevent an idle CPU from waking up to handle interrupts when not
> needed.
>
> Anna-Maria,
>
> Do you have some patches that Andrea could test with?
>
> Thanks,
>
> -- Steve

Thanks for looking at this (and I'm happy to help Anna-Maria with any
test).

-Andrea