Re: Observation on NOHZ_FULL

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Mon, 29 Jan 2024 17:16:38 -0500

Hi Paul,

On 1/29/2024 3:41 PM, Paul E. McKenney wrote:
> On Mon, Jan 29, 2024 at 05:47:39PM +0000, Joel Fernandes wrote:
>> Hi Guys,
>> Something caught my eye in [1], which a colleague pointed me to:
>>  - CONFIG_HZ=1000 : 14866.05 bogo ops/s
>>  - CONFIG_HZ=1000+nohz_full : 18505.52 bogo ops/s
>>
>> The test in question is:
>> stress-ng --matrix $(getconf _NPROCESSORS_ONLN) --timeout 5m --metrics-brief
>>
>> which is a CPU intensive test.
>>
>> Any thoughts on what else could contribute to a 30% performance increase
>> versus non-nohz_full ? (Confession: No idea if the baseline is
>> nohz_idle or no nohz at all). If it is 30%, I may want to evaluate
>> nohz_full on some of our limited-CPU devices :)
> 
> The usual questions.  ;-)
> 
> Is this repeatable?  Is it under the same conditions of temperature,
> load, and so on?  Was it running on bare metal or on a guest OS?  If on a
> guest OS, what was the load from other guest OSes on the same hypervisor
> or on the hypervisor itself?
> 
> The bug report had "CONFIG_HZ=250 : 17415.60 bogo ops/s", which makes
> me wonder if someone enabled some heavy debug that is greatly
> increasing the overhead of the scheduling-clock interrupt.
> 
> Now, if that was the case, I would expect the 250HZ number to have
> three-quarters of the improvement of the nohz_full number compared
> to the 1000HZ number:
> 17415.60-14866.05=2549.55
> 18505.52-14866.05=3639.47
> 
> 2549.55/3639.47=0.70

I wonder if the difference here could also be due to the CPU idle governor.
It may behave differently at different clock rates and so have different
overhead.

I have also added nohz_full to my list of things to evaluate. FWIW, when we
moved from 250HZ to 1000HZ, it actually improved power because the CPUidle
governor could put the CPUs in deeper idle states more quickly!

> OK, 0.70 is not *that* far off of 0.75.  So what debugging does that
> test have enabled?  Also, if you use tracing (or whatever) to measure
> the typical duration of the scheduling-clock interrupt and related things
> like softirq handlers, does it fit with these numbers?  Such a measurement
> would look at how long it took to get back into userspace.
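
To spell out the arithmetic above, here is a rough Python sketch. It assumes
that tick overhead scales linearly with HZ and that nohz_full removes
essentially all of it for this workload, which is a simplification. The
function names are mine; only the three throughput numbers come from the bug
report.

```python
# Rough check of the "three-quarters" reasoning.
# Assumption: tick overhead is proportional to HZ, and nohz_full
# removes (nearly) all ticks while the benchmark is running.

def expected_fraction(base_hz: int, new_hz: int) -> float:
    """Fraction of the tick overhead removed by lowering HZ."""
    return 1.0 - new_hz / base_hz

def observed_fraction(base: float, reduced: float, tickless: float) -> float:
    """Share of the tickless gain that the reduced-HZ kernel achieves."""
    return (reduced - base) / (tickless - base)

# Throughput in bogo ops/s, as quoted from the bug report.
HZ_1000 = 14866.05
HZ_250 = 17415.60
HZ_1000_NOHZ_FULL = 18505.52

expected = expected_fraction(1000, 250)
observed = observed_fraction(HZ_1000, HZ_250, HZ_1000_NOHZ_FULL)

print(f"expected fraction of gain at 250HZ: {expected:.2f}")
print(f"observed fraction of gain at 250HZ: {observed:.2f}")
```

If the linear assumption held exactly, the two values would match; how far
apart they are gives a rough idea of how much else is going on.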

Thanks for your detailed questions. I will add Andrea Righi to this thread
since he is the author of the bug report. Andrea, do you mind clarifying a few
things mentioned above? Also nice to see you are using CONFIG_RCU_LAZY for Ubuntu :)

thanks,

 - Joel

> 
>> Cheers,
>>
>>  - Joel
>>
>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2051342
>