Re: Observation on NOHZ_FULL

Andrea Righi <andrea.righi@xxxxxxxxxxxxx> · Tue, 30 Jan 2024 07:58:18 +0100

Hi Joel and Paul,

comments below.

On Mon, Jan 29, 2024 at 05:16:38PM -0500, Joel Fernandes wrote:
> Hi Paul,
> 
> On 1/29/2024 3:41 PM, Paul E. McKenney wrote:
> > On Mon, Jan 29, 2024 at 05:47:39PM +0000, Joel Fernandes wrote:
> >> Hi Guys,
> >> Something caught my eye in [1] which a colleague pointed me to
> >>  - CONFIG_HZ=1000 : 14866.05 bogo ops/s
> >>  - CONFIG_HZ=1000+nohz_full : 18505.52 bogo ops/s
> >>
> >> The test in question is:
> >> stress-ng --matrix $(getconf _NPROCESSORS_ONLN) --timeout 5m --metrics-brief
> >>
> >> which is a CPU intensive test.
> >>
> >> Any thoughts on what else could contribute to a 30% performance
> >> increase versus non-nohz_full? (Confession: no idea if the baseline
> >> is nohz_idle or no nohz at all.) If it is 30%, I may want to evaluate
> >> nohz_full on some of our limited-CPU devices :)
> > 
> > The usual questions.  ;-)
> > 
> > Is this repeatable?  Is it under the same conditions of temperature,
> > load, and so on?  Was it running on bare metal or on a guest OS?  If on a
> > guest OS, what was the load from other guest OSes on the same hypervisor
> > or on the hypervisor itself?

That was the result of a quick test, so I expect there is some fuzziness
in there.

It's an average of 10 runs on bare metal (my laptop, 8 cores, 11th
Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), *but* I wanted to run the
test with the default Ubuntu settings, which means having "power mode:
balanced" enabled. I don't know exactly what it's doing (I'll check how
it works in detail); I think it's using Intel P-states, IIRC.

Also, the system was not completely isolated (my email client was
running) but the system was mostly idle in general.

I was already planning to repeat the tests in a more "isolated"
environment and add details to the bug tracker.

> > 
> > The bug report had "CONFIG_HZ=250 : 17415.60 bogo ops/s", which makes
> > me wonder if someone enabled some heavy debug that is greatly
> > increasing the overhead of the scheduling-clock interrupt.
> > 
> > Now, if that was the case, I would expect the 250HZ number to have
> > three-quarters of the improvement of the nohz_full number compared
> > to the 1000HZ number:
> > 17415.60-14866.05=2549.55
> > 18505.52-14866.05=3639.47
> > 
> > 2549.55/3639.47=0.70
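
As a quick sanity check of the arithmetic above, here is a minimal
Python sketch. It assumes that going from 1000HZ to 250HZ removes three
quarters of the ticks and that nohz_full removes essentially all of
them; the constant and function names are made up for illustration. It
also shows that the nohz_full gain over 1000HZ works out to about 24.5%
with these numbers, somewhat less than 30%.

```python
# Figures quoted in this thread, in bogo ops/s.
HZ1000 = 14866.05
HZ250 = 17415.60
NOHZ_FULL = 18505.52


def gain(result, baseline=HZ1000):
    """Absolute improvement over the 1000HZ baseline."""
    return result - baseline


def expected_fraction(hz_low=250, hz_high=1000):
    """Share of ticks removed by lowering HZ, relative to removing all
    of them (assumption: nohz_full leaves essentially no ticks)."""
    return 1 - hz_low / hz_high


observed = gain(HZ250) / gain(NOHZ_FULL)
relative = gain(NOHZ_FULL) / HZ1000

print(f"observed fraction : {observed:.2f}")
print(f"expected fraction : {expected_fraction():.2f}")
print(f"nohz_full vs 1000HZ: {relative:.1%}")
```
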
> 
> I wonder if the difference here could possibly also be because of the CPU
> idle governor. It may behave differently at different clock rates, so
> perhaps it has different overhead.

Could be, but, again, the balanced power mode could play a major role
here.

> 
> I have added trying nohz full to my list as well to evaluate. FWIW, when we
> moved from 250HZ to 1000HZ, it actually improved power because the CPUidle
> governor could put the CPUs in deeper idle states more quickly!

Interesting, another benefit to add to my proposal. :)

> 
> > OK, 0.70 is not *that* far off of 0.75.  So what debugging does that
> > test have enabled?  Also, if you use tracing (or whatever) to measure
> > the typical duration of the scheduling-clock interrupt and related things
> > like softirq handlers, does it fit with these numbers?  Such a measurement
> > would look at how long it took to get back into userspace.
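
One way to see what a traced tick duration would have to be in order to
fit these numbers is the back-of-the-envelope sketch below. It assumes
that throughput scales linearly with the CPU time left over after tick
handling and that the nohz_full result is a tick-free reference; both
are simplifications, and the names are made up for illustration.

```python
# Figures quoted in this thread, in bogo ops/s.
HZ1000 = 14866.05
HZ250 = 17415.60
NOHZ_FULL = 18505.52  # treated as "no ticks" (assumption)


def implied_tick_cost_us(result, hz, tick_free=NOHZ_FULL):
    """Per-tick cost, in microseconds, that would explain `result`
    if ticks were the only source of lost throughput."""
    lost_fraction = 1 - result / tick_free  # CPU time lost per second
    return lost_fraction / hz * 1e6


cost_1000 = implied_tick_cost_us(HZ1000, 1000)
cost_250 = implied_tick_cost_us(HZ250, 250)

print(f"implied cost per tick at 1000HZ: {cost_1000:.0f} us")
print(f"implied cost per tick at  250HZ: {cost_250:.0f} us")
```

Under these assumptions each tick would have to cost roughly 197 us (from
the 1000HZ result) or 236 us (from the 250HZ result). A tick normally
costs a few microseconds, so if tracing shows much smaller durations,
something other than the tick itself (frequency scaling, for example)
would have to account for the gap.
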
> 
> Thanks for your detailed questions. I will add Andrea Righi to this list thread
> since he is the author of the bug report. Andrea, do you mind clarifying a few
> things mentioned above? Also nice to see you are using CONFIG_RCU_LAZY for Ubuntu :)

Thanks for including me. Sorry that I didn't provide many details about
my tests.

And yes, I really want to see CONFIG_RCU_LAZY enabled in the stock
Ubuntu kernel, so the battery of my laptop lasts longer when I go to
conferences. :)

-Andrea

> 
> thanks,
> 
>  - Joel
> 
> 
> > 
> >> Cheers,
> >>
> >>  - Joel
> >>
> >> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2051342
> >