Re: Observation on NOHZ_FULL

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Wed, 7 Feb 2024 10:48:10 -0500

On 2/6/2024 12:51 PM, Andrea Righi wrote:
>  - stress-ng --matrix seems quite unpredictable to be used a benchmarks
>    in this scenario (the bogo-ops/s are very susceptible to any kind of
>    interference, so even if in the long runs NO_HZ_FULL still seems to
>    provide some benefits looking at the average, we also need to
>    consider that there might be a significant error in the measurements,
>    standard deviation was pretty high)
> 

Ack on the bogo-ops disclaimers, which the stress-ng docs also mention. Agreed
that a better metric for perf would be helpful.

I am assuming you also have RCU_NOCB enabled for this test?

>  - fio doing short writes (in page cache) seems to perform like 2x
>    better in terms of iops with nohz_full, respect to the other cases
>    and it performs 2x slower with large IO writes (not sure why... need
>    to investigate more)

This is interesting. It could be worth counting how many kernel entries/exits
occur for large IO vs small IO. I'd imagine that for large IO we have fewer
syscalls and hence lower entry/exit overhead. But if there are more interrupts,
for whatever reason, with large IO, then that also implies more kernel
entries/exits. As Frederic was saying, NOHZ_FULL has higher overhead on kernel
entry/exit.
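
To make the entry/exit reasoning concrete, here is a rough Python sketch of the
accounting. All names and numbers in it are hypothetical: the interrupt rate in
particular is a placeholder that would have to be measured (for example from
/proc/interrupts deltas), not something taken from this thread.

```python
def kernel_entries(bytes_written, io_size, irqs_per_mib=0.0):
    """Estimate kernel entries for writing bytes_written in io_size chunks.

    irqs_per_mib is a hypothetical, to-be-measured interrupt rate.
    """
    # One write() syscall per chunk (ceiling division).
    syscalls = -(-bytes_written // io_size)
    # Interrupts are modeled as proportional to the amount of data written.
    irqs = irqs_per_mib * bytes_written / (1 << 20)
    return syscalls + irqs


def entry_exit_overhead_us(entries, cost_per_entry_us):
    """Total entry/exit overhead for a given per-entry cost (assumed constant)."""
    return entries * cost_per_entry_us


total = 1 << 30  # 1 GiB of writes, illustrative only
small = kernel_entries(total, 4096)                       # syscalls only
large = kernel_entries(total, 1 << 20)                    # syscalls only
large_irq = kernel_entries(total, 1 << 20, irqs_per_mib=2.0)
```

With syscalls alone, large IO enters the kernel far less often than small IO, so
a slowdown there would point at something else, such as extra interrupts.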

> 
>  - with lazy RCU enabled hrtimer_interrupt() takes like 2x more to
>    return, respect to the other cases (is this expected?)

Which hrtimer_interrupt() instance is this? There must be several in the trace
due to unrelated timers. Are you describing the worst case, or is it always 2x
slower? We do queue a timer for lazy RCU to flush the RCU work, but it is set to
10 seconds and should be canceled most of the time (it's just a guard rail). It
is possible that lock contention on ->nocb_gp_lock is making the timer handler
slow. We have several trace_rcu_nocb* tracepoints, including for the timer.
Perhaps you could enable those so we can dig deeper?
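
One possible way to switch those tracepoints on, sketched in Python. It assumes
tracefs is mounted at /sys/kernel/tracing and that the events show up under
events/rcu/ with an rcu_nocb prefix; which events actually exist depends on the
kernel version and config, so treat the glob pattern as an assumption. It needs
root on a real system.

```python
import glob
import os


def enable_rcu_nocb_events(tracefs="/sys/kernel/tracing"):
    """Write '1' to the enable file of every rcu_nocb* event under tracefs.

    Returns the names of the events that were enabled. The tracefs path and
    the rcu_nocb* pattern are assumptions about the running kernel.
    """
    pattern = os.path.join(tracefs, "events", "rcu", "rcu_nocb*", "enable")
    enabled = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write("1")
        # The event name is the directory that holds the enable file.
        enabled.append(os.path.basename(os.path.dirname(path)))
    return enabled
```

An empty return value means no matching events were found, which is itself
useful to know before collecting a trace.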

Further, it would be interesting to see, via say function tracing, whether the
slow hrtimer_interrupt() instances are only the ones that also result in a call
to do_nocb_deferred_wakeup_timer(). That would confirm that it is the lazy timer
that is slow for you.

The actual number of callbacks should not, by itself, cause hrtimer_interrupt()
to take too long to run, AFAICS. But RCU's lazy feature does increase the number
of timer interrupts.

Further still, whether this is a problem depends on how long hrtimer_interrupt()
actually takes with lazy RCU, IMO. Some numbers with units would be nice.
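
As an illustration of the kind of numbers that would help, here is a small
Python sketch that summarizes per-call durations in microseconds. It assumes
the durations have already been extracted from a trace by some other means; the
function names are hypothetical and the sample values below are invented purely
to show the output shape.

```python
import statistics


def summarize_us(samples_us):
    """Return count, mean, sample standard deviation and max, in microseconds."""
    return {
        "n": len(samples_us),
        "mean_us": statistics.mean(samples_us),
        # stdev needs at least two samples; report 0.0 otherwise.
        "stdev_us": statistics.stdev(samples_us) if len(samples_us) > 1 else 0.0,
        "max_us": max(samples_us),
    }


def slowdown(baseline_us, lazy_us):
    """Ratio of mean durations: lazy RCU run over baseline run."""
    return statistics.mean(lazy_us) / statistics.mean(baseline_us)


# Invented sample values, for illustration only.
baseline = [2.0, 4.0, 6.0]
lazy = [4.0, 8.0, 12.0]
report = (summarize_us(baseline), summarize_us(lazy), slowdown(baseline, lazy))
```

Reporting the mean together with the standard deviation and the maximum makes
it possible to tell a consistent 2x slowdown from a few worst-case outliers.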

thanks,

 - Joel