Re: Observation on NOHZ_FULL

Andrea Righi <andrea.righi@xxxxxxxxxxxxx> · Wed, 7 Feb 2024 18:05:55 +0100

On Wed, Feb 07, 2024 at 11:52:35AM -0500, Joel Fernandes wrote:
> On Wed, Feb 7, 2024 at 11:31 AM Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> 
> > > The actual number of callbacks should not be causing specifically the
> > > hrtimer_interrupt() to take too long to run, AFAICS. But RCU's lazy feature does
> > > increase the number of timer interrupts.
> > >
> > > Further still, whether to call it a problem depends, IMO, on how long
> > > hrtimer_interrupt() takes with lazy RCU. Some numbers with units would be nice.
> >
> > This is what I see (this is a single run, but the other runs are
> > similar; units are nanoseconds): with lazy RCU enabled,
> > hrtimer_interrupt() takes around 4K-16K ns, while with lazy RCU off
> > it takes 2K-4K ns most of the time:
> >
> >  - lazy rcu off:
> >
> > [1K, 2K)         88307 |@@@@@@@@@@@@                                        |
> > [2K, 4K)        380695 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [4K, 8K)           194 |                                                    |
> >
> >  - lazy rcu on:
> >
> > [2K, 4K)          3094 |                                                    |
> > [4K, 8K)        265763 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [8K, 16K)       182341 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 |
> > [16K, 32K)        3422 |                                                    |
> >
> > Again, I'm not sure whether this is really a problem, or even a
> > relevant metric for overall performance; I was just curious to
> > understand why it is different.
> 
> This is an interesting find. The number of timer interrupt executions
> looks roughly the same in both histograms, so it might not be missed
> cancellations or such; the cause is not clear to me. But it is worth
> debugging, and we'll try to reproduce your results.
> 
> Some more theories from our internal RCU discussion:
> - Could it be another user of RCU (call_rcu) from an unrelated hrtimer
> interrupt callback that is causing a "flush" of lazy callbacks?
> - What does the distribution look like for
> do_nocb_deferred_wakeup_timer? It will probably have to be made
> non-static to be picked up by bpftrace (if you could try that quickly,
> I'd appreciate it!).

Sure, I'll repeat the test tracing do_nocb_deferred_wakeup_timer.
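When comparing that run with the hrtimer_interrupt() histograms above, keep in mind that each row is a power-of-two range (bpftrace's hist() uses 1K = 1024), so a shift of the bulk from [2K, 4K) into [4K, 8K) and [8K, 16K) means typical durations roughly doubled or more. A minimal offline sketch of that bucketing, assuming raw per-call durations in nanoseconds have been dumped somewhere; the helper names are hypothetical, empty intermediate buckets are omitted (bpftrace prints them), and the sample values are made up only to exercise the code:

```python
from collections import Counter


def log2_bucket(ns: int) -> int:
    """Lower bound of the power-of-two bucket [2^k, 2^(k+1)) that holds ns."""
    if ns < 1:
        raise ValueError("durations are expected to be positive nanosecond values")
    return 1 << (ns.bit_length() - 1)


def bucket_label(n: int) -> str:
    """Render a power of two the way bpftrace labels buckets (1K = 1024)."""
    for suffix, size in (("M", 1 << 20), ("K", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{suffix}"
    return str(n)


def histogram(durations, width=52):
    """Text lines of a log2 histogram; the largest bucket spans `width` chars.

    Unlike bpftrace, empty buckets between populated ones are not printed.
    """
    counts = Counter(log2_bucket(d) for d in durations)
    peak = max(counts.values())
    lines = []
    for lo in sorted(counts):
        bar = "@" * (counts[lo] * width // peak)
        label = f"[{bucket_label(lo)}, {bucket_label(lo * 2)})"
        lines.append(f"{label:<12}{counts[lo]:>10} |{bar:<{width}}|")
    return lines


if __name__ == "__main__":
    # Hypothetical durations in ns, only to exercise the code (not measured data).
    for line in histogram([1500, 2500, 3000, 5000, 9000]):
        print(line)
```

The buckets are half-open, so exactly 4096 ns lands in [4K, 8K), matching how the labels above read.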

> 
> Slightly related: one of the things we are also wondering is how
> much of the overhead in your nohz-full and lazy-RCU test (on top of the
> baseline, that is, just CONFIG_HZ=1000 without nohz-full or nocbs) is
> due simply to using NOCB. Uladzislau mentioned he might run a test
> comparing along those lines as well.

Just to clarify, the "lazy rcu on" results are just with rcu_nocbs=all and
lazy RCU enabled (and CONFIG_HZ=1000), so without nohz_full.

If I enable only nohz_full=all (without rcu_nocbs), I see something like
this:

[1K, 2K)           294 |                                                    |
[2K, 4K)         59568 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)           368 |                                                    |

That is roughly the baseline number of invocations divided by 8, because I
have 8 cores and only the timekeeping CPU is ticking, so that seems to make
sense.
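Both the "baseline / 8" estimate and the observation that the lazy on/off runs have roughly the same number of calls can be checked by summing the buckets. A small sketch, assuming all runs lasted the same amount of time and treating the "lazy rcu off" run as the baseline (the thread does not state either explicitly):

```python
# Bucket counts copied from the hrtimer_interrupt() histograms in this thread.
LAZY_OFF = {"[1K, 2K)": 88307, "[2K, 4K)": 380695, "[4K, 8K)": 194}
LAZY_ON = {"[2K, 4K)": 3094, "[4K, 8K)": 265763, "[8K, 16K)": 182341, "[16K, 32K)": 3422}
NOHZ_FULL_ONLY = {"[1K, 2K)": 294, "[2K, 4K)": 59568, "[4K, 8K)": 368}

NR_CPUS = 8  # cores on the test machine, per the text

total_off = sum(LAZY_OFF.values())
total_on = sum(LAZY_ON.values())
total_nohz = sum(NOHZ_FULL_ONLY.values())

# Assumption: equal run lengths, and "lazy rcu off" stands in for the baseline
# tick rate. With nohz_full=all only the timekeeping CPU keeps ticking, so about
# 1/NR_CPUS of the baseline invocations would be expected.
expected_nohz = total_off / NR_CPUS

print(f"lazy off total: {total_off}")
print(f"lazy on total:  {total_on} (ratio to lazy off: {total_on / total_off:.2f})")
print(f"nohz_full only: {total_nohz} vs expected ~{expected_nohz:.0f} "
      f"(ratio {total_nohz / expected_nohz:.2f})")
```

With these counts the lazy on/off totals differ by about 3%, and the nohz_full-only total lands within about 3% of one eighth of the lazy-off total.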

-Andrea