On Wed, Feb 07, 2024 at 11:52:35AM -0500, Joel Fernandes wrote:
> On Wed, Feb 7, 2024 at 11:31 AM Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> >
> > > The actual number of callbacks should not be causing specifically the
> > > hrtimer_interrupt() to take too long to run, AFAICS. But RCU's lazy feature does
> > > increase the number of timer interrupts.
> > >
> > > Further still, it depends on how much hrtimer_interrupt() takes with lazy RCU to
> > > call it a problem IMO. Some numbers with units will be nice.
> >
> > This is what I see (this is a single run, but the other runs are
> > similar), unit is nanosec, with lazy RCU enabled hrtimer_interrupt()
> > takes around 4K-16K ns, with lazy RCU off most of the times it takes
> > 2K-4K ns:
> >
> > - lazy rcu off:
> >
> > [1K, 2K)     88307 |@@@@@@@@@@@@                                        |
> > [2K, 4K)    380695 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [4K, 8K)       194 |                                                    |
> >
> > - lazy rcu on:
> >
> > [2K, 4K)      3094 |                                                    |
> > [4K, 8K)    265763 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [8K, 16K)   182341 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 |
> > [16K, 32K)    3422 |                                                    |
> >
> > Again, I'm not sure if this is really a problem or not, or if it is even
> > a relevant metric for the overall performance, I was just curious to
> > understand why it is different.
>
> This is an interesting find, the number of timer interrupt executions
> looks roughly the same in this histogram so it might not be missed
> cancellations or such, so it is not clear to me. But it is worth
> debugging and we'll try to reproduce your results.
>
> Some more theories from our internal RCU discussion:
> - Could it be another user of RCU (call_rcu) from an unrelated hrtimer
> interrupt callback that is causing a "flush" of lazy callbacks?
> - What does the distribution look like for
> do_nocb_deferred_wakeup_timer ? That will have to probably be made
> non-static to be picked up by bpftrace (If you could try that real
> quick, appreciate!).

Sure, I'll repeat the test tracing do_nocb_deferred_wakeup_timer.

>
> Slightly related, but one of the things we are wondering also is how
> much of the overhead for your nohz-full and lazy-RCU test (on top of
> baseline - that is just CONFIG_HZ=1000 without nohz-full or nocbs) is
> because of just using NOCB. Uladzislau mentioned he might run a test
> for comparing along those lines as well.

Just to clarify, the "lazy rcu on" results are just with rcu_nocbs=all and
lazy RCU enabled (and HZ=1000), so without nohz_full.

If I enable only nohz_full=all (without rcu_nocbs) I see something like
this:

[1K, 2K)       294 |                                                    |
[2K, 4K)     59568 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)       368 |                                                    |

That is roughly the baseline number of invocations divided by 8, because
I have 8 cores and only the timekeeping CPU is ticking, so that seems to
make sense.

-Andrea
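
For reference, the "baseline / 8" estimate can be checked directly from the bucket counts above. This is only a rough sketch: the counts are copied from the histograms in this thread, the variable names are made up for illustration, and it assumes all runs traced hrtimer_interrupt() for a comparable amount of time.

```python
# Bucket counts copied from the bpftrace histograms in this thread.
# Durations are in nanoseconds; names below are illustrative only.
baseline = {"[1K, 2K)": 88307, "[2K, 4K)": 380695, "[4K, 8K)": 194}  # lazy rcu off
lazy_on = {"[2K, 4K)": 3094, "[4K, 8K)": 265763,
           "[8K, 16K)": 182341, "[16K, 32K)": 3422}                  # rcu_nocbs=all + lazy
nohz_full_only = {"[1K, 2K)": 294, "[2K, 4K)": 59568, "[4K, 8K)": 368}

NR_CPUS = 8  # number of cores on the test machine, as stated in the message


def total(hist):
    """Total number of hrtimer_interrupt() invocations in one histogram."""
    return sum(hist.values())


baseline_total = total(baseline)
lazy_total = total(lazy_on)
nohz_total = total(nohz_full_only)

# If only the timekeeping CPU keeps ticking, the invocation count should
# drop by roughly a factor of NR_CPUS compared to the baseline.
expected_nohz = baseline_total / NR_CPUS
observed_ratio = baseline_total / nohz_total

# Lazy RCU on vs. off: the invocation counts stay close to each other,
# only the per-invocation duration shifts to higher buckets.
lazy_vs_baseline = lazy_total / baseline_total

print(f"baseline invocations:       {baseline_total}")
print(f"expected with nohz_full:    {expected_nohz:.0f}")
print(f"observed with nohz_full:    {nohz_total}")
print(f"baseline / nohz_full ratio: {observed_ratio:.2f}")
print(f"lazy on / baseline ratio:   {lazy_vs_baseline:.3f}")
```

The ratio comes out a bit below 8, which is consistent with the other CPUs still taking an occasional timer interrupt when they are not fully idle or tickless.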