On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
>
>
> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> > On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@xxxxxxxxx> wrote:
> >>>
> >>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@xxxxxxxxx> wrote:
> >>>>
> >>> [...]
> >>>> "
> >>>>
> >>>> I have no idea whether this is related to the reverted commit.
> >>>>
> >>>
> >>> I have started another test against clean v6.0-rc7 to see whether this
> >>> is an issue with the mainline.
> >>
> >> I am not sure what exactly you are reverting (you could clarify that),
> >
> > commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> > But due to conflict, "git revert" can not work directly. So I applied
> > it by hand.
> >
> >> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> >> remember (and mentioned on IRC to others recently) that without this
> >> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> >> tick is needed for this configuration if we are spinning in the kernel
> >> with the tick turned off. That seems to align with your TREE04
> >> (NOHZ_FULL) configuration.
> >>
> >
> > Yes, that is the scenario.
> >
> >> Also, the commit Frederic suggested to revert seems to be a cosmetic
> >> optimization in the interrupt-entry path. That should not change
> >> functionality I believe. So I did not fully follow why reverting that
> >> is relevant (maybe Frederic can clarify?).
> >>
> >
> > Leave this question to Frederic.
>
> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
> a functional change to CPU hotplug path.
>
> Sorry for the noise.
>
> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
> might jump into the investigation with you guys, but I have to make time for
> Lazy-RCU v7 next :)

One historical reason was that a nohz_full CPU could enter the kernel with
the tick still disabled, and stay that way indefinitely. Among other
things, this can interfere with the grace-period wait in the offlining
code path.

Thanx, Paul