On Mon, Jan 09, 2023 at 12:09:48AM +0100, Frederic Weisbecker wrote: > On Sat, Jan 07, 2023 at 09:55:22PM -0500, Joel Fernandes wrote: > > > > > > > On Jan 7, 2023, at 9:48 PM, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > >>> On Jan 7, 2023, at 5:11 PM, Frederic Weisbecker <frederic@xxxxxxxxxx> wrote: > > >>> > > >>> On Fri, Jan 06, 2023 at 07:01:28PM -0500, Joel Fernandes wrote: > > >>> (lost html content) > > > > > > My problem is the iPhone wises up when I put a web link in an email. I want to look into smtp relays but then if I spent time on fixing that, I might not get time to learn from emails like these... > > > > > >> I can't find a place where the exp grace period sends an IPI to > > >> CPUs slow to report a QS. But anyway you really need the tick to poll > > >> periodically on the CPU to chase a quiescent state. > > > > > > Ok. > > > > > >> Now arguably it's probably only useful when CONFIG_PREEMPT_COUNT=y > > >> and rcu_exp_handler() has interrupted a preempt-disabled or bh-disabled > > >> section. Although rcu_exp_handler() sets TIF_RESCHED, which is handled > > >> by preempt_enable() and local_bh_enable() when CONFIG_PREEMPT=y. > > >> So probably it's only useful when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n > > >> (and there is also PREEMPT_DYNAMIC to consider). > > > > > > Makes sense. I think I was missing this use case and was going by the general design of exp grace periods. I was incorrectly assuming the IPIs were being sent repeatedly for hold out CPUs, which is not the case I think. But that would another way to fix it? > > > > > > But yeah I get your point, the first set of IPIs missed it, so we need the rescue-tick for long non-rcu_read_lock() implicit critical sections.. > > > > > >> If CONFIG_PREEMPT_COUNT=n, the tick can only report idle and user > > >> as QS, but those are already reported explicitly on ct_kernel_exit() -> > > >> rcu_preempt_deferred_qs(). > > > > > > Oh hmm, because that function is a NOOP for PREEMPT_COUNT=y and PREEMPT=n and will not report the deferred QS? Maybe it should then. However I think the tick is still useful if after the preempt disabled section, will still did not exit the kernel. > > > > I think meant I here, an atomic section (like bh or Irq disabled). There is no such thing as disabling preemption for CONFIG_PREEMPT=n. Or maybe I am confused again. This RCU thing… > > Right, so when CONFIG_PREEMPT_COUNT=n, there is no way for a tick to tell if the > the interrupted code is safely considered as a QS. That's because > preempt_disable() <-> preempt_enable() are no-ops so the whole kernel is > assumed non-preemptible, and therefore the whole kernel is a READ side critical > section, except for the explicit points reporting a QS. > > The only exception is when the tick interrupts idle (or user with > nohz_full). But we already have an exp QS reported on idle (and user with > nohz_full) entry through ct_kernel_exit(), and that happens on all RCU_TREE > configs (PREEMPT or not). Therefore the tick doesn't appear to be helpful at > all on a nohz_full CPU with CONFIG_PREEMPT_COUNT=n. > > I suggest we don't bother optimizing that case though... > > To summarize: > > 1) nohz_full && !CONFIG_PREEMPT_COUNT && !CONFIG_PREEMPT_RCU: > Tick isn't helpful. It can only report idle/user QS, but that is > already reported explicitly. > > 2) nohz_full && CONFIG_PREEMPT_COUNT && !CONFIG_PREEMPT_RCU: > Tick is very helpful because it can tell if the kernel is in > a QS state. > > 3) nohz_full && CONFIG_PREEMPT_RCU: > Tick doesn't appear to be helpful because: > - If the rcu_exp_handler() fires in an rcu_read_lock'ed section, then the > exp QS is reported on rcu_read_unlock() > - If the rcu_exp_handler() fires in a preempt/bh disabled section, > TIF_RESCHED is forced which is handled on preempt/bh re-enablement, > reporting a QS. > > > The case 2) is a niche, only useful for debugging. But anyway I'm not sure it's > worth changing/optimizing the current state. Might be worth add a comment > though. Thank you both for the analysis! I would welcome a comment. One could argue that we should increase the delay before turning the tick on, but my experience is that expedited grace periods almost always complete in less than a jiffy, so there would almost never be any benefit in doing so. But if some large NO_HZ_FULL system with long RCU readers starts having trouble with too-frequent tick enablement, that is one possible fix. Thanx, Paul