Hi Frederic,

On Mon, Jan 23, 2023 at 10:11:15PM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 23, 2023 at 03:54:07PM -0500, Joel Fernandes wrote:
> > > On Jan 23, 2023, at 11:27 AM, Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 10:22:19AM -0500, Joel Fernandes wrote:
> > >>> What am I missing?
> > >>
> > >> That the acceleration is also done by __note_gp_changes() once the
> > >> grace period ends anyway, so if any acceleration was missed as you
> > >> say, it will be done anyway.
> > >>
> > >> Also it is done by the scheduler tick raising softirq:
> > >>
> > >> rcu_pending() does this:
> > >> 	/* Has RCU gone idle with this CPU needing another grace period? */
> > >> 	if (!gp_in_progress && rcu_segcblist_is_enabled(&rdp->cblist) &&
> > >> 	    !rcu_rdp_is_offloaded(rdp) &&
> > >> 	    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >> 		return 1;
> > >>
> > >> and rcu_core():
> > >> 	/* No grace period and unregistered callbacks? */
> > >> 	if (!rcu_gp_in_progress() &&
> > >> 	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
> > >> 		rcu_nocb_lock_irqsave(rdp, flags);
> > >> 		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
> > >> 			rcu_accelerate_cbs_unlocked(rnp, rdp);
> > >> 		rcu_nocb_unlock_irqrestore(rdp, flags);
> > >> 	}
> > >>
> > >> So, I am not sure you need needacc at all. Those CBs that have not
> > >> been assigned grace period numbers will be taken care of :)
> > >
> > > But that's only when there is no grace period pending, so it can't happen while
> > > we report a QS.
> > >
> > > OTOH without the needacc, those callbacks waiting to be accelerated would be
> > > eventually processed, but only on the next tick following the end of a grace
> > > period... if none has started since then. So if someone else starts a new GP
> > > before the current CPU, we must wait another GP, etc...
> > >
> > > That's potentially dangerous.
> >
> > Waiting for just one more GP cannot be dangerous IMO.
> > Anyway, there is no guarantee that the callback will run immediately at the
> > end of the GP; there may be one or more GPs before the callback can run, if
> > I remember correctly. That is by design... but please correct me if my
> > understanding is different from yours.
>
> It's not bound to just one GP. If you have N CPUs flooding callbacks for a
> long while, your CPU has 1/N chances to be the one starting the next GP on
> each turn. Relying on the acceleration to be performed only when no GP is
> running may theoretically starve your callbacks forever.

I tend to agree with you; however, I was mostly referring to the "needacc".
The theoretical case is even more theoretical because you have to be halfway
through the transition _and_ flooding at the same time, such that RCU never
gets a chance to be idle for a long period of time (which we don't really see
on our systems per se). But I agree: even if theoretical, it is probably
better to handle it.

> > > And unfortunately we can't do the acceleration from __note_gp_changes()
> > > due to lock ordering restrictions: nocb_lock -> rnp_lock

Yeah, I was more going for the case where the offload-to-deoffload transition
completes before note_gp_changes() is called, which obviously we cannot
assume... But yes, for the 'limbo between offloaded and not-offloaded' state,
RCU eventually goes idle, and then either the scheduling-clock interrupt (via
rcu_pending()) will raise softirq or note_gp_changes() will accelerate. Which
does not help the theoretical flooding case above, but still...

> > Ah, this part I am not sure about. I would appreciate it if you could
> > point me to any old archive links or documentation detailing that, if
> > possible...
>
> It's not documented, but the code in nocb_gp_wait() or nocb_cb_wait() has
> that locking order, for example.
>
> Excerpt:
>
> 	rcu_nocb_lock_irqsave(rdp, flags);
> 	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
> 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
> 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> 	}
>
> Thanks.

Ah, I see what you mean now: in other words, there is a chance of an ABBA
deadlock if we do not acquire the locks in the correct order. Thank you for
clarifying. Indeed, rcutree_migrate_callbacks() also confirms it.

I guess, to Frederic's point, another case besides CB flooding that can
starve CB accelerations is if the system stays in the 'limbo between offload
and de-offload' state for a long period of time. In that situation, neither
note_gp_changes() nor the scheduling-clock interrupt via softirq will help
accelerate CBs. Whether such a limbo state can persist indefinitely, I am
not sure...

Thanks a lot for the discussions; it is always a learning experience talking
about these!

thanks,

 - Joel