On Tue, Oct 04, 2022 at 06:57:59PM -0400, Joel Fernandes wrote:
> >> needed after an entrain. Otherwise, the RCU barrier callback can wait in
> >> the queue for several seconds before the lazy callbacks in front of it
> >> are serviced.
> >
> > It's not about lazy callbacks here (but you can mention the fact that
> > waking nocb_gp if necessary after flushing bypass is a beneficial side
> > effect for further lazy implementation).
> >
> > So here is the possible bad scenario:
> >
> > 1) CPU 0 is nocb, it queues a callback
> > 2) CPU 0 goes idle (or userspace with nohz_full) forever
> > 3) The grace period related to that callback elapses
> > 4) The callback is moved to the done list (but is not invoked yet), there are no more pending for CPU 0
> > 5) CPU 1 calls rcu_barrier() and entrains to CPU 0 cblist
>
> CPU 1 can only entrain into CPU 0 if the CPU is offline:
>
> 	if (!rcu_rdp_cpu_online(rdp)) {
> 		rcu_barrier_entrain(rdp);
> 		WARN_ON_ONCE(READ_ONCE(rdp->barrier_seq_snap) != gseq);
> 		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock,
> 		...
> 		continue;
> 	}

Ah good point. So CPU 1 sends an IPI to CPU 0 which entrains itself.
And then looks like the result is the same.

>
> Otherwise an IPI does the entraining. So I do not see how CPU 0 being idle
> causes the cross-CPU entraining.

It doesn't but it shows that the CPU isn't going to enqueue any further
callback before a while. Though even if it did, it may not even solve the
situation, not until an RCU_NOCB_WAKE_FORCE is issued...

>
> > 6) CPU 1 waits forever
>
> But, I agree it can still wait forever, once the IPI handler does the
> entraining, since nothing will do the GP thread wakeup.
>
> >>
> >> Reported-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> >
> > Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs")
>
> So, do you mind writing a proper patch with a proper commit message and Fixes
> tag then? It can independent of this series and add my Reported-by tag,
> thanks!

Ok will do.

Thanks!

>
> Thanks!
>
>  - Joel
>