On Tue, Mar 01, 2022 at 09:43:05PM +0800, Zhen Lei wrote: > From: Frederic Weisbecker <frederic@xxxxxxxxxx> > > commit b2fcf2102049f6e56981e0ab3d9b633b8e2741da upstream. > > This sequence of events can lead to a failure to requeue a CPU's > ->nocb_timer: > > 1. There are no callbacks queued for any CPU covered by CPU 0-2's > ->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated > with CPU 0. > > 2. CPU 1 enqueues its first callback with interrupts disabled, and > thus must defer awakening its ->nocb_gp_kthread. It therefore > queues its rcu_data structure's ->nocb_timer. At this point, > CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE. > > 3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a > callback, but with interrupts enabled, allowing it to directly > awaken the ->nocb_gp_kthread. > > 4. The newly awakened ->nocb_gp_kthread associates both CPU 1's > and CPU 2's callbacks with a future grace period and arranges > for that grace period to be started. > > 5. This ->nocb_gp_kthread goes to sleep waiting for the end of this > future grace period. > > 6. This grace period elapses before the CPU 1's timer fires. > This is normally improbably given that the timer is set for only > one jiffy, but timers can be delayed. Besides, it is possible > that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y. > > 7. The grace period ends, so rcu_gp_kthread awakens the > ->nocb_gp_kthread, which in turn awakens both CPU 1's and > CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps > waiting for more newly queued callbacks. > > 8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps > waiting for more invocable callbacks. > > 9. Note that neither kthread updated any ->nocb_timer state, > so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE. > > 10. CPU 1 enqueues its second callback, this time with interrupts > enabled so it can wake directly ->nocb_gp_kthread. > It does so with calling wake_nocb_gp() which also cancels the > pending timer that got queued in step 2. But that doesn't reset > CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE. > So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now > desynchronized. > > 11. ->nocb_gp_kthread associates the callback queued in 10 with a new > grace period, arranges for that grace period to start and sleeps > waiting for it to complete. > > 12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread, > which in turn wakes up CPU 1's ->nocb_cb_kthread which then > invokes the callback queued in 10. > > 13. CPU 1 enqueues its third callback, this time with interrupts > disabled so it must queue a timer for a deferred wakeup. However > the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which > incorrectly indicates that a timer is already queued. Instead, > CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails > to queue the ->nocb_timer. > > 14. CPU 1 has its pending callback and it may go unnoticed until > some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever > calls an explicit deferred wakeup, for example, during idle entry. > > This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime > we delete the ->nocb_timer. > > It is quite possible that there is a similar scenario involving > ->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some > effort from several people, a failure scenario has not yet been located. > However, that by no means guarantees that no such scenario exists. > Finding a failure scenario is left as an exercise for the reader, and the > "Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer. > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing) > Cc: <stable@xxxxxxxxxxxxxxx> > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx> > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx> > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> > Cc: Boqun Feng <boqun.feng@xxxxxxxxx> > Reviewed-by: Neeraj Upadhyay <neeraju@xxxxxxxxxxxxxx> > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > Conflicts: > kernel/rcu/tree_plugin.h > > Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx> > --- > kernel/rcu/tree_plugin.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) Now queued up, thanks. greg k-h