Patch "rcu/nocb: Fix missed RCU barrier on deoffloading" has been added to the 6.12-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    rcu/nocb: Fix missed RCU barrier on deoffloading

to the 6.12-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-nocb-fix-missed-rcu-barrier-on-deoffloading.patch
and it can be found in the queue-6.12 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 87f2844cb644f7908c2ba2f2b384c189250db717
Author: Zqiang <qiang.zhang1211@xxxxxxxxx>
Date:   Tue Oct 22 11:41:17 2024 +0800

    rcu/nocb: Fix missed RCU barrier on deoffloading
    
    [ Upstream commit 2996980e20b7a54a1869df15b3445374b850b155 ]
    
    Currently, running rcutorture test with torture_type=rcu fwd_progress=8
    n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
    test_boost=2, will trigger the following warning:
    
            WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
            RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
             Call Trace:
              <TASK>
              ? __warn+0x7e/0x120
              ? rcu_nocb_rdp_deoffload+0x292/0x2a0
              ? report_bug+0x18e/0x1a0
              ? handle_bug+0x3d/0x70
              ? exc_invalid_op+0x18/0x70
              ? asm_exc_invalid_op+0x1a/0x20
              ? rcu_nocb_rdp_deoffload+0x292/0x2a0
              rcu_nocb_cpu_deoffload+0x70/0xa0
              rcu_nocb_toggle+0x136/0x1c0
              ? __pfx_rcu_nocb_toggle+0x10/0x10
              kthread+0xd1/0x100
              ? __pfx_kthread+0x10/0x10
              ret_from_fork+0x2f/0x50
              ? __pfx_kthread+0x10/0x10
              ret_from_fork_asm+0x1a/0x30
              </TASK>
    
    CPU0                               CPU2                          CPU3
    //rcu_nocb_toggle             //nocb_cb_wait                   //rcutorture
    
    // deoffload CPU1             // process CPU1's rdp
    rcu_barrier()
        rcu_segcblist_entrain()
            rcu_segcblist_add_len(1);
            // len == 2
            // enqueue barrier
            // callback to CPU1's
            // rdp->cblist
                                 rcu_do_batch()
                                     // invoke CPU1's rdp->cblist
                                     // callback
                                     rcu_barrier_callback()
                                                                 rcu_barrier()
                                                                   mutex_lock(&rcu_state.barrier_mutex);
                                                                   // still see len == 2
                                                                   // enqueue barrier callback
                                                                   // to CPU1's rdp->cblist
                                                                   rcu_segcblist_entrain()
                                                                       rcu_segcblist_add_len(1);
                                                                       // len == 3
                                     // decrement len
                                     rcu_segcblist_add_len(-2);
                                 kthread_parkme()
    
    // CPU1's rdp->cblist len == 1
    // Warn because there is
    // still a pending barrier
    // trigger warning
    WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
    cpus_read_unlock();
    
                                                                    // wait CPU1 to comes online and
                                                                    // invoke barrier callback on
                                                                    // CPU1 rdp's->cblist
                                                                    wait_for_completion(&rcu_state.barrier_completion);
    // deoffload CPU4
    cpus_read_lock()
      rcu_barrier()
        mutex_lock(&rcu_state.barrier_mutex);
        // block on barrier_mutex
        // wait rcu_barrier() on
        // CPU3 to unlock barrier_mutex
        // but CPU3 unlock barrier_mutex
        // need to wait CPU1 comes online
        // when CPU1 going online will block on cpus_write_lock
    
    The above scenario will not only trigger a WARN_ON_ONCE(), but also
    trigger a deadlock.
    
    Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU
    will either observe the decremented callback counter down to 0 and spare
    the callback enqueue, or rcuo will observe the new callback and keep
    rdp->nocb_cb_sleep to false.
    
    Therefore check rdp->nocb_cb_sleep before parking to make sure no
    further rcu_barrier() is waiting on the rdp.
    
    Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
    Suggested-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx>
    Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@xxxxxxx>
    Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 16865475120ba..2605dd234a13c 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 					    nocb_cb_wait_cond(rdp));
 	if (kthread_should_park()) {
-		kthread_parkme();
+		/*
+		 * kthread_park() must be preceded by an rcu_barrier().
+		 * But yet another rcu_barrier() might have sneaked in between
+		 * the barrier callback execution and the callbacks counter
+		 * decrement.
+		 */
+		if (rdp->nocb_cb_sleep) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			kthread_parkme();
+		}
 	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
 		WARN_ON(signal_pending(current));
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux