Re: [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock

On 5/17/21 6:58 AM, Paul E. McKenney wrote:

On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@xxxxxxxxxxxxx wrote:
From: Yanfei Xu <yanfei.xu@xxxxxxxxxxxxx>

rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
doesn't contain any tasks that are blocking the GP. However, this
rcu_node->lock is acquired again shortly afterwards in
rcu_dump_cpu_stacks() when ndetected is non-zero. As a result, the CPU
hangs on this deadlock.

Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
Signed-off-by: Yanfei Xu <yanfei.xu@xxxxxxxxxxxxx>

Also a good catch, thank you!  Queued for further review and testing,
wordsmithed as shown below.  The rcutorture scripts have been known to
work on ARM in the past, and might still do so.  (I test on x86.)

As always, please check to make sure that I didn't mess something up.


Looks good to me, Thanks!

Regards,
Yanfei

                                                         Thanx, Paul

------------------------------------------------------------------------

commit e0a9b77f245ae4fe1537120fd5319bf9e091618e
Author: Yanfei Xu <yanfei.xu@xxxxxxxxxxxxx>
Date:   Sun May 16 17:50:10 2021 +0800

     rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock

     If rcu_print_task_stall() is invoked on an rcu_node structure that does
     not contain any tasks blocking the current grace period, it takes an
     early exit that fails to release that rcu_node structure's lock.  This
     results in a self-deadlock, which is detected by lockdep.

     To reproduce this bug:

     tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"

     This will also result in other complaints, including RCU's scheduler
     hook complaining about blocking rather than preemption and an rcutorture
     writer stall.

     Only a partial RCU CPU stall warning message will be printed because of
     the self-deadlock.

     This commit therefore releases the lock on the rcu_print_task_stall()
     function's early exit path.

     Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
     Signed-off-by: Yanfei Xu <yanfei.xu@xxxxxxxxxxxxx>
     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a10ea1f1f81f..d574e3bbd929 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
        struct task_struct *ts[8];

        lockdep_assert_irqs_disabled();
-       if (!rcu_preempt_blocked_readers_cgp(rnp))
+       if (!rcu_preempt_blocked_readers_cgp(rnp)) {
+               raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
                return 0;
+       }
        pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
               rnp->level, rnp->grplo, rnp->grphi);
        t = list_entry(rnp->gp_tasks->prev,
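
For readers who want to see the locking contract in isolation, here is a
minimal userspace sketch of the bug pattern.  It is not kernel code: it
assumes a pthread mutex as a stand-in for the rcu_node ->lock, and the
names struct node, nblocked, print_task_stall_buggy(),
print_task_stall_fixed() and lock_is_free() are all hypothetical.  The only
thing carried over from the patch is the rule that the function is entered
with the lock held and must release it on every return path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_node; not the kernel's layout. */
struct node {
	pthread_mutex_t lock;	/* models rcu_node ->lock */
	int nblocked;		/* models "tasks blocking the current GP" */
};

static void node_init(struct node *np, int nblocked)
{
	int rc = pthread_mutex_init(&np->lock, NULL);

	assert(rc == 0);
	(void)rc;
	np->nblocked = nblocked;
}

/*
 * Contract for both variants: called with np->lock held, returns with
 * np->lock released.
 */
static int print_task_stall_buggy(struct node *np)
{
	if (!np->nblocked)
		return 0;	/* BUG: early exit leaves np->lock held */
	pthread_mutex_unlock(&np->lock);
	return np->nblocked;
}

static int print_task_stall_fixed(struct node *np)
{
	if (!np->nblocked) {
		pthread_mutex_unlock(&np->lock);	/* release on early exit too */
		return 0;
	}
	pthread_mutex_unlock(&np->lock);
	return np->nblocked;
}

/*
 * Probe the lock without blocking.  A later pthread_mutex_lock() from the
 * same thread would self-deadlock if this returns false.
 */
static bool lock_is_free(struct node *np)
{
	if (pthread_mutex_trylock(&np->lock) != 0)
		return false;
	pthread_mutex_unlock(&np->lock);
	return true;
}
```

With nblocked set to zero, locking the node and calling
print_task_stall_buggy() leaves lock_is_free() returning false, which is the
state in which a later acquisition of the same lock would hang.  The fixed
variant leaves the lock free on both paths.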



