Commit 10f39bb1b2c1 ("rcu: protect __rcu_read_unlock() against
scheduler-using irq handlers") identified a deadlock and resolved it by
avoiding the state in which ->rcu_read_lock_nesting is zero while
->rcu_read_unlock_special is non-zero. To achieve that, the commit used
negative values for ->rcu_read_lock_nesting.

Now that the deferred_qs mechanism exists, the quiescent state (QS) can
be deferred instead of insisting on reporting it and deadlocking. All
that is needed is to set special.b.deferred_qs before any operation that
takes scheduler locks, such as wake_up(), then leave the QS deferred
and return.

After this change, rcu_read_unlock_special() is safe to call in any
context, including when nested in __rcu_read_unlock() in an interrupt.

This change is an important step toward making ->rcu_read_lock_nesting
non-negative again and further simplifying rcu_read_unlock().

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
 kernel/rcu/tree_plugin.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e612c77dc446..dbded2b8c792 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -591,6 +591,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 	irqs_were_disabled = irqs_disabled_flags(flags);
 	if (preempt_bh_were_disabled || irqs_were_disabled) {
 		bool exp;
+		bool deferred_qs = t->rcu_read_unlock_special.b.deferred_qs;
 		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 		struct rcu_node *rnp = rdp->mynode;
 
@@ -599,9 +600,18 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		      (rdp->grpmask & rnp->expmask) ||
 		      tick_nohz_full_cpu(rdp->cpu);
 		// Need to defer quiescent state until everything is enabled.
+		// In some cases when in_interrupt() returns false,
+		// raise_softirq_irqoff() has to call wake_up(),
+		// and the !deferred_qs says that scheduler locks
+		// cannot be held, so the wakeup will be safe now.
+		// But this wake_up() may have RCU critical section nested
+		// in the scheduler locks and its rcu_read_unlock() would
+		// call rcu_read_unlock_special() and then wake_up()
+		// recursively and deadlock if deferred_qs is still false.
+		// To avoid it, deferred_qs has to be set beforehand.
+		t->rcu_read_unlock_special.b.deferred_qs = true;
 		if (irqs_were_disabled && use_softirq &&
-		    (in_interrupt() ||
-		     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
+		    (in_interrupt() || (exp && !deferred_qs))) {
 			// Using softirq, safe to awaken, and we get
 			// no help from enabling irqs, unlike bh/preempt.
 			raise_softirq_irqoff(RCU_SOFTIRQ);
@@ -620,7 +630,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
 			}
 		}
-		t->rcu_read_unlock_special.b.deferred_qs = true;
 		local_irq_restore(flags);
 		return;
 	}
-- 
2.20.1
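
For illustration only, the ordering argument in the changelog can be
modelled in plain user-space C. This is a rough sketch, not kernel code:
struct task_model, model_wake_up(), model_unlock_special(), model_run()
and DEPTH_LIMIT are all hypothetical names, the in_interrupt()/exp
conditions are ignored, and the "deadlock" is approximated by a capped
recursion depth. It only shows why the flag has to be set before the
wakeup rather than after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
#define DEPTH_LIMIT 8	/* cap on recursion, standing in for the deadlock */

struct task_model {
	bool deferred_qs;	/* models special.b.deferred_qs */
	int wakeups;		/* number of model_wake_up() calls */
	int depth;		/* current nesting of model_unlock_special() */
	int max_depth;		/* deepest nesting observed */
};

static void model_unlock_special(struct task_model *t, bool set_flag_first);

/*
 * Models wake_up(): while "holding scheduler locks" it runs a nested
 * RCU read-side critical section whose unlock re-enters the special path.
 */
static void model_wake_up(struct task_model *t, bool set_flag_first)
{
	t->wakeups++;
	model_unlock_special(t, set_flag_first);
}

static void model_unlock_special(struct task_model *t, bool set_flag_first)
{
	/* Snapshot the flag on entry, as the patch does. */
	bool deferred_qs = t->deferred_qs;

	t->depth++;
	if (t->depth > t->max_depth)
		t->max_depth = t->depth;
	if (t->depth >= DEPTH_LIMIT) {	/* runaway recursion: give up */
		t->depth--;
		return;
	}

	if (set_flag_first)
		t->deferred_qs = true;	/* new ordering: set before wakeup */

	if (!deferred_qs)
		model_wake_up(t, set_flag_first);

	if (!set_flag_first)
		t->deferred_qs = true;	/* old ordering: too late for nesting */

	t->depth--;
}

/* Run one outermost unlock and return the resulting counters. */
struct task_model model_run(bool set_flag_first)
{
	struct task_model t = { false, 0, 0, 0 };

	model_unlock_special(&t, set_flag_first);
	return t;
}

/* Self-check of the two orderings; call from a test harness or main(). */
void model_self_check(void)
{
	struct task_model fixed = model_run(true);
	struct task_model old = model_run(false);

	assert(fixed.wakeups == 1);		/* nested call sees the flag */
	assert(old.max_depth == DEPTH_LIMIT);	/* recursion hit the cap */
}
```

In this model, setting the flag first means the nested call observes
deferred_qs already set and returns without a second wakeup, whereas
setting it afterwards lets every nested call repeat the wakeup until the
artificial depth cap is reached.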