On Fri, Jun 12, 2020 at 10:49:53AM -0700, Paul E. McKenney wrote: > On Fri, Jun 12, 2020 at 03:55:00PM +0200, Thomas Gleixner wrote: > > The idea of conditionally calling into rcu_irq_enter() only when RCU is > > not watching turned out to be not completely thought through. > > > > Paul noticed occasional premature end of grace periods in RCU torture > > testing. Bisection led to the commit which made the invocation of > > rcu_irq_enter() conditional on !rcu_is_watching(). > > > > It turned out that this conditional breaks RCU assumptions about the idle > > task when the scheduler tick happens to be a nested interrupt. Nested > > interrupts can happen when the first interrupt invokes softirq processing > > on return which enables interrupts. If that nested tick interrupt does not > > invoke rcu_irq_enter() then the nest accounting in RCU claims that this is > > the first interrupt which might mark a quiescient state and end grace > > periods prematurely. > > For this last sentence, how about the following? > > If that nested tick interrupt does not invoke rcu_irq_enter() then the > RCU's irq-nesting checks will believe that this interrupt came directly > from idle, which will cause RCU to report a quiescent state. Because > this interrupt instead came from a softirq handler which might have > been executing an RCU read-side critical section, this can cause the > grace period to end prematurely. > > > Change the condition from !rcu_is_watching() to is_idle_task(current) which > > enforces that interrupts in the idle task unconditionally invoke > > rcu_irq_enter() independent of the RCU state. > > > > This is also correct vs. user mode entries in NOHZ full scenarios because > > user mode entries bring RCU out of EQS and force the RCU irq nesting state > > accounting to nested. As only the first interrupt can enter from user mode > > a nested tick interrupt will enter from kernel mode and as the nesting > > state accounting is forced to nesting it will not do anything stupid even > > if rcu_irq_enter() has not been invoked. > > On the testing front, just like with my busted patch yesterday, this > patch breaks the TASKS03 rcutorture scenario by preventing the Tasks > RCU grace periods from ever completing. However, this is an unusual > configuration with NO_HZ_FULL and one CPU actually being nohz_full. > The more conventional TASKS01 and TASKS02 scenarios do just fine. > > I will therefore address this issue in a follow-on patch. I should add that -your- patch from yesterday did -not- cause this problem, in case that is of interest. Thanx, Paul > > Fixes: 3eeec3858488 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()") > > Reported-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Reviewed-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > Tested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > > > --- > > arch/x86/entry/common.c | 35 ++++++++++++++++++++++++++++------- > > 1 file changed, 28 insertions(+), 7 deletions(-) > > --- a/arch/x86/entry/common.c > > +++ b/arch/x86/entry/common.c > > @@ -557,14 +557,34 @@ bool noinstr idtentry_enter_cond_rcu(str > > return false; > > } > > > > - if (!__rcu_is_watching()) { > > + /* > > + * If this entry hit the idle task invoke rcu_irq_enter() whether > > + * RCU is watching or not. > > + * > > + * Interupts can nest when the first interrupt invokes softirq > > + * processing on return which enables interrupts. > > + * > > + * Scheduler ticks in the idle task can mark quiescent state and > > + * terminate a grace period, if and only if the timer interrupt is > > + * not nested into another interrupt. > > + * > > + * Checking for __rcu_is_watching() here would prevent the nesting > > + * interrupt to invoke rcu_irq_enter(). If that nested interrupt is > > + * the tick then rcu_flavor_sched_clock_irq() would wrongfully > > + * assume that it is the first interupt and eventually claim > > + * quiescient state and end grace periods prematurely. > > + * > > + * Unconditionally invoke rcu_irq_enter() so RCU state stays > > + * consistent. > > + * > > + * TINY_RCU does not support EQS, so let the compiler eliminate > > + * this part when enabled. > > + */ > > + if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) { > > /* > > * If RCU is not watching then the same careful > > * sequence vs. lockdep and tracing is required > > * as in enter_from_user_mode(). > > - * > > - * This only happens for IRQs that hit the idle > > - * loop, i.e. if idle is not using MWAIT. > > */ > > lockdep_hardirqs_off(CALLER_ADDR0); > > rcu_irq_enter(); > > @@ -576,9 +596,10 @@ bool noinstr idtentry_enter_cond_rcu(str > > } > > > > /* > > - * If RCU is watching then RCU only wants to check > > - * whether it needs to restart the tick in NOHZ > > - * mode. > > + * If RCU is watching then RCU only wants to check whether it needs > > + * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick() > > + * already contains a warning when RCU is not watching, so no point > > + * in having another one here. > > */ > > instrumentation_begin(); > > rcu_irq_enter_check_tick();