On Mon, Aug 26, 2019 at 9:34 PM Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> wrote: > > The dynticks_nmi_nesting counter serves 4 purposes: > And actually, I meant 3 purposes ;-) :-P thanks, - Joel > (a) rcu_is_cpu_rrupt_from_idle() needs to be able to detect first > interrupt nesting level. > > (b) We need to detect half-interrupts till we are sure they're not an > issue. However, change the comparison to DYNTICK_IRQ_NONIDLE with 0. > > (c) When a quiescent state report is needed from a nohz_full CPU. > The nesting counter detects we are a first level interrupt. > > For (a) we can just use dyntick_nesting == 1 to determine this. Only the > outermost interrupt that interrupted an RCU-idle state can set it to 1. > > For (b), this warning condition has not occurred for several kernel > releases. But we still keep the warning but change it to use > in_interrupt() instead of the nesting counter. In a later year, we can > remove the warning. > > For (c), the nest check is not really necessary since forced_tick would > have been set to true in the outermost interrupt, so the nested/NMI > interrupts will check forced_tick anyway, and bail. > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > --- > .../Data-Structures/Data-Structures.rst | 31 +++------ > Documentation/RCU/stallwarn.txt | 6 +- > kernel/rcu/tree.c | 64 +++++++------------ > kernel/rcu/tree.h | 4 +- > kernel/rcu/tree_stall.h | 4 +- > 5 files changed, 41 insertions(+), 68 deletions(-) > > diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst > index 4a48e20a46f2..a5a907f434a1 100644 > --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst > +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst > @@ -936,10 +936,9 @@ This portion of the rcu_data structure is declared as follows: > :: > > 1 long dynticks_nesting; > - 2 long dynticks_nmi_nesting; > - 3 atomic_t dynticks; > - 4 bool rcu_need_heavy_qs; > - 5 bool rcu_urgent_qs; > + 2 atomic_t dynticks; > + 3 bool rcu_need_heavy_qs; > + 4 bool rcu_urgent_qs; > > These fields in the rcu_data structure maintain the per-CPU dyntick-idle > state for the corresponding CPU. The fields may be accessed only from > @@ -948,26 +947,14 @@ the corresponding CPU (and from tracing) unless otherwise stated. > The ``->dynticks_nesting`` field counts the nesting depth of process > execution, so that in normal circumstances this counter has value zero > or one. NMIs, irqs, and tracers are counted by the > -``->dynticks_nmi_nesting`` field. Because NMIs cannot be masked, changes > +``->dynticks_nesting`` field as well. Because NMIs cannot be masked, changes > to this variable have to be undertaken carefully using an algorithm > provided by Andy Lutomirski. The initial transition from idle adds one, > and nested transitions add two, so that a nesting level of five is > -represented by a ``->dynticks_nmi_nesting`` value of nine. This counter > +represented by a ``->dynticks_nesting`` value of nine. This counter > can therefore be thought of as counting the number of reasons why this > -CPU cannot be permitted to enter dyntick-idle mode, aside from > -process-level transitions. > - > -However, it turns out that when running in non-idle kernel context, the > -Linux kernel is fully capable of entering interrupt handlers that never > -exit and perhaps also vice versa. Therefore, whenever the > -``->dynticks_nesting`` field is incremented up from zero, the > -``->dynticks_nmi_nesting`` field is set to a large positive number, and > -whenever the ``->dynticks_nesting`` field is decremented down to zero, > -the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that > -the number of misnested interrupts is not sufficient to overflow the > -counter, this approach corrects the ``->dynticks_nmi_nesting`` field > -every time the corresponding CPU enters the idle loop from process > -context. > +CPU cannot be permitted to enter dyntick-idle mode. It counts both the > +process-level and interrupt transitions. > > The ``->dynticks`` field counts the corresponding CPU's transitions to > and from either dyntick-idle or user mode, so that this counter has an > @@ -1000,7 +987,9 @@ code. > +-----------------------------------------------------------------------+ > | Because this would fail in the presence of interrupts whose handlers | > | never return and of handlers that manage to return from a made-up | > -| interrupt. | > +| interrupt. NOTE: The counters have now been combined however | > +| a temporary warning has been left to make sure this condition never | > +| occurs. | > +-----------------------------------------------------------------------+ > > Additional fields are present for some special-purpose builds, and are > diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt > index f48f4621ccbc..585f73009a56 100644 > --- a/Documentation/RCU/stallwarn.txt > +++ b/Documentation/RCU/stallwarn.txt > @@ -173,8 +173,8 @@ For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling, > it will print a message similar to the following: > > INFO: rcu_sched detected stalls on CPUs/tasks: > - 2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0 > - 16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0 > + 2-...: (3 GPs behind) idle=06c/0 softirq=1453/1455 fqs=0 > + 16-...: (0 ticks this GP) idle=81c/0 softirq=764/764 fqs=0 > (detected by 32, t=2603 jiffies, g=7075, q=625) > > This message indicates that CPU 32 detected that CPUs 2 and 16 were both > @@ -225,7 +225,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs > In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed > for each CPU: > > - 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D > + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff softirq=82/543 last_accelerate: a345/d342 Nonlazy posted: ..D > > The "last_accelerate:" prints the low-order 16 bits (in hex) of the > jiffies counter when this CPU last invoked rcu_try_advance_all_cbs() > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 255cd6835526..1465a3e406f8 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -81,7 +81,6 @@ > > static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = { > .dynticks_nesting = 1, > - .dynticks_nmi_nesting = 0, > .dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR), > }; > struct rcu_state rcu_state = { > @@ -392,15 +391,9 @@ static int rcu_is_cpu_rrupt_from_idle(void) > /* Check for counter underflows */ > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0, > "RCU dynticks_nesting counter underflow!"); > - RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0, > - "RCU dynticks_nmi_nesting counter underflow/zero!"); > > - /* Are we at first interrupt nesting level? */ > - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1) > - return false; > - > - /* Does CPU appear to be idle from an RCU standpoint? */ > - return __this_cpu_read(rcu_data.dynticks_nesting) == 0; > + /* Are we the outermost interrupt that arrived when RCU was idle? */ > + return __this_cpu_read(rcu_data.dynticks_nesting) == 1; > } > > #define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch ... */ > @@ -564,11 +557,10 @@ static void rcu_eqs_enter(bool user) > struct rcu_data *rdp = this_cpu_ptr(&rcu_data); > > /* Entering usermode/idle from interrupt is not handled. These would > - * mean usermode upcalls or idle entry happened from interrupts. But, > - * reset the counter if we warn. > + * mean usermode upcalls or idle exit happened from interrupts. Remove > + * the warning by 2020. > */ > - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0)) > - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); > + WARN_ON_ONCE(in_interrupt()); > > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > rdp->dynticks_nesting == 0); > @@ -627,9 +619,8 @@ void rcu_user_enter(void) > > /* > * If we are returning from the outermost NMI handler that interrupted an > - * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting > - * to let the RCU grace-period handling know that the CPU is back to > - * being RCU-idle. > + * RCU-idle period, update rdp->dynticks to let the RCU grace-period handling > + * know that the CPU is back to being RCU-idle. > * > * If you add or remove a call to rcu_nmi_exit_common(), be sure to test > * with CONFIG_RCU_EQS_DEBUG=y. > @@ -639,16 +630,13 @@ static __always_inline void rcu_nmi_exit_common(bool irq) > struct rcu_data *rdp = this_cpu_ptr(&rcu_data); > > /* > - * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks. > + * Check for ->dynticks_nesting underflow and bad ->dynticks. > * (We are exiting an NMI handler, so RCU better be paying attention > * to us!) > */ > WARN_ON_ONCE(rdp->dynticks_nesting <= 0); > - WARN_ON_ONCE(rdp->dynticks_nmi_nesting <= 0); > WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()); > > - WRITE_ONCE(rdp->dynticks_nmi_nesting, /* No store tearing. */ > - rdp->dynticks_nmi_nesting - 1); > /* > * If the nesting level is not 1, the CPU wasn't RCU-idle, so > * leave it in non-RCU-idle state. > @@ -750,11 +738,10 @@ static void rcu_eqs_exit(bool user) > WRITE_ONCE(rdp->dynticks_nesting, 1); > > /* Exiting usermode/idle from interrupt is not handled. These would > - * mean usermode upcalls or idle exit happened from interrupts. But, > - * reset the counter if we warn. > + * mean usermode upcalls or idle exit happened from interrupts. Remove > + * the warning by 2020. > */ > - if (WARN_ON_ONCE(rdp->dynticks_nmi_nesting != 0)) > - WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); > + WARN_ON_ONCE(in_interrupt()); > } > > /** > @@ -795,14 +782,13 @@ void rcu_user_exit(void) > * rcu_nmi_enter_common - inform RCU of entry to NMI context > * @irq: Is this call from rcu_irq_enter? > * > - * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and > - * rdp->dynticks_nmi_nesting to let the RCU grace-period handling know > - * that the CPU is active. This implementation permits nested NMIs, as > - * long as the nesting level does not overflow an int. (You will probably > - * run out of stack space first.) > + * If the CPU was idle from RCU's viewpoint, update rdp->dynticks to let the > + * RCU grace-period handling know that the CPU is active. This implementation > + * permits nested NMIs, as long as the nesting level does not overflow a long. > + * (You will probably run out of stack space first.) > * > - * If you add or remove a call to rcu_nmi_enter_common(), be sure to test > - * with CONFIG_RCU_EQS_DEBUG=y. > + * If you add or remove a call to rcu_nmi_enter_common(), be sure to test with > + * CONFIG_RCU_EQS_DEBUG=y. > */ > static __always_inline void rcu_nmi_enter_common(bool irq) > { > @@ -811,15 +797,16 @@ static __always_inline void rcu_nmi_enter_common(bool irq) > > /* Complain about underflow. */ > WARN_ON_ONCE(rdp->dynticks_nesting < 0); > - WARN_ON_ONCE(rdp->dynticks_nmi_nesting < 0); > > /* > * If idle from RCU viewpoint, atomically increment ->dynticks > - * to mark non-idle and increment ->dynticks_nmi_nesting by one. > - * Otherwise, increment ->dynticks_nmi_nesting by two. This means > - * if ->dynticks_nmi_nesting is equal to one, we are guaranteed > + * to mark non-idle and increment ->dynticks_nesting by one. > + * Otherwise, increment ->dynticks_nesting by two. This means > + * if ->dynticks_nesting is equal to one, we are guaranteed > * to be in the outermost NMI handler that interrupted an RCU-idle > - * period (observation due to Andy Lutomirski). > + * period (observation due to Andy Lutomirski). An exception > + * is if the interrupt arrived in kernel mode; in this case we would > + * be the outermost interrupt but still increment by 2 which is Ok. > */ > if (rcu_dynticks_curr_cpu_in_eqs()) { > > @@ -832,8 +819,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq) > rcu_cleanup_after_idle(); > > incby = 1; > - } else if (tick_nohz_full_cpu(rdp->cpu) && > - !rdp->dynticks_nmi_nesting && rdp->rcu_urgent_qs && > + } else if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_urgent_qs && > !rdp->rcu_forced_tick) { > rdp->rcu_forced_tick = true; > tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU); > @@ -846,8 +832,6 @@ static __always_inline void rcu_nmi_enter_common(bool irq) > WRITE_ONCE(rdp->dynticks_nesting, /* Prevent store tearing. */ > rdp->dynticks_nesting + incby); > > - WRITE_ONCE(rdp->dynticks_nmi_nesting, /* Prevent store tearing. */ > - rdp->dynticks_nmi_nesting + 1); > barrier(); > } > > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h > index 055c31781d3a..ad7d3e31c5cf 100644 > --- a/kernel/rcu/tree.h > +++ b/kernel/rcu/tree.h > @@ -176,8 +176,8 @@ struct rcu_data { > > /* 3) dynticks interface. */ > int dynticks_snap; /* Per-GP tracking for dynticks. */ > - long dynticks_nesting; /* Track process nesting level. */ > - long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */ > + long dynticks_nesting; /* Track dyntick (non-IDLE) nesting > + * level for kernel entry and interrupt. */ > atomic_t dynticks; /* Even value for idle, else odd. */ > bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */ > bool rcu_urgent_qs; /* GP old need light quiescent state. */ > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h > index 841ab43f3e60..0676460107d0 100644 > --- a/kernel/rcu/tree_stall.h > +++ b/kernel/rcu/tree_stall.h > @@ -313,7 +313,7 @@ static void print_cpu_stall_info(int cpu) > } > print_cpu_stall_fast_no_hz(fast_no_hz, cpu); > delta = rcu_seq_ctr(rdp->mynode->gp_seq - rdp->rcu_iw_gp_seq); > - pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld/%#lx softirq=%u/%u fqs=%ld %s\n", > + pr_err("\t%d-%c%c%c%c: (%lu %s) idle=%03x/%ld softirq=%u/%u fqs=%ld %s\n", > cpu, > "O."[!!cpu_online(cpu)], > "o."[!!(rdp->grpmask & rdp->mynode->qsmaskinit)], > @@ -323,7 +323,7 @@ static void print_cpu_stall_info(int cpu) > "!."[!delta], > ticks_value, ticks_title, > rcu_dynticks_snap(rdp) & 0xfff, > - rdp->dynticks_nesting, rdp->dynticks_nmi_nesting, > + rdp->dynticks_nesting, > rdp->softirq_snap, kstat_softirqs_cpu(RCU_SOFTIRQ, cpu), > READ_ONCE(rcu_state.n_force_qs) - rcu_state.n_force_qs_gpstart, > fast_no_hz); > -- > 2.23.0.187.g17f5b7556c-goog >