Hi On 07/31/2018 05:52 PM, Frederic Weisbecker wrote: > Before updating the full nohz tick or the idle time on IRQ exit, we > check first if we are not in a nesting interrupt, whether the inner > interrupt is a hard or a soft IRQ. > > There is a historical reason for that: the dyntick idle mode used to > reprogram the tick on IRQ exit, after softirq processing, and there was > no point in doing that job in the outer nesting interrupt because the > tick update will be performed through the end of the inner interrupt > eventually, with even potential new timer updates. > > One corner case could show up though: if an idle tick interrupts a softirq > executing inline in the idle loop (through a call to local_bh_enable()) > after we entered in dynticks mode, the IRQ won't reprogram the tick > because it assumes the softirq executes on an inner IRQ-tail. As a > result we might put the CPU in sleep mode with the tick completely > stopped whereas a timer can still be enqueued. Indeed there is no tick > reprogramming in local_bh_enable(). We probably asssumed there was no bh > disabled section in idle, although there didn't seem to be debug code > ensuring that. > > Nowadays the nesting interrupt optimization still stands but only concern > full dynticks. The tick is stopped on IRQ exit in full dynticks mode > and we want to wait for the end of the inner IRQ to reprogramm the tick. > But in_interrupt() doesn't make a difference between softirqs executing > on IRQ tail and those executing inline. What was to be considered a > corner case in dynticks-idle mode now becomes a serious opportunity for > a bug in full dynticks mode: if a tick interrupts a task executing > softirq inline, the tick reprogramming will be ignored and we may exit > to userspace after local_bh_enable() with an enqueued timer that will > never fire. > > To fix this, simply keep reprogramming the tick if we are in a hardirq > interrupting softirq. We can still figure out a way later to restore > this optimization while excluding inline softirq processing. > > Reported-by: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx> > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Tested-by: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx> > --- > kernel/softirq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index 900dcfe..0980a81 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -386,7 +386,7 @@ static inline void tick_irq_exit(void) > > /* Make sure that timer wheel updates are propagated */ > if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) { > - if (!in_interrupt()) > + if (!in_irq()) > tick_nohz_irq_exit(); > } > #endif > This patch was back ported to the Stable linux-4.14.y and It causes regression - flood of "NOHZ: local_softirq_pending" messages on all TI boards during boot (NFS boot): [ 4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256 [ 4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256 the same is not reproducible with LKML - seems due to changes in tick-sched.c __tick_nohz_idle_enter()/tick_nohz_irq_exit(). I've generated backtrace from can_stop_idle_tick() (see below) and seems this patch makes tick_nohz_irq_exit() call unconditional in case of nested interrupt: gic_handle_irq |- irq_exit |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1] |-__do_softirq <irqs enabled> |- gic_handle_irq() |- irq_exit() |- tick_irq_exit() if (!in_irq()) <-- My understanding is that this condition will be always true due to [1] tick_nohz_irq_exit(); |-__tick_nohz_idle_enter() |- can_stop_idle_tick() Sry, not sure if my conclusion is right and how can it be fixed. [ 3.842320] NOHZ: local_softirq_pending 40 in sirq 256 [ 3.847485] ------------[ cut here ]------------ [ 3.852133] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:915 __tick_nohz_idle_enter+0x4b8/0x568 [ 3.861393] Modules linked in: [ 3.864469] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.66-01768-gc26f664-dirty #311 [ 3.872506] Hardware name: Generic DRA74X (Flattened Device Tree) [ 3.878623] Backtrace: [ 3.881091] [<c010c050>] (dump_backtrace) from [<c010c320>] (show_stack+0x18/0x1c) [ 3.888696] r7:00000009 r6:600f0193 r5:00000000 r4:c0c5fca4 [ 3.894386] [<c010c308>] (show_stack) from [<c07ad028>] (dump_stack+0x8c/0xa0) [ 3.901645] [<c07acf9c>] (dump_stack) from [<c012e558>] (__warn+0xec/0x104) [ 3.908638] r7:00000009 r6:c0996d08 r5:00000000 r4:00000000 [ 3.914329] [<c012e46c>] (__warn) from [<c012e628>] (warn_slowpath_null+0x28/0x30) [ 3.921933] r9:00000000 r8:e4e1f7de r7:c0c8c1d8 r6:c0c65180 r5:00000000 r4:eed408e8 [ 3.929715] [<c012e600>] (warn_slowpath_null) from [<c01a9780>] (__tick_nohz_idle_enter+0x4b8/0x568) [ 3.938890] [<c01a92c8>] (__tick_nohz_idle_enter) from [<c01a9c74>] (tick_nohz_irq_exit+0x2c/0x30) [ 3.947890] r10:c0c01f50 r9:c0c00000 r8:ee008000 r7:00000000 r6:c0c01ee0 r5:00000048 [ 3.955752] r4:c0b62afc [ 3.958301] [<c01a9c48>] (tick_nohz_irq_exit) from [<c0133748>] (irq_exit+0xf4/0x144) [ 3.966173] [<c0133654>] (irq_exit) from [<c0181e6c>] (__handle_domain_irq+0x68/0xbc) [ 3.974043] [<c0181e04>] (__handle_domain_irq) from [<c0101508>] (gic_handle_irq+0x44/0x80) [ 3.982432] r9:c0c00000 r8:fa213000 r7:fa212000 r6:c0c01dd0 r5:fa21200c r4:c0c03ff0 [ 3.990211] [<c01014c4>] (gic_handle_irq) from [<c010cf4c>] (__irq_svc+0x6c/0xa8) [ 3.997726] Exception stack(0xc0c01dd0 to 0xc0c01e18) [ 4.002800] 1dc0: 00000000 c0c65180 00000000 00000000 [ 4.011017] 1de0: 00000280 00000013 c0c00000 00000000 ee008000 c0c00000 c0c01f50 c0c01e7c [ 4.019232] 1e00: c0c01e80 c0c01e20 c0133730 c01015dc 600f0113 ffffffff [ 4.025877] r9:c0c00000 r8:ee008000 r7:c0c01e04 r6:ffffffff r5:600f0113 r4:c01015dc [ 4.033659] [<c0101548>] (__do_softirq) from [<c0133730>] (irq_exit+0xdc/0x144) [ 4.041002] r10:c0c01f50 r9:c0c00000 r8:ee008000 r7:00000000 r6:00000000 r5:00000013 [ 4.048863] r4:c0b62afc [ 4.051414] [<c0133654>] (irq_exit) from [<c0181e6c>] (__handle_domain_irq+0x68/0xbc) [ 4.059280] [<c0181e04>] (__handle_domain_irq) from [<c0101508>] (gic_handle_irq+0x44/0x80) [ 4.067670] r9:c0c00000 r8:fa213000 r7:fa212000 r6:c0c01ee0 r5:fa21200c r4:c0c03ff0 [ 4.075448] [<c01014c4>] (gic_handle_irq) from [<c010cf4c>] (__irq_svc+0x6c/0xa8) [ 4.082963] Exception stack(0xc0c01ee0 to 0xc0c01f28) [ 4.088041] 1ee0: 00000001 00000000 fe600000 00000000 c0c00000 c0c03cc8 c0c03c68 c0b623b8 [ 4.096258] 1f00: 00000000 00000000 c0c01f50 c0c01f3c c0c01f1c c0c01f30 c0120f14 c0108ae4 [ 4.104469] 1f20: 600f0013 ffffffff [ 4.107975] r9:c0c00000 r8:00000000 r7:c0c01f14 r6:ffffffff r5:600f0013 r4:c0108ae4 [ 4.115760] [<c0108abc>] (arch_cpu_idle) from [<c07c6a14>] (default_idle_call+0x28/0x34) [ 4.123894] [<c07c69ec>] (default_idle_call) from [<c016e4a4>] (do_idle+0x180/0x214) [ 4.131676] [<c016e324>] (do_idle) from [<c016e7fc>] (cpu_startup_entry+0x20/0x24) [ 4.139283] r10:effff7c0 r9:c0b48a30 r8:c0c64000 r7:c0c03c40 r6:ffffffff r5:00000002 [ 4.147146] r4:000000be [ 4.149698] [<c016e7dc>] (cpu_startup_entry) from [<c07c12e4>] (rest_init+0xd8/0xdc) [ 4.157483] [<c07c120c>] (rest_init) from [<c0b00d8c>] (start_kernel+0x3cc/0x3d8) [ 4.164997] r5:c0c64000 r4:c0c6404c [ 4.168592] [<c0b009c0>] (start_kernel) from [<8000807c>] (0x8000807c) [ 4.175148] ---[ end trace 9c10a64bf81ad3fe ]--- -- regards, -grygorii