Hi Mitch,

> > > Although, to be clear, it isn't the fact that there exist 8 threads,
> > > it's that the device is firing all 8 interrupts at the same time. The
> > > time spent in hardirq context just waking up all 8 of those threads
> > > (and the cyclictest wakeup) is enough to cause your regression.
> > >
> > > netdev/igb folks-
> > >
> > > Under what conditions should it be expected that the i350 triggers
> > > all of the TxRx interrupts simultaneously? Any ideas here?
>
> I can answer that! I wrote that code.
>
> We trigger the interrupts once a second because MSI and MSI-X interrupts
> are NOT guaranteed to be delivered. If this happens, the queues being
> serviced by this "lost" interrupt are completely stuck.
>
> The device automatically masks each interrupt vector after it fires,
> expecting the ISR to re-enable the vector after processing is complete.
> If the interrupt is lost, the ISR doesn't run, so the vector ends up
> permanently masked. At this point, any queues associated with that
> vector are stuck. The only recovery is through the netdev watchdog,
> which initiates a reset.
>
> During development of igb, we had several platforms with chipsets that
> routinely dropped MSI messages under stress. Things would be running
> fine and then, pow, all the traffic on a queue would stop.
>
> So, I added code to fire each vector once per second. Just unmasking the
> interrupt isn't enough - we need to trigger the ISR to get the queues
> cleaned up so the device can work again.
>
> Is this workaround still needed? I don't know. Modern chipsets don't
> break a sweat handling gigabit-speed traffic, and they *probably* don't
> drop interrupts. But I'd still rather have that insurance.
>
> You could try to remove the write to the EICS registers in the watchdog
> task to see if that takes care of your problem. But I wouldn't want to
> remove that code permanently, because we have seen lost interrupts in
> the past.
> You also could try staggering the writes so that not all vectors fire
> each second. But then you'll potentially incur a much longer delay if
> an interrupt does get lost, which means you could trigger netdev
> watchdog events.
>
> -Mitch

Thanks for the explanation.

Now I have switched over to kernel 4.8-rt1. This shows the very same
effect. I modified the in-kernel igb_main.c (function igb_watchdog_task)
to comment out the access to the EICS registers:

--- igb_main.c.orig	2016-10-07 10:43:37.855873754 +0200
+++ igb_main.c	2016-10-07 10:31:51.451346292 +0200
@@ -4613,6 +4613,7 @@ no_wait:
 	}
 
 	/* Cause software interrupt to ensure Rx ring is cleaned */
+#if 0
 	if (adapter->flags & IGB_FLAG_HAS_MSIX) {
 		u32 eics = 0;
@@ -4622,6 +4623,7 @@ no_wait:
 	} else {
 		wr32(E1000_ICS, E1000_ICS_RXDMT0);
 	}
+#endif
 
 	igb_spoof_check(adapter);
 	igb_ptp_rx_hang(adapter);

The result is now slightly better; however, the max latency measured by
cyclictest is still much higher than the values seen with kernel
3.18.27-rt27. I have now enabled all events, and the execution of

# cyclictest -a -i 105 -m -n -p 80 -t 1 -b 30 -C

delivers the following trace values:

cyclicte-10062   0....... 10025813402us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e)
cyclicte-10062   0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
cyclicte-10062   0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388
cyclicte-10062   0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: Start context switch
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: End context switch
cyclicte-10062   0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
cyclicte-10062   0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0
kworker/-1388    0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
kworker/-1388    0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
kworker/-1388    0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns]
kworker/-1388    0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49
irq/48-e-10057   0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64
irq/48-e-10057   0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX]
irq/48-e-10057   0d...1.. 10025813475us : rcu_utilization: Start context switch
irq/48-e-10057   0d...1.. 10025813476us : rcu_utilization: End context switch
irq/48-e-10057   0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49
irq/47-e-10056   0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64
irq/47-e-10056   0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX]
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: Start context switch
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: End context switch
irq/47-e-10056   0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
kworker/-1388    0d..h... 10025813516us : local_timer_entry: vector=239
kworker/-1388    0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19]
kworker/-1388    0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18
kworker/-1388    0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387
kworker/-1388    0d..h1.. 10025813517us : sched_waking: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h2.. 10025813517us : sched_wakeup: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h... 10025813517us : hrtimer_expire_exit: hrtimer=ffff88040a017e18
kworker/-1388    0dN.h1.. 10025813518us : hrtimer_interrupt: cpu=0 offset=317613 curr=kworker/0:3[120] thread=<none>[-1]
kworker/-1388    0dN.h... 10025813518us : write_msr: 6e0, value 20ca631e99f8
kworker/-1388    0dN.h... 10025813518us : local_timer_exit: vector=239
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813518us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=38173 [ns] vruntime=101216326505 [ns]
kworker/-1388    0d...2.. 10025813519us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=10062 next_prio=19
kworker/-1388    0d...2.. 10025813519us : x86_fpu_regs_activated: x86/fpu: ffff88040c603ec0 fpregs_active: 1 fpstate_active: 1 counter: 70 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d...2.. 10025813519us : write_msr: c0000100, value 7ffff7400700
cyclicte-10062   0....... 10025813519us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813524us : sys_enter: NR 1 (5, 7ffff7400300, 1f, 7ffff77a5460, 2, 7ffff744c99a)
cyclicte-10062   0.....11 10025813526us : tracing_mark_write: hit latency threshold (32 > 30)
cyclicte-10062   0....... 10025813526us : sys_exit: NR 1 = 31
cyclicte-10062   0....... 10025813527us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460, 2, 7ffff744c99a)

Also, I notice that there is still roughly one irq every 2 seconds on
irq/47 (eth2-rx-0) and irq/48 (eth2-tx-0). Any idea or feedback on this?

Thanks,
Mathias