Hello Mathias-

On Fri, Oct 07, 2016 at 08:58:08AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
[..]
> I modified the in-kernel igb_main.c (function igb_watchdog_task) to comment out
> the access to the EICS registers:
>
> --- igb_main.c.orig	2016-10-07 10:43:37.855873754 +0200
> +++ igb_main.c	2016-10-07 10:31:51.451346292 +0200
> @@ -4613,6 +4613,7 @@ no_wait:
>  	}
>
>  	/* Cause software interrupt to ensure Rx ring is cleaned */
> +#if 0
>  	if (adapter->flags & IGB_FLAG_HAS_MSIX) {
>  		u32 eics = 0;
>
> @@ -4622,6 +4623,7 @@ no_wait:
>  	} else {
>  		wr32(E1000_ICS, E1000_ICS_RXDMT0);
>  	}
> +#endif
>
>  	igb_spoof_check(adapter);
>  	igb_ptp_rx_hang(adapter);
>
> The result is now slightly better, however the max latency that has been
> measured by cyclictest is still much higher compared to the values of
> kernel 3.18.27-rt27.
> I have now enabled all events; the execution of
> # cyclictest -a -i 105 -m -n -p 80 -t 1 -b 30 -C
> delivers the following trace values

There is something still fishy with these traces...

> cyclicte-10062 0....... 10025813402us : sys_exit: NR 230 = 0
> cyclicte-10062 0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e)
> cyclicte-10062 0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
> cyclicte-10062 0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388

The cyclictest thread sleeps, waiting for its wakeup at 10024735653388.
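(As a reminder, with -n the per-iteration measurement that cyclictest does
boils down to roughly the sketch below; this is a minimal illustration, not
the actual cyclictest source. NR 230 in the sys_enter line above is
clock_nanosleep on x86_64, and the hrtimer_start is the kernel arming an
hrtimer for that absolute expiry.)

#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t ts_ns(const struct timespec *t)
{
	return (int64_t)t->tv_sec * NSEC_PER_SEC + t->tv_nsec;
}

int main(void)
{
	struct timespec next, now;
	long interval_ns = 105 * 1000;	/* -i 105 (microseconds) */

	clock_gettime(CLOCK_MONOTONIC, &next);

	for (;;) {
		/* advance the absolute expiry by one interval */
		next.tv_nsec += interval_ns;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}

		/* the NR 230 (clock_nanosleep) seen in the trace; the kernel
		 * arms an hrtimer for the absolute expiry time */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);

		/* latency = how far past the programmed expiry we woke up */
		printf("latency: %lld ns\n",
		       (long long)(ts_ns(&now) - ts_ns(&next)));
	}
}

The expires= value in the hrtimer_start line above is exactly that absolute
wakeup time.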
> cyclicte-10062 0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
> cyclicte-10062 0d...1.. 10025813403us : rcu_utilization: Start context switch
> cyclicte-10062 0d...1.. 10025813403us : rcu_utilization: End context switch
> cyclicte-10062 0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
> cyclicte-10062 0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0
> kworker/-1388 0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0
> kworker/-1388 0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
> kworker/-1388 0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
> kworker/-1388 0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
> kworker/-1388 0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0
> kworker/-1388 0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
> kworker/-1388 0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
> kworker/-1388 0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000

So, kworker/0:3 was busy doing something, and we received both tx and rx
interrupts from eth2, waking up the relevant threads.

> kworker/-1388 0dN..1.. 10025813472us : rcu_utilization: Start context switch
> kworker/-1388 0dN..1.. 10025813472us : rcu_utilization: End context switch
> kworker/-1388 0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns]
> kworker/-1388 0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49
> irq/48-e-10057 0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX]
> irq/48-e-10057 0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX]
> irq/48-e-10057 0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64

Hmm, the irq/48-eth2-tx- thread is raising NET_RX? That seems...wrong.

> irq/48-e-10057 0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX]
> irq/48-e-10057 0d...1.. 10025813475us : rcu_utilization: Start context switch
> irq/48-e-10057 0d...1.. 10025813476us : rcu_utilization: End context switch
> irq/48-e-10057 0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49
> irq/47-e-10056 0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX]
> irq/47-e-10056 0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX]
> irq/47-e-10056 0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64
> irq/47-e-10056 0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX]

Oh, then we do the same thing again? Hrmph...why?

> irq/47-e-10056 0d...1.. 10025813479us : rcu_utilization: Start context switch
> irq/47-e-10056 0d...1.. 10025813479us : rcu_utilization: End context switch
> irq/47-e-10056 0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120

Done handling the two interrupts. Back to whatever kworker/0:3 was up to...

> kworker/-1388 0d..h... 10025813516us : local_timer_entry: vector=239
> kworker/-1388 0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19]
> kworker/-1388 0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18
> kworker/-1388 0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387

Okay, we finally received our wakeup event. We were expecting to be woken up
at 10024735653388ns, but were actually woken up at 10024735682387ns:

    10024735682387 - 10024735653388 = 28999ns

Our timer fired ~29us late! But why...?

Sorry I don't have answers, just more questions. I do wonder what kworker/0:3
was up to at the time the timer interrupt should have fired.

Julia
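P.S. About the NET_RX raises from both eth2 irq threads above: if I read the
driver right, the per-ring MSI-X handler in igb is tiny (paraphrased below
from igb_main.c; details may differ between kernel versions). Each q_vector
has its own NAPI instance, and here tx and rx ended up in separate q_vectors
judging by the separate eth2-tx-0/eth2-rx-0 interrupts, so napi_schedule()
ends up raising NET_RX_SOFTIRQ for both of them. With the threaded irqs on
-rt that softirq then seems to be run directly in the raising irq thread,
which would match the back-to-back softirq_entry/napi_poll/softirq_exit
lines.

static irqreturn_t igb_msix_ring(int irq, void *data)
{
	struct igb_q_vector *q_vector = data;

	/* Write the ITR value calculated from the previous interrupt. */
	igb_write_itr(q_vector);

	napi_schedule(&q_vector->napi);

	return IRQ_HANDLED;
}

So it is probably not wrong per se, just surprising; note that both napi_poll
lines report work 0, so neither poll actually had packets to clean.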