Hi Julia,

> > In the meanwhile I have detected another finding which might be relevant:
> >
> > With the 3.18 kernel the igb driver comes with two interrupts per NIC
> > (e.g. eth2 and eth2-TxRx0); with the 4.6 kernel the igb driver comes with 9 (!)
> > interrupts per NIC:
> > eth2, and eth2-TxRx-0, eth2-TxRx-1, ..., eth2-TxRx-7.
> >
> > As I initially used the same kernel configuration from 3.18 for the 4.6 kernel
> > as well, I wonder where this comes from and whether there is a kernel option
> > I can use to disable these extra interrupts and reduce the count to 2 again.
>
> If it's all of these interrupts that are firing and being handled at the same time,
> that can account for the latencies you were seeing. As I suggested before, having
> a trace with the sched_wakeup event enabled can help confirm that it's these
> interrupts causing the problems.
>
> If that is true, then the question is: why is the device triggering all of these
> interrupts at once? Is that expected? These are questions for the netdev folks,
> I think.
>
> Julia

OK - I ran cyclictest again, this time with the -C option:

# cyclictest -a -i 100 -m -n -p 80 -t 1 -b 21 -C

The last output lines of the trace are:

cyclicte-5887 0d...2.. 1504647266us!: sched_switch: prev_comm=cyclictest prev_pid=5887 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5 0dN.h2.. 1504647372us : sched_wakeup: comm=cyclictest pid=5887 prio=19 target_cpu=000
kworker/-5 0dN.h3.. 1504647374us : sched_wakeup: comm=irq/54-eth2-TxR pid=5883 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647375us : sched_wakeup: comm=irq/53-eth2-TxR pid=5882 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647377us : sched_wakeup: comm=irq/52-eth2-TxR pid=5881 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647378us : sched_wakeup: comm=irq/51-eth2-TxR pid=5880 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647380us : sched_wakeup: comm=irq/50-eth2-TxR pid=5879 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647381us : sched_wakeup: comm=irq/49-eth2-TxR pid=5878 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647382us : sched_wakeup: comm=irq/48-eth2-TxR pid=5877 prio=49 target_cpu=000
kworker/-5 0dN.h3.. 1504647383us : sched_wakeup: comm=irq/47-eth2-TxR pid=5876 prio=49 target_cpu=000
kworker/-5 0d...2.. 1504647384us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=5887 next_prio=19
cyclicte-5887 0.....11 1504647389us : tracing_mark_write: hit latency threshold (28 > 21)

The attached trace-extract.gz shows some more lines. It actually looks to me as if
the many igb irq threads are causing the issue.

Regards

Mathias
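PS: For completeness, the sched_wakeup/sched_switch events can also be enabled by hand through tracefs instead of via cyclictest's -C option. The paths below assume tracefs is mounted under /sys/kernel/debug/tracing (adjust if it is mounted elsewhere on your system):

# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
# echo 1 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace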
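PPS: Regarding the question about getting back to 2 interrupts per NIC: I have not verified that igb on this hardware honours it, but reducing the number of queue pairs with ethtool might be worth a try, e.g.:

# ethtool -l eth2            # show the current channel configuration
# ethtool -L eth2 combined 1

If the driver accepts it, that should leave a single TxRx vector per port (plus the link interrupt), similar to the 3.18 behaviour.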
Attachment: trace-extract.gz