I've been trying to work out why we see frequent stalls during packet transmission in our GPFS cluster. They show up with the bnx2 driver as well as with other NICs/drivers, but I've reached the limit of my current knowledge. I used the perf netdev events (as described in http://lwn.net/Articles/397654/) to measure the tx times, and I see spikes in the "free" column such as those in the last two rows here:
dev     len  Qdisc             netdevice       free
em2      98  807740.878085sec  0.002msec       0.061msec
em2      98  807740.878119sec  0.002msec       0.029msec
em2      98  807741.140600sec  0.005msec       0.092msec
em2   65226  807742.763833sec  0.007msec       0.436msec
em2      66  807727.081712sec  0.001msec   16246.072msec
em2      66  807740.882741sec  0.001msec    3457.625msec
Based on the source for netdev-times.py, the "free" column is the difference between trace_net_dev_xmit() and trace_kfree_skb() in net/core/dev.c, but I'm not sure how to dig any deeper. Are there any common causes for this behavior? What's the best way to further break down the time difference between the xmit and kfree trace points?
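As a first step toward narrowing this down, here is a rough sketch of how the slow rows could be pulled out of the saved netdev-times.py output so they can be lined up against other trace data. It assumes the five whitespace-separated columns shown above. The names parse_rows and slow_frees and the 100 msec threshold are made up for illustration and are not part of the perf script. The computed free_at_sec value also rests on my reading that the Qdisc column is the queue timestamp and that the other two columns are successive deltas from it, which may not be right.

```python
# Sketch: flag transmissions whose "free" delay exceeds a threshold.
# SAMPLE is the output quoted in this message; the helper names are hypothetical.

SAMPLE = """\
dev len Qdisc netdevice free
em2 98 807740.878085sec 0.002msec 0.061msec
em2 98 807740.878119sec 0.002msec 0.029msec
em2 98 807741.140600sec 0.005msec 0.092msec
em2 65226 807742.763833sec 0.007msec 0.436msec
em2 66 807727.081712sec 0.001msec 16246.072msec
em2 66 807740.882741sec 0.001msec 3457.625msec
"""


def _strip_unit(field, unit):
    """Return the numeric part of a field such as '0.061msec'."""
    if not field.endswith(unit):
        raise ValueError("expected %r to end with %r" % (field, unit))
    return float(field[:-len(unit)])


def parse_rows(text):
    """Parse data rows, skipping the header and anything malformed."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[1].isdigit():
            continue  # header line or unrelated output
        dev, length, qdisc, netdevice, free = fields
        row = {
            "dev": dev,
            "len": int(length),
            "qdisc_sec": _strip_unit(qdisc, "sec"),
            "netdevice_msec": _strip_unit(netdevice, "msec"),
            "free_msec": _strip_unit(free, "msec"),
        }
        # Assumption: free time = queue timestamp + both deltas.
        row["free_at_sec"] = row["qdisc_sec"] + (
            row["netdevice_msec"] + row["free_msec"]) / 1000.0
        rows.append(row)
    return rows


def slow_frees(rows, threshold_msec=100.0):
    """Return rows whose free delay exceeds the threshold, slowest first."""
    slow = [r for r in rows if r["free_msec"] > threshold_msec]
    return sorted(slow, key=lambda r: r["free_msec"], reverse=True)


if __name__ == "__main__":
    for r in slow_frees(parse_rows(SAMPLE)):
        print("%s len=%d queued=%.6f freed~=%.6f free=%.3fmsec" % (
            r["dev"], r["len"], r["qdisc_sec"],
            r["free_at_sec"], r["free_msec"]))
```

The approximate free timestamps could then be compared against other events recorded in the same window (interrupts, NAPI polls, and so on) to see what the skb was waiting for.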