Steve Fink wrote:
> I have a quad-core Xeon with 4 gigabit NICs. I'm sending gobs of data
> out three of them (about 93% of capacity). On one machine, identical
> to the other, I have a problem where CPU0 is 0% idle while the other 3
> cores are over 60% idle. My application mostly avoids CPU0, and there
> is very little user time being spent on CPU0. It is almost all irq
> (about 25%) and sys (about 50%).
>
> On the other machine, there's a little bit more sys and irq time on
> CPU0, but everything's pretty well balanced.
>
> If I kill irqbalance and manually set smp_affinity to avoid CPU0,
> nothing much changes. /proc/interrupts shows a fair number of
> interrupts going to CPU0 before I make the change, and none after
> (other than "LOC" interrupts), but the CPU0 load doesn't change much
> at all.
>
> There are hardly any packets coming into those three NICs; almost
> everything is a set of (several hundred) MPEG streams going out.
>
> I guess my main question is: if /proc/interrupts doesn't show any
> interrupts going to CPU0, then why is it spending any time on
> interrupt handling?
>
> A related question: does NAPI apply to sending packets as well as
> receiving them? How can I tell whether (1) my drivers support NAPI and
> (2) it is active?
>
> I've tried playing with systemtap a bit to try to figure out what that
> CPU0 time is going to, but I don't really know how to use it well
> enough.
>
> Here's a snapshot of 'atop' output showing things when they aren't
> quite as extreme (CPU0 is 56% sys, 14% irq, 11% user). This is with
> all the interrupts shunted away from CPU0, or at least, all the ones
> that would let me.
>
> PRC | sys 9.73s  | user 9.55s  | #proc 839   | #zombie 0    | #exit 182    |
> CPU | sys 78%    | user 96%    | irq 46%     | idle 170%    | wait 10%     |
> cpu | sys 56%    | user 11%    | irq 14%     | idle 19%     | cpu000 w 0%  |
> cpu | sys 8%     | user 32%    | irq 7%      | idle 51%     | cpu001 w 2%  |
> cpu | sys 7%     | user 27%    | irq 12%     | idle 49%     | cpu003 w 5%  |
> cpu | sys 7%     | user 26%    | irq 13%     | idle 51%     | cpu002 w 3%  |
> CPL | avg1 20.95 | avg5 20.11  | avg15 20.09 | csw 233459   | intr 204338  |
> MEM | tot 3.5G   | free 214.6M | cache 1.3G  | buff 41.5M   | slab 38.2M   |
> SWP | tot 2.0G   | free 2.0G   |             | vmcom 3.2G   | vmlim 3.7G   |
> DSK | sda        | busy 22%    | read 0      | write 138    | avio 15 ms   |
> DSK | sdb        | busy 21%    | read 0      | write 138    | avio 15 ms   |
> NET | transport  | tcpi 45226  | tcpo 34604  | udpi 608     | udpo 273     |
> NET | network    | ipi 45834   | ipo 2667927 | ipfrw 0      | deliv 45834  |
> NET | eth2 96%   | pcki 2      | pcko 883468 | si 0 Kbps    | so 962 Mbps  |
> NET | eth3 95%   | pcki 1      | pcko 876330 | si 0 Kbps    | so 954 Mbps  |
> NET | eth1 95%   | pcki 1      | pcko 872871 | si 0 Kbps    | so 951 Mbps  |
> NET | lo ----    | pcki 22099  | pcko 22099  | si 1532 Kbps | so 1532 Kbps |
> NET | eth0 0%    | pcki 23426  | pcko 12763  | si 2948 Kbps | so 1557 Kbps |
>
> # uname -r
> 2.6.18-8.el5.tvh.3
> (preemption is enabled)
>
> # rpm -q centos-release
> centos-release-5-0.0.el5.centos.2
>
> # egrep 'proc|model name' /proc/cpuinfo
> processor : 0
> model name : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
> processor : 1
> model name : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
> processor : 2
> model name : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
> processor : 3
> model name : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
>
> # lspci | fgrep Eth
> 03:00.0 Ethernet controller: Broadcom Corporation Unknown device 165a
> 04:00.0 Ethernet controller: Broadcom Corporation Unknown device 165a
> 09:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
> 09:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>
> # ethtool -i eth1
> driver: tg3
> version: 3.65-rh
> firmware-version: 5722-v3.07
> bus-info: 0000:04:00.0
> # ethtool -i eth2
> driver: e1000
> version: 7.2.7-k2-NAPI
> firmware-version: 5.11-2
> bus-info: 0000:09:00.0
> # ethtool -i eth3
> driver: e1000
> version: 7.2.7-k2-NAPI
> firmware-version: 5.11-2
> bus-info: 0000:09:00.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

My 1 cent here ...

I would read this well-written documentation:

    http://irqbalance.org/documentation.php

It sheds light on many of these questions.

In addition, stop your irqbalance daemon and run it in the foreground
with debug output:

    irqbalance --debug

I would also play around with irqbalance's environment variables to
achieve your goals:

    IRQBALANCE_BANNED_CPUS
    IRQBALANCE_ONESHOT
    IRQBALANCE_BANNED_INTERRUPTS

NAPI is also called "RX polling", because it uses a mixture of polling
and interrupts to process incoming network frames.

For the e1000 driver, I would look at your kernel's documentation:

    /usr/src/linux/Documentation/networking/e1000.txt

You will notice the e1000 driver has NAPI (RX polling) enabled by
default, as shown by the "-NAPI" suffix in the driver version reported
by your ethtool commands above.

As for the tg3 driver, if you go through its source code,
/usr/src/linux/drivers/net/tg3.c, you will find the RX polling code
for NAPI.

I hope this helps some .. :)
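
In case it is useful when checking the smp_affinity and /proc/interrupts numbers discussed above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names cpu_mask and per_cpu_totals and the SAMPLE text are made up for this example (they are not part of irqbalance or any other tool), and the parser assumes the usual /proc/interrupts layout of a CPU header line followed by one row per interrupt source.

```python
def cpu_mask(cpus):
    """Return the hex bitmask (no 0x prefix) with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def per_cpu_totals(text):
    """Sum the per-CPU counters in /proc/interrupts-style text."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        fields = rest.split()[:ncpu]
        # Rows with fewer counters than CPUs (for example ERR) are skipped.
        if len(fields) < ncpu or not all(f.isdigit() for f in fields):
            continue
        for cpu, count in enumerate(fields):
            totals[cpu] += int(count)
    return totals


# Hypothetical sample data, not taken from the machines in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        120          0          0          0   IO-APIC-edge  timer
 66:          0       5000        200          0   PCI-MSI  eth2
LOC:       1000       1000       1000       1000
ERR:          0
"""

if __name__ == "__main__":
    print("mask for CPUs 1-3:", cpu_mask([1, 2, 3]))
    print("mask for CPU 0 only:", cpu_mask([0]))
    try:
        with open("/proc/interrupts") as f:
            print("per-CPU totals:", per_cpu_totals(f.read()))
    except OSError:
        print("per-CPU totals (sample):", per_cpu_totals(SAMPLE))
```

The mask for CPUs 1-3 is the kind of value one would write to an IRQ's smp_affinity file to keep it off CPU0, and the mask for CPU 0 alone is the form IRQBALANCE_BANNED_CPUS expects. Running per_cpu_totals twice a few seconds apart and subtracting the results gives a per-CPU interrupt rate, which is easier to compare between the two machines than the raw counters.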