On Fri, 2010-01-29 at 14:59 -0500, Kelvin Ku wrote:
> On Fri, Jan 29, 2010 at 12:22:05PM +0200, Gilboa Davara wrote:
> > > which
> > > was throttling the CPUs to 1.6 GHz (from a maximum of 2.4 GHz). I attempted to
> > > remedy this by setting InterruptThrottleRate=0,0 in the e1000e driver, after
> > > which we had one full day of testing with zero rx_missed_errors, but the
> > > application still reported packet loss.
> >
> > rx_missed_errors usually gets triggered when the kernel is slow to handle
> > incoming hardware interrupts.
> > There's a trade-off here: increase the interrupt rate and you'll
> > increase the kernel CPU usage in exchange for lower latency; decrease
> > the interrupt rate and you'll reduce the CPU usage at the expense of a
> > higher chance of hitting the RX queue limit.
> > I'd suggest you try setting the InterruptThrottleRate to 1000, while
> > increasing the RX ring size to 4096.
> > (/sbin/ethtool -G DEVICE rx 4096)
> >
> > You could try enabling multi-queue by adding InterruptType=2,
> > RSS=NUM_OF_QUEUE and MQ=1 to your modprobe configuration (/etc/modprobe.d/).
>
> I'll try these suggestions later today. Note that I was able to disable
> interrupt throttling on the on-board 82574L NICs without seeing any
> rx_missed_errors.

Did it help?

> > Can you post the output of $ mpstat -P ALL 1 during peak load?
>
> We run "mpstat -P ALL 5" continuously; is this sufficient resolution? I've
> attached the mpstat output from 09:30-10:30 yesterday, which is one of the
> busiest hours of the day for multicast traffic.

~15'000 interrupts/core seems rather high to me - especially considering
the fact that this is a 1GbE link.
Reducing the InterruptThrottleRate to 1000/5000 while increasing the
RX ring size (ethtool -G ... rx ...) should decrease it.

> Also, here is the top of the output from powertop. Are you running with C-STATE
> enabled?
> It is somewhat troubling that more than half of the time is spent in
> the most power-saving state (C3), but I think this is averaged across all CPUs.

I usually disable power management.
Be advised that we are using 10GbE cards and not 1GbE, so we are more
vulnerable to
scaling-the-core-down-right-when-the-cards-start-flooding-the-hell-out-of-it...

P.S. Please post your complete hardware configuration (board, CPU, which
slot the NIC is in, etc.).

- Gilboa
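
To put numbers on the "did it help?" question, one option is to snapshot the NIC statistics before and after a test window and compare the rx_missed_errors counter. The following is only a sketch. The function names missed_count and missed_delta, the snapshot file names, and the interface name eth0 are assumptions for illustration, and it assumes the driver reports the counter as a "rx_missed_errors: N" line in "ethtool -S" output.

```shell
#!/bin/sh
# Sketch: compare rx_missed_errors between two saved `ethtool -S` snapshots.
# Helper names and file names are hypothetical, not part of any tool.

# Print the rx_missed_errors value found in `ethtool -S` text on stdin.
missed_count() {
    awk -F': *' '/rx_missed_errors/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Print how much the counter grew between snapshot $1 and snapshot $2.
missed_delta() {
    before=$(missed_count < "$1")
    after=$(missed_count < "$2")
    echo $((after - before))
}

# Example use on a live system (eth0 is an assumed interface name):
#   ethtool -S eth0 > before.txt
#   sleep 3600
#   ethtool -S eth0 > after.txt
#   missed_delta before.txt after.txt
```

A delta of zero over a busy hour would suggest the ring is no longer overflowing. It would not rule out loss elsewhere, for example in the socket receive buffer, which matches the earlier report of zero rx_missed_errors with the application still seeing packet loss.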