Semi-OT: Profiling 10GbE devices... Help?

Hello all,

I'm almost certain that this is not the right place to ask this question,
but if RedHat/Fedora's kernel engineers can't help me, I'm truly
screwed.

I'm using two Intel 10GbE (ixgbe) cards to passively monitor 10GbE
lines (under RHEL 5.2), either using the in-kernel dev_add_pack interface
(built-in ixgbe driver) or using a slightly modified ixgbe driver
(built around Intel's latest ixgbe driver).

However, I'm experiencing odd performance issues. Namely, once I
configure the driver to use MSI-X w/ multi-queue [MQ] (forcing pci=msi)
and assign each IRQ to one CPU core (IRQ CPU affinity), my software
requires -10x- more CPU cycles to process each packet (measured using
rdtsc; compared to multiple GbE links and/or w/ MSI-X/MQ disabled),
causing massive packet loss from missed IRQs (rx_missed_errors).
Looking at mpstat, I can see that each CPU core is handling a fairly low
number of interrupts (200-1000) while spending most of its time in
softIRQ (>90%, most likely within my own code).
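
For reference, the per-IRQ affinity setup described above can be sketched
roughly as follows. This is a minimal illustration, not the exact script in
use: the helper names cpu_mask and pin_irq are made up for this example, and
the real IRQ numbers have to be read from /proc/interrupts on the machine in
question.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Print the hex bitmask that selects a single CPU core
# (core 0 -> 1, core 3 -> 8).
# Assumes the core number fits in one shell integer; larger systems
# need the comma-separated 32-bit groups that smp_affinity also accepts.
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

# Pin one IRQ to one core by writing the mask to procfs (needs root).
pin_irq() {
    irq=$1
    core=$2
    cpu_mask "$core" > "/proc/irq/$irq/smp_affinity"
}
```

With these helpers, something like "pin_irq 58 3" would bind a (hypothetical)
IRQ 58 to core 3. Note that a running irqbalance daemon may rewrite the masks,
so it is usually stopped before pinning by hand.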

I decided to check newer kernels, so I've installed F10 (24C Xeon-MP
Intel S7000FC4U) and F9 (16C Opteron DL585G5, *) on two machines, but
even with 2.6.27 kernels I'm experiencing the same performance
issues.
Given that the same code is used to process packets, no matter
what type of links are being used, my first instinct was to look at the
CPU cores themselves (e.g. L1 & L2 dcache miss rates, TLB flushes,
etc.).

I tried using oprofile, but I failed to make it work.
On one machine (Xeon-MP, F10), oprofile failed to identify the
Dunnington CPU (switching to timer mode), and on the other (Barcelona
8354, F9), even though it was configured to report dcache statistics
[1,2], opreport returns empty reports.
In order to verify that oprofile indeed works on the Opteron machine, I
reconfigured oprofile to report CPU usage [3], but even then, oprofile
either returns empty results or hard-locks the machine.

So:
A. Is anyone else seeing the same odd behavior once MSI-X/MQ is enabled on
Intel's 10G cards? (P.S. MQ cannot be enabled on either machine unless I
add pci=msi to the kernel's command line.)
B. Any idea why oprofile refuses to generate cache statistics and/or
what I did wrong?
C. Before I dive into AMD's and Intel's MSR/PMC documentation and spend
the next five days trying to decipher which architectural /
non-architectural counters need to be set/used and how, do you have any
idea how I can access the performance counters without writing the code
myself?

- Gilboa
[1] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=DATA_CACHE_ACCESS:1000:0:1:1
[2] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=L2_CACHE_MISS:1000:0:1:1
[3] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=CPU_CLK_UNHALTED:10000000:0:1:1
* F10 seems to dislike the DL585G5; Issue already reported against anaconda. (#480638)

_______________________________________________
Fedora-kernel-list mailing list
Fedora-kernel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-kernel-list
