I am interested in investigating how to distribute the softirq work from a network NIC across multiple processor cores on an x86-64 machine. This particular machine has two dual-core AMD Opteron 275 processors and two Broadcom gigabit NICs. But in general, where the number of cores is a multiple of the number of NICs, I'd like to be able to distribute the IRQs of each NIC over that multiple of cores.

The background is that I am running a network-intensive bidirectional workload on two of these machines, using a single bonded IP interface on each machine, interconnected by a switch. Each bond consists of the two gigabit interfaces running full-duplex, with multiple sessions each establishing connections between these two IP endpoints. I am seeing that:

. total network throughput of around 2660 Megabits/sec through each bond (aggregated over send and receive) is rather less than the network is capable of (CPU power permitting, the network is capable of somewhere nearer 3950 Megabits/sec)

. overall CPU utilization is only around 85%, so there is some to spare ...

. ... but /proc/stat shows that the CPU utilization is very uneven over the 4 cores, with all the softirq processing confined to two cores.

I believe that for this workload the network throughput would increase to around 3000 Megabits/sec if the softirq load could be spread evenly over all 4 cores.

I switched off the irqbalance daemon and then tried altering the /proc/irq/<irq>/smp_affinity proc files manually for the two IRQs (one for each NIC), specifying two cores for each one, e.g. 05 for IRQ 225 and 0a for IRQ 201. At the time, the machine was running a 2.6.16 kernel. The result was no distribution at all. That is, for each NIC, as reported in /proc/interrupts, all interrupts were being directed to a single core - the "first" (in the little-endian sense) of the bits set in my smp_affinity mask. The second bit was ignored.

I then came across the file Documentation/ia64/IRQ-redir.txt, which documents this behaviour for ia64 (but I don't see anything saying this is also the case on x86-64). It says:

  "Because of the usage of SAPIC mode and physical destination mode the
  IRQ target is one particular CPU and cannot be a mask of several CPUs.
  Only the first non-zero bit is taken into account."

OK - so that is exactly what I saw (on 2.6.16). Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.16:

            CPU0      CPU1      CPU2      CPU3
 217:    2828591    551570  14406281   2734679   IO-APIC-level  eth5
 225:   18986626         0   2643626        14   IO-APIC-level  eth3

(Note - I know the ratios are not all:0 - I had been experimenting with different masks - and I don't see any way of resetting the counters.)

I then upgraded the kernel to 2.6.26.5 and tried again, and now I see something different. With the same masks (05, 0a), the IRQs for each NIC are now distributed over the two cores I specified in the mask - but not evenly; the ratio is around 7:1. This is better than all:0, and it raises the throughput from 2660 Mbits/sec to over 2810 Mbits/sec with no other changes. Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.26:

            CPU0      CPU1      CPU2      CPU3
  24:        144   1145810         0    612858   IO-APIC-fasteoi  eth5
  25:      83517         7    575415    849336   IO-APIC-fasteoi  eth3

Again, the ratios are from several runs with different masks, but the counts for CPUs 0 and 2 for IRQ 25 are representative.
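For completeness, the sequence I am using to set the masks is essentially the following sketch (the irqbalance init-script path is a guess and will vary by distro; IRQ numbers 24 and 25 are the ones reported on 2.6.26 above; smp_affinity takes a little-endian hex bitmap of permitted cores):

    # stop irqbalance first, so it does not overwrite the masks
    # (init script path is a guess - adjust for your distro)
    /etc/init.d/irqbalance stop

    # 0a = binary 1010 = cores 1 and 3, for eth5 (IRQ 24)
    echo 0a > /proc/irq/24/smp_affinity

    # 05 = binary 0101 = cores 0 and 2, for eth3 (IRQ 25)
    echo 05 > /proc/irq/25/smp_affinity

    # read the masks back to confirm what the kernel actually accepted
    cat /proc/irq/24/smp_affinity /proc/irq/25/smp_affinity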
A couple of obvious changes from 2.6.16:

. the IRQ numbers are smaller
. the IRQ method has changed from IO-APIC-level to IO-APIC-fasteoi

I also see better CPU utilization over the 4 cores in /proc/stat; in particular, the softirq work is spread in that 7:1 ratio. So it seems that smp_affinity does partially work for a network device and several cores. I am happier, but left with a number of questions that I hope someone can answer:

1) As far as I can tell, SAPIC (aka IOSAPIC) is specific to Itanium, but in the literature I see something which appears to be similar, called X2APIC, on other Intel 64-bit architectures. Does X2APIC have the same behaviour as regards IRQ balancing and smp affinity? And does the AMD Opteron 275 also use X2APIC, or an AMD equivalent?

2) Is it expected that something changed in this area between 2.6.16 and 2.6.26, and if so, what? (Maybe related to the external changes in the /proc/interrupts output I noted?)

3) Is it now possible, on this current kernel and with my hardware (or any gigabit NIC), to distribute the softirq work approximately 50:50 over two cores? If so, how?

I can supply more information about the runs and config etc. if needed.

John
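P.S. Since I don't see a way of resetting the /proc/interrupts counters, one way to get per-run numbers is to diff snapshots taken before and after each run. A minimal sketch of that, assuming the 4-CPU layout and the eth3 line shown above:

    # snapshot the eth3 line before and after a run
    grep eth3 /proc/interrupts > /tmp/irq.before
    # ... run the workload ...
    grep eth3 /proc/interrupts > /tmp/irq.after

    # per-core deltas: after pasting the two lines together, fields
    # 2-5 are the "before" counts for CPU0-CPU3 and fields 9-12 are
    # the "after" counts
    paste /tmp/irq.before /tmp/irq.after |
        awk '{ for (i = 0; i < 4; i++)
                   printf "CPU%d: %d\n", i, $(i + 9) - $(i + 2) }'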