I am interested in investigating how to distribute the softirq work from a network NIC across multiple processor cores on an x86-64 machine. This particular machine has two dual-core AMD Opteron 275 processors and two Broadcom gigabit NICs. But in general, where the number of cores is a multiple of the number of NICs, I'd like to be able to distribute the IRQs of each NIC over that multiple of cores.

The background is that I am running a network-intensive bidirectional workload on two of these machines, using a single bonded IP interface on each machine, interconnected by a switch. Each bond consists of the two gigabit interfaces running full-duplex, with multiple sessions each establishing connections between these two IP endpoints. I am seeing that:

. total network throughput of around 2660 Megabits/sec through each bond (aggregated over send and receive) is rather less than the network is capable of (CPU power permitting, the network is capable of somewhere nearer 3950 Megabits/sec)

. overall CPU utilization is only around 85%, so there is some to spare ...

. ... but /proc/stat shows that the CPU utilization is very uneven over the 4 cores, with all the softirq processing confined to two cores.

I believe that for this workload the network throughput would increase to around 3000 Megabits/sec if the softirq load could be spread evenly over all 4 cores.

I switched off the irqbalance daemon and then tried altering the /proc/irq/<irq>/smp_affinity proc files manually for the two IRQs (one for each NIC), specifying two cores for each one, e.g. 05 for IRQ 225 and 0a for IRQ 201. At the time, the machine was running a 2.6.16 kernel. The result was no distribution at all. That is, for each NIC, as reported in /proc/interrupts, all interrupts were being directed to a single core - the "first" (in the little-endian sense) of the bits set in my smp_affinity mask. The second bit was ignored.

I then came across the file Documentation/ia64/IRQ-redir.txt, which documents this behaviour for ia64 (but I don't see anything saying this is also the case on x86-64). It says:

  "Because of the usage of SAPIC mode and physical destination mode the
  IRQ target is one particular CPU and cannot be a mask of several CPUs.
  Only the first non-zero bit is taken into account."

OK - so that is exactly what I saw (on 2.6.16). Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.16:

            CPU0      CPU1      CPU2      CPU3
 217:    2828591    551570  14406281   2734679   IO-APIC-level  eth5
 225:   18986626         0   2643626        14   IO-APIC-level  eth3

(Note - I know the ratios are not all:0 - I had been experimenting with different masks - and I don't see any way of resetting the counters.)

I then upgraded the kernel to 2.6.26.5 and tried again, and now I see something different. With the same masks (05, 0a), the IRQs for each NIC are now distributed over the two cores I specified in the mask - but not evenly; the ratio is around 7:1. This is better than all:0, and it raises the throughput from 2660 Mbits/sec to over 2810 Mbits/sec with no other changes. Here is a clip of /proc/interrupts showing my two NICs after a run on 2.6.26:

            CPU0      CPU1      CPU2      CPU3
  24:        144   1145810         0    612858   IO-APIC-fasteoi  eth5
  25:      83517         7    575415    849336   IO-APIC-fasteoi  eth3

Again, the ratios are from several runs with different masks, but the counts for CPUs 0 and 2 for IRQ 25 are representative.
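For completeness, the sequence I am using to set the masks is essentially the following sketch (the irqbalance init-script path is a guess and will vary by distro; IRQ numbers 24 and 25 are the ones reported on 2.6.26 above; smp_affinity takes a little-endian hex bitmap of permitted cores):

    # stop irqbalance first, so it does not overwrite the masks
    # (init script path is a guess - adjust for your distro)
    /etc/init.d/irqbalance stop

    # 0a = binary 1010 = cores 1 and 3, for eth5 (IRQ 24)
    echo 0a > /proc/irq/24/smp_affinity

    # 05 = binary 0101 = cores 0 and 2, for eth3 (IRQ 25)
    echo 05 > /proc/irq/25/smp_affinity

    # read the masks back to confirm what the kernel actually accepted
    cat /proc/irq/24/smp_affinity /proc/irq/25/smp_affinity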
A couple of obvious changes from 2.6.16:

. the IRQ numbers are smaller
. the IRQ method has changed from IO-APIC-level to IO-APIC-fasteoi

I also see better CPU utilization over the 4 cores in /proc/stat; in particular, the softirq work is spread in that 7:1 ratio. So it seems that smp_affinity does partially work for a network device and several cores. I am happier, but left with a number of questions that I hope someone can answer:

1) As far as I can tell, SAPIC (aka IOSAPIC) is specific to Itanium, but in the literature I see something which appears to be similar, called X2APIC, on other Intel 64-bit architectures. Does X2APIC have the same behaviour as regards IRQ balancing and smp affinity? And does the AMD Opteron 275 also use X2APIC, or an AMD equivalent?

2) Is it expected that something changed in this area between 2.6.16 and 2.6.26, and if so, what? (Maybe related to the external changes in the /proc/interrupts output I noted?)

3) Is it now possible, on this current kernel and with my hardware (or any gigabit NIC), to distribute the softirq work approximately 50:50 over two cores? If so, how?

I can supply more information about the runs and config etc. if needed.

John
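P.S. Since I don't see a way of resetting the /proc/interrupts counters, one way to get per-run numbers is to diff snapshots taken before and after each run. A minimal sketch of that, assuming the 4-CPU layout and the eth3 line shown above:

    # snapshot the eth3 line before and after a run
    grep eth3 /proc/interrupts > /tmp/irq.before
    # ... run the workload ...
    grep eth3 /proc/interrupts > /tmp/irq.after

    # per-core deltas: after pasting the two lines together, fields
    # 2-5 are the "before" counts for CPU0-CPU3 and fields 9-12 are
    # the "after" counts
    paste /tmp/irq.before /tmp/irq.after |
        awk '{ for (i = 0; i < 4; i++)
                   printf "CPU%d: %d\n", i, $(i + 9) - $(i + 2) }'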