Re: Non-consistent CPU usage in IP forwarding test

On Thursday, April 3, 2014, Oleg A. Arkhangelsky <sysoleg@xxxxxxxxx> wrote:
Hello all,

We've got very strange behavior when testing IP packet forwarding performance
on a Sandy Bridge platform (Supermicro X9DRH with the latest BIOS). This is a
two-socket E5-2690 system. Using a different PC we generate DDoS-like traffic
at a rate of about 4.5 million packets per second. Traffic is received by two
Intel 82599 NICs and forwarded out the second port of one of these NICs. The
load is evenly distributed across the two nodes, so SI usage is virtually
equal on each of the 32 CPUs.
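
For anyone who wants to reproduce the measurement, a rough sketch of how
the per-CPU softirq distribution can be watched (assumes sysstat's mpstat
is available):

    # per-CPU %soft, sampled once per second
    mpstat -P ALL 1

    # or watch the NET_RX softirq counters advance per CPU
    watch -d -n1 'grep NET_RX /proc/softirqs'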

Now the strangest part. A few moments after pktgen starts on the traffic
generator PC, average CPU usage on the SB system goes to 30-35%. No packet
drops, no rx_missed_errors, no rx_no_dma_resources. Very nice. But then SI
usage starts to decrease gradually; after about 10 seconds we see ~15% SI on
average across all CPUs. Still no packet drops, the same RX rate as at the
beginning, and the RX packet count equals the TX packet count. After some
time average SI usage starts to climb again; having peaked at the initial
30-35%, it drops back to 15%. This pattern repeats every 80 seconds, and the
interval is very stable. It is undoubtedly tied to the test start time: if we
start the test, interrupt it after 10 seconds, and start it again, we see the
same 30% SI peak a few moments later, and all subsequent timings are the same.
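
To capture the ~80 second cycle it is enough to log the aggregate softirq
time once per second (a quick sketch using gawk; field 8 of the "cpu" line
in /proc/stat is cumulative softirq jiffies):

    while :; do
        # print epoch time and cumulative softirq jiffies
        gawk '/^cpu /{ print systime(), $8 }' /proc/stat
        sleep 1
    done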

During the high-load periods we see this in "perf top -e cache-misses":

            14017.00 24.9% __netdev_alloc_skb           [kernel.kallsyms]
             5172.00  9.2% _raw_spin_lock               [kernel.kallsyms]
             4722.00  8.4% build_skb                    [kernel.kallsyms]
             3603.00  6.4% fib_table_lookup             [kernel.kallsyms]

During the "15% load time" top is different:

            11090.00 20.9% build_skb                [kernel.kallsyms]
             4879.00  9.2% fib_table_lookup         [kernel.kallsyms]
             4756.00  9.0% ipt_do_table             /lib/modules/3.12.15-BUILD-g2e94e30-dirty/kernel/net/ipv4/netfilter/ip_tables.ko
             3042.00  5.7% nf_iterate               [kernel.kallsyms]

And __netdev_alloc_skb is at the end of the list:

              911.00  0.5% __netdev_alloc_skb             [kernel.kallsyms]

Some info from "perf stat -a sleep 2":

15% SI:
       28640006291 cycles                    #    0.447 GHz                     [83.23%]
       38764605205 instructions              #    1.35  insns per cycle

30% SI:
       56225552442 cycles                    #    0.877 GHz                     [83.23%]
       39718182298 instructions              #    0.71  insns per cycle
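
If I read these numbers right, the instruction count is almost the same in
both phases while the cycle count nearly doubles:

    56225552442 / 28640006291 ~= 1.96x cycles
    39718182298 / 38764605205 ~= 1.02x instructions

So the amount of forwarding work is constant; during the 30% phase each
instruction simply costs about twice as many cycles (IPC 0.71 vs 1.35),
i.e. the extra SI time looks like stall time, which would be consistent
with the cache-miss profiles above.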

The CPUs never go beyond the C1 state, and all core speeds reported in
/proc/cpuinfo are constant at 2899.942 MHz. ASPM is disabled.
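
(C-state residency can be double-checked from the standard cpuidle sysfs
files, e.g. for CPU0:)

    # name and cumulative residency (usec) of each idle state
    for s in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
        printf '%s: %s us\n' "$(cat $s/name)" "$(cat $s/time)"
    done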

All non-essential userspace apps were explicitly killed for the duration of
the test, and there were no active cron jobs either, so we can assume no
interference from userspace.

The kernel version is 3.12.15 (ixgbe 3.21.2), but we see the same behavior
with the ancient 2.6.35 (ixgbe 3.10.16), although on 2.6.35 we sometimes get
a 160-170 second interval and different symbols in the "perf top" output
(especially local_bh_enable(), which completely blows my mind).

Does anybody have any thoughts on the reasons for this kind of behavior?
The Sandy Bridge CPU has many uncore/offcore events which I can sample;
maybe some of them can shed some light on it?
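
As a first step I could use perf's generic cache events, which already
cover cross-node traffic (a sketch; exact event availability depends on
the CPU and kernel):

    # LLC and NUMA-node load misses, system-wide, for 2 seconds
    perf stat -a -e LLC-loads,LLC-load-misses,node-loads,node-load-misses sleep 2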


Is it a NUMA system? This happens when one node tries to access memory
connected to the other CPU.
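
A quick way to check for remote-node traffic (assuming the numactl tools
are installed):

    numactl --hardware   # node/CPU/memory layout
    numastat             # numa_hit / numa_miss / numa_foreign per node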

Abu Raheda 
