ARM router NAT performance affected by random/unrelated commits

Hi,

I work on home routers based on Broadcom's Northstar SoCs. Those devices
have ARM Cortex-A9 CPUs, and most of them are dual-core.

For home routers, my main concern is network performance. The CPU
isn't powerful enough to handle gigabit traffic, so all kinds of
optimizations do matter. I noticed some unexpected changes in NAT
performance when switching between kernels.

My hardware is a BCM47094 SoC (dual-core ARM) with an integrated network
controller and an external BCM53012 switch.

Relevant setup:
* SoC network controller is wired to the hardware switch
* Switch passes 802.1q frames with VID 1 to four LAN ports
* Switch passes 802.1q frames with VID 2 to WAN port
* Linux does NAT for LAN (eth0.1) to WAN (eth0.2)
* Linux uses pfifo and "echo 2 > rps_cpus"
* Ryzen 5 PRO 2500U (x86_64) laptop connected to a LAN port
* Intel i7-2670QM laptop connected to a WAN port
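The rps_cpus value in the setup above is a hexadecimal CPU bitmask, so writing 2 (binary 10) steers receive packet processing to CPU1 only, which is consistent with the ksoftirqd/1 entries in the perf output further down. Below is a minimal Python sketch for converting between CPU lists and that mask. The helper names are hypothetical, and the exact sysfs file (for example /sys/class/net/eth0/queues/rx-0/rps_cpus) is an assumption, since the setup does not say which interface or queue was configured.

```python
# Hypothetical helpers for the hex bitmask format used by rps_cpus.
# Bit N set in the mask means CPU N may process RPS-steered packets.
# Assumed file, not stated in the post: /sys/class/net/<dev>/queues/rx-<n>/rps_cpus


def rps_mask(cpus):
    """Return the hex string to write to rps_cpus for the given CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU index: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Decode an rps_cpus value (commas separate 32-bit groups) into CPU indices."""
    mask = int(mask_hex.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    print(rps_mask([1]))       # prints 2, the value echoed in the setup above
    print(mask_to_cpus("2"))   # prints [1]
```

A mask of 3 would allow both cores of a dual-core SoC; the setup above deliberately uses CPU1 only.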

*****

I found a very nice example of a commit that does /nothing/ yet affects
NAT performance: 9316a9ed6895 ("blk-mq: provide helper for setting up an
SQ queue and tag set")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283
All it does is export an unused symbol (function).

Let me share some numbers (I use iperf for testing):

git reset --hard v4.19
git am OpenWrt-mtd-chages.patch
[  3]  0.0-30.0 sec  2.60 GBytes   745 Mbits/sec
[  3]  0.0-30.0 sec  2.60 GBytes   745 Mbits/sec
[  3]  0.0-30.0 sec  2.60 GBytes   744 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   742 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   740 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   740 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.57 GBytes   735 Mbits/sec
Average: 741 Mb/s

git reset --hard v4.19
git am OpenWrt-mtd-chages.patch
git cherry-pick -x 9316a9ed6895
[  3]  0.0-30.0 sec  2.73 GBytes   780 Mbits/sec
[  3]  0.0-30.0 sec  2.72 GBytes   777 Mbits/sec
[  3]  0.0-30.0 sec  2.71 GBytes   775 Mbits/sec
[  3]  0.0-30.0 sec  2.70 GBytes   773 Mbits/sec
[  3]  0.0-30.0 sec  2.70 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   770 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   769 Mbits/sec
[  3]  0.0-30.0 sec  2.68 GBytes   768 Mbits/sec
Average: 773 Mb/s

As you can see, cherry-picking (on top of Linux 4.19) a single commit
that does /nothing/ can improve NAT performance by about 4.3% (741 → 773 Mb/s).
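To check that a difference like this is larger than run-to-run noise, it helps to look at the spread of each set of runs and not only at the averages. Below is a minimal Python 3.9+ sketch, assuming the iperf output is available as text. The regular expression only targets the "N Mbits/sec" column of the lines shown above, and the helper names are hypothetical.

```python
import re
import statistics

# Bandwidth column of an iperf interval line, e.g.
# "[  3]  0.0-30.0 sec  2.60 GBytes   745 Mbits/sec"
RATE_RE = re.compile(r"([\d.]+)\s+Mbits/sec")

# v4.19 + OpenWrt-mtd-chages.patch (copied from the runs above)
BASELINE = """
[  3]  0.0-30.0 sec  2.60 GBytes   745 Mbits/sec
[  3]  0.0-30.0 sec  2.60 GBytes   745 Mbits/sec
[  3]  0.0-30.0 sec  2.60 GBytes   744 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   742 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   740 Mbits/sec
[  3]  0.0-30.0 sec  2.59 GBytes   740 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.58 GBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.57 GBytes   735 Mbits/sec
"""

# Same tree plus cherry-picked 9316a9ed6895 (copied from the runs above)
PATCHED = """
[  3]  0.0-30.0 sec  2.73 GBytes   780 Mbits/sec
[  3]  0.0-30.0 sec  2.72 GBytes   777 Mbits/sec
[  3]  0.0-30.0 sec  2.71 GBytes   775 Mbits/sec
[  3]  0.0-30.0 sec  2.70 GBytes   773 Mbits/sec
[  3]  0.0-30.0 sec  2.70 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   771 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   770 Mbits/sec
[  3]  0.0-30.0 sec  2.69 GBytes   769 Mbits/sec
[  3]  0.0-30.0 sec  2.68 GBytes   768 Mbits/sec
"""


def rates(report: str) -> list[float]:
    """Extract every Mbits/sec value from a block of iperf output."""
    return [float(m.group(1)) for m in RATE_RE.finditer(report)]


def summarize(report: str) -> tuple[float, float]:
    """Return (mean, sample standard deviation) in Mbit/s."""
    values = rates(report)
    return statistics.mean(values), statistics.stdev(values)


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (0.05 means +5%)."""
    return after / before - 1.0


if __name__ == "__main__":
    base_mean, base_sd = summarize(BASELINE)
    new_mean, new_sd = summarize(PATCHED)
    print(f"baseline: {base_mean:.1f} +/- {base_sd:.1f} Mb/s")
    print(f"patched:  {new_mean:.1f} +/- {new_sd:.1f} Mb/s")
    print(f"change:   {100 * relative_change(base_mean, new_mean):+.1f}%")
```

With the numbers above this gives means of 740.5 and 772.5 Mb/s (the posted averages are rounded) and a sample standard deviation below 4 Mb/s for each set, so the roughly 32 Mb/s gap is far outside the spread of individual runs.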

*****

I was hoping to learn something by profiling the kernel with the "perf"
tool. Enabling CONFIG_PERF_EVENTS resulted in a smaller NAT performance
gain from the cherry-pick: 741 Mb/s → 750 Mb/s. I tried it anyway.

Without cherry-picking I got:
+    9,04%  swapper          [kernel.kallsyms]  [k] v7_dma_inv_range
+    5,54%  swapper          [kernel.kallsyms]  [k] __irqentry_text_end
+    5,12%  swapper          [kernel.kallsyms]  [k] l2c210_inv_range
+    4,30%  ksoftirqd/1      [kernel.kallsyms]  [k] v7_dma_clean_range
+    4,02%  swapper          [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,13%  swapper          [kernel.kallsyms]  [k] arch_cpu_idle
+    2,88%  ksoftirqd/1      [kernel.kallsyms]  [k] __netif_receive_skb_core
+    2,51%  ksoftirqd/1      [kernel.kallsyms]  [k] l2c210_clean_range
+    1,88%  ksoftirqd/1      [kernel.kallsyms]  [k] fib_table_lookup
(741 Mb/s while *not* running perf)

With cherry-picked 9316a9ed6895 I got:
+    9,16%  swapper          [kernel.kallsyms]  [k] v7_dma_inv_range
+    5,64%  swapper          [kernel.kallsyms]  [k] __irqentry_text_end
+    5,05%  swapper          [kernel.kallsyms]  [k] l2c210_inv_range
+    4,25%  ksoftirqd/1      [kernel.kallsyms]  [k] v7_dma_clean_range
+    4,10%  swapper          [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,35%  ksoftirqd/1      [kernel.kallsyms]  [k] __netif_receive_skb_core
+    3,17%  swapper          [kernel.kallsyms]  [k] arch_cpu_idle
+    2,49%  ksoftirqd/1      [kernel.kallsyms]  [k] l2c210_clean_range
+    2,03%  ksoftirqd/1      [kernel.kallsyms]  [k] fib_table_lookup
(750 Mb/s while *not* running perf)

The differences seem quite minimal, and I'm not sure they explain that
NAT performance change at all.
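One way to make such a comparison less error-prone is to diff the two profiles programmatically, keyed by command and symbol, and sort by the change in overhead. Below is a minimal Python sketch, assuming the report lines look exactly like the ones pasted above (including the decimal comma that perf printed). The parser and helper names are hypothetical and not part of perf.

```python
import re

# Matches report lines in the format pasted above, e.g.
# "+    9,04%  swapper          [kernel.kallsyms]  [k] v7_dma_inv_range".
# perf printed a decimal comma here, so both "," and "." are accepted.
LINE_RE = re.compile(r"^\+?\s*(\d+[.,]\d+)%\s+(\S+)\s+\[[^\]]+\]\s+\[k\]\s+(\S+)")

BEFORE = """
+    9,04%  swapper          [kernel.kallsyms]  [k] v7_dma_inv_range
+    5,54%  swapper          [kernel.kallsyms]  [k] __irqentry_text_end
+    5,12%  swapper          [kernel.kallsyms]  [k] l2c210_inv_range
+    4,30%  ksoftirqd/1      [kernel.kallsyms]  [k] v7_dma_clean_range
+    4,02%  swapper          [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,13%  swapper          [kernel.kallsyms]  [k] arch_cpu_idle
+    2,88%  ksoftirqd/1      [kernel.kallsyms]  [k] __netif_receive_skb_core
+    2,51%  ksoftirqd/1      [kernel.kallsyms]  [k] l2c210_clean_range
+    1,88%  ksoftirqd/1      [kernel.kallsyms]  [k] fib_table_lookup
"""

AFTER = """
+    9,16%  swapper          [kernel.kallsyms]  [k] v7_dma_inv_range
+    5,64%  swapper          [kernel.kallsyms]  [k] __irqentry_text_end
+    5,05%  swapper          [kernel.kallsyms]  [k] l2c210_inv_range
+    4,25%  ksoftirqd/1      [kernel.kallsyms]  [k] v7_dma_clean_range
+    4,10%  swapper          [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,35%  ksoftirqd/1      [kernel.kallsyms]  [k] __netif_receive_skb_core
+    3,17%  swapper          [kernel.kallsyms]  [k] arch_cpu_idle
+    2,49%  ksoftirqd/1      [kernel.kallsyms]  [k] l2c210_clean_range
+    2,03%  ksoftirqd/1      [kernel.kallsyms]  [k] fib_table_lookup
"""


def parse(report):
    """Map (command, symbol) to its overhead in percent."""
    overhead = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            pct, comm, sym = match.groups()
            overhead[(comm, sym)] = float(pct.replace(",", "."))
    return overhead


def diff(before, after):
    """Return (delta, command, symbol, before, after) rows, largest |delta| first.

    Symbols present in only one of the two reports are skipped.
    """
    old, new = parse(before), parse(after)
    rows = [
        (new[key] - old[key], key[0], key[1], old[key], new[key])
        for key in old.keys() & new.keys()
    ]
    rows.sort(key=lambda row: abs(row[0]), reverse=True)
    return rows


if __name__ == "__main__":
    for delta, comm, sym, old_pct, new_pct in diff(BEFORE, AFTER):
        print(f"{delta:+6.2f}  {old_pct:5.2f} -> {new_pct:5.2f}  {comm:12} {sym}")
```

For the two profiles above, the largest shift is __netif_receive_skb_core in ksoftirqd/1 (2.88% → 3.35%, about +0.47 percentage points); every other listed symbol moves by 0.15 points or less. These are shares of the sampled time rather than absolute costs, so they should be read with care.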

*****

I also tried running cachestat but didn't get anything interesting:
Counting cache functions... Output every 1 seconds.
TIME         HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
10:06:59     1020        5        0    99.5%            0          2
10:07:00     1029        0        0   100.0%            0          2
10:07:01     1013        0        0   100.0%            0          2
10:07:02     1029        0        0   100.0%            0          2
10:07:03     1029        0        0   100.0%            0          2
10:07:04      997        0        0   100.0%            0          2
10:07:05     1013        0        0   100.0%            0          2
(I started iperf at 10:07:00).
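The RATIO column is consistent with hits / (hits + misses), e.g. 1020 / (1020 + 5) ≈ 99.5% for the first row. Below is a small Python sketch that recomputes it from the rows above and compares the page cache activity before and after iperf was started. The row data is copied from the output; the helper names are hypothetical.

```python
# (time, hits, misses) copied from the cachestat output above.
ROWS = [
    ("10:06:59", 1020, 5),
    ("10:07:00", 1029, 0),
    ("10:07:01", 1013, 0),
    ("10:07:02", 1029, 0),
    ("10:07:03", 1029, 0),
    ("10:07:04", 997, 0),
    ("10:07:05", 1013, 0),
]
IPERF_START = "10:07:00"  # from the note under the output


def hit_ratio(hits, misses):
    """Page cache hit ratio in percent, as in the RATIO column."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def mean_hits(rows):
    """Average number of hits per interval."""
    return sum(hits for _, hits, _ in rows) / len(rows)


if __name__ == "__main__":
    for time, hits, misses in ROWS:
        print(f"{time}  {hit_ratio(hits, misses):6.1f}%")
    # "HH:MM:SS" strings compare correctly as text within a single day.
    before = [row for row in ROWS if row[0] < IPERF_START]
    during = [row for row in ROWS if row[0] >= IPERF_START]
    print(f"hits/s before iperf: {mean_hits(before):.0f}, during: {mean_hits(during):.0f}")
```

Both the hit ratio and the roughly 1000 hits per second stay flat once iperf starts, which is expected, since forwarded packets do not pass through the page cache.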

*****

There have been other cases of such unexpected performance changes.
Another example: cherry-picking 5b0890a97204 ("flow_dissector: Parse
batman-adv unicast headers")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b0890a97204627d75a333fc30f29f737e2bfad6
onto some Linux 4.14.x release lowered NAT performance by 55 Mb/s.

The tricky part is that there aren't any ETH_P_BATMAN packets in my traffic.
Extra tests revealed that any __skb_flow_dissect() modification lowered
my NAT performance (e.g. commenting out the ETH_P_TIPC or ETH_P_FCOE
switch cases).

*****

I would like every kernel to provide maximum NAT performance, no
matter what unrelated commits it contains.

Such random changes also make it really hard to notice a real
performance regression.

Do you have any idea what is causing those performance changes? Can I
provide any extra info to help debug this?

047-v4.21-mtd-keep-original-flags-for-every-struct-mtd_info.patch
048-v4.21-mtd-improve-calculating-partition-boundaries-when-ch.patch
080-v5.1-0001-bcma-keep-a-direct-pointer-to-the-struct-device.patch
080-v5.1-0002-bcma-use-dev_-printing-functions.patch
095-Allow-class-e-address-assignment-via-ifconfig-ioctl.patch

140-jffs2-use-.rename2-and-add-RENAME_WHITEOUT-support.patch
141-jffs2-add-RENAME_EXCHANGE-support.patch
400-mtd-add-rootfs-split-support.patch
401-mtd-add-support-for-different-partition-parser-types.patch
402-mtd-use-typed-mtd-parsers-for-rootfs-and-firmware-split.patch
403-mtd-hook-mtdsplit-to-Kbuild.patch
404-mtd-add-more-helper-functions.patch
431-mtd-bcm47xxpart-check-for-bad-blocks-when-calculatin.patch
432-mtd-bcm47xxpart-detect-T_Meter-partition.patch
480-mtd-set-rootfs-to-be-root-dev.patch
490-ubi-auto-attach-mtd-device-named-ubi-or-data-on-boot.patch
491-ubi-auto-create-ubiblock-device-for-rootfs.patch
492-try-auto-mounting-ubi0-rootfs-in-init-do_mounts.c.patch
493-ubi-set-ROOT_DEV-to-ubiblock-rootfs-if-unset.patch
530-jffs2_make_lzma_available.patch
532-jffs2_eofdetect.patch
500-v4.20-ubifs-Fix-default-compression-selection-in-ubifs.patch
553-ubifs-Add-option-to-create-UBI-FS-version-4-on-empty.patch

700-swconfig_switch_drivers.patch
702-phy_add_aneg_done_function.patch
721-phy_packets.patch
773-bgmac-add-srab-switch.patch
910-kobject_uevent.patch
911-kobject_add_broadcast_uevent.patch
