On Tue, Mar 09, 2021 at 11:24:55AM +0100, Pali Rohár wrote:
> On Monday 08 March 2021 20:55:20 Greg KH wrote:
> > On Mon, Mar 08, 2021 at 06:57:57PM +0100, Pali Rohár wrote:
> > > From: Alexander Lobakin <bloodyreaper@xxxxxxxxx>
> > > 
> > > commit e131a5634830047923c694b4ce0c3b31745ff01b upstream.
> > > 
> > > The gro_cells lib is used by different encapsulating netdevices, such
> > > as geneve, macsec, vxlan etc., to speed up decapsulated traffic
> > > processing. The CPU tag is a sort of "encapsulation", and we can use
> > > the same mechanism to greatly improve overall DSA performance.
> > > skbs are passed to the GRO layer after removing CPU tags, so we don't
> > > need any new packet offload types, as was originally proposed by me
> > > in the first GRO-over-DSA variant [1].
> > > 
> > > The size of struct gro_cells is sizeof(void *), so the hot struct
> > > dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
> > > remain in one 32-byte cacheline.
> > > The other positive side effect is that drivers for network devices
> > > that can be shipped as CPU ports of DSA-driven switches can now use
> > > napi_gro_frags() to pass skbs to the kernel. Packets built that way
> > > are completely non-linear and would likely be dropped without GRO.
> > > 
> > > This was tested on a soon-to-be-mainlined Ethernet driver that uses
> > > napi_gro_frags(), and the overall performance was on par with the
> > > variant from [1], sometimes even better due to minimal overhead.
> > > net.core.gro_normal_batch tuning may help to push it to the limit
> > > on particular setups and platforms.
> > > 
> > > iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
> > > on a 1.2 GHz MIPS board:
> > > 
> > > 5.7-rc2 baseline:
> > > 
> > > [ID]  Interval           Transfer     Bitrate         Retr
> > > [ 5]  0.00-120.01 sec    9.00 GBytes  644 Mbits/sec   413   sender
> > > [ 5]  0.00-120.00 sec    8.99 GBytes  644 Mbits/sec         receiver
> > > 
> > > Iface      RX packets  TX packets
> > > eth0       7097731     7097702
> > > port0      426050      6671829
> > > port1      6671681     425862
> > > port1.218  6671677     425851
> > > 
> > > With this patch:
> > > 
> > > [ID]  Interval           Transfer     Bitrate         Retr
> > > [ 5]  0.00-120.01 sec    12.2 GBytes  870 Mbits/sec   122   sender
> > > [ 5]  0.00-120.00 sec    12.2 GBytes  870 Mbits/sec         receiver
> > > 
> > > Iface      RX packets  TX packets
> > > eth0       9474792     9474777
> > > port0      455200      353288
> > > port1      9019592     455035
> > > port1.218  353144      455024
> > > 
> > > v2:
> > >  - Add some performance examples in the commit message;
> > >  - No functional changes.
> > > 
> > > [1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@xxxxxxxx/
> > > 
> > > Signed-off-by: Alexander Lobakin <bloodyreaper@xxxxxxxxx>
> > > Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
> > > 
> > > ---
> > > This patch radically increases network performance on DSA setups.
> > > 
> > > Please include this patch in the stable releases.
> > > 
> > > I have done the following tests:
> > > 
> > > NAT is the tested Espressobin board (ARM64 Marvell Armada 3720 SoC
> > > with a Marvell 88E6141 DSA switch), configured for IPv4 masquerading.
> > > WAN and LAN are two other static boxes on which iperf3 was running.
> > > 
> > > 4.19.179 without e131a5634830047923c694b4ce0c3b31745ff01b
> > > 
> > > WAN --> NAT --> LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.01 sec   440 MBytes   369 Mbits/sec   12    sender
> > > [  5]  0.00-10.00 sec   437 MBytes   367 Mbits/sec         receiver
> > > 
> > > WAN <-- NAT <-- LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.00 sec   390 MBytes   327 Mbits/sec   90    sender
> > > [  5]  0.00-10.01 sec   388 MBytes   326 Mbits/sec         receiver
> > > 
> > > 4.19.179 with e131a5634830047923c694b4ce0c3b31745ff01b
> > > 
> > > WAN --> NAT --> LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.01 sec   616 MBytes   516 Mbits/sec   18    sender
> > > [  5]  0.00-10.00 sec   613 MBytes   515 Mbits/sec         receiver
> > > 
> > > WAN <-- NAT <-- LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.00 sec   573 MBytes   480 Mbits/sec   32    sender
> > > [  5]  0.00-10.01 sec   570 MBytes   478 Mbits/sec         receiver
> > > 
> > > 5.4.103 without e131a5634830047923c694b4ce0c3b31745ff01b
> > > 
> > > WAN --> NAT --> LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.01 sec   454 MBytes   380 Mbits/sec   62    sender
> > > [  5]  0.00-10.00 sec   451 MBytes   378 Mbits/sec         receiver
> > > 
> > > WAN <-- NAT <-- LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.00 sec   425 MBytes   356 Mbits/sec   155   sender
> > > [  5]  0.00-10.01 sec   422 MBytes   354 Mbits/sec         receiver
> > > 
> > > 5.4.103 with e131a5634830047923c694b4ce0c3b31745ff01b
> > > 
> > > WAN --> NAT --> LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.01 sec   604 MBytes   506 Mbits/sec   8     sender
> > > [  5]  0.00-10.00 sec   601 MBytes   504 Mbits/sec         receiver
> > > 
> > > WAN <-- NAT <-- LAN
> > > [ ID]  Interval         Transfer     Bitrate         Retr
> > > [  5]  0.00-10.00 sec   578 MBytes   485 Mbits/sec   79    sender
> > > [  5]  0.00-10.01 sec   575 MBytes   482 Mbits/sec         receiver
> > > ---
> > >  net/dsa/Kconfig    |  1 +
> > >  net/dsa/dsa.c      |  2 +-
> > >  net/dsa/dsa_priv.h |  3 +++
> > >  net/dsa/slave.c    | 10 +++++++++-
> > >  4 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > So this patch should be applied to the 4.19 and 5.4 stable queues?
> 
> Yes! The patch was introduced in 5.8 and applies cleanly to the 4.19 and
> 5.4 stable releases without any modifications. Trying to apply it to
> 4.14 results in patch conflicts, so I have done tests only for 4.19
> and 5.4.

Great, now queued up, thanks.

greg k-h
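
For readers unfamiliar with it, here is a minimal sketch of how the
gro_cells API described in the quoted commit message is typically wired
into a network driver's private data. The my_priv, my_netdev_setup,
my_rx and my_netdev_teardown names are hypothetical placeholders; the
actual DSA integration points are in net/dsa/dsa.c, net/dsa/dsa_priv.h
and net/dsa/slave.c, as the diffstat above shows.

/*
 * Sketch only: illustrates the generic gro_cells usage pattern, not the
 * exact hunks of commit e131a5634830.
 */
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/gro_cells.h>

struct my_priv {
	struct gro_cells gcells;	/* one pointer-sized member */
};

/* Called after the net_device is allocated, before any RX can happen. */
static int my_netdev_setup(struct net_device *dev)
{
	struct my_priv *p = netdev_priv(dev);

	/* Allocates per-CPU NAPI contexts used for deferred GRO processing. */
	return gro_cells_init(&p->gcells, dev);
}

/* RX path: hand the decapsulated skb to GRO instead of netif_receive_skb(). */
static void my_rx(struct net_device *dev, struct sk_buff *skb)
{
	struct my_priv *p = netdev_priv(dev);

	gro_cells_receive(&p->gcells, skb);
}

/* Teardown, mirroring gro_cells_init(). */
static void my_netdev_teardown(struct net_device *dev)
{
	struct my_priv *p = netdev_priv(dev);

	gro_cells_destroy(&p->gcells);
}

Because gro_cells_init() keeps the per-CPU state behind a single pointer
inside struct gro_cells, the only cost in the hot per-slave structure is
the sizeof(void *) growth mentioned in the commit message.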