On Thu, 15 Dec 2022 at 11:51, Russell King (Oracle) <linux@xxxxxxxxxxxxxxx> wrote: > > On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote: > > On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > > > > > > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: > > > > > > > We currently only permit kernel mode NEON in process context, to avoid > > > > the need to preserve/restore the NEON register file when taking an > > > > exception while running in the kernel. > > > > > > > > Like we did on arm64, we can relax this restriction substantially, by > > > > permitting kernel mode NEON from softirq context, while ensuring that > > > > softirq processing is disabled when the NEON is being used in task > > > > context. This guarantees that only NEON context belonging to user space > > > > needs to be preserved and restored, which is already taken care of. > > > > > > > > This is especially relevant for network encryption, where incoming > > > > frames are typically handled in softirq context, and deferring software > > > > decryption to a kernel thread or falling back to C code are both > > > > undesirable from a performance PoV. > > > > > > > > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx> > > > > > > So boosting WireGuard as primary SW network encryption user? > > > > Essentially, although the use case that inspired this work is related > > to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is > > ~3x faster than WG's chacha20poly1305, which makes the performance > > overhead of asynchronous completion even more significant. (Note that > > GCM needs the AES and PMULL instructions which are usually only > > available when running the 32-bit kernel on a 64-bit core, whereas > > chacha20poly1305 uses ordinary NEON instructions.) > > > > But Martin responded with a Tested-by regarding chacha20poly1305 on > > IPsec (not WG) where there is also a noticeable speedup, so WG on > > ARM32 should definitely benefit from this as well. > > It'll be interesting to see whether there is any noticable difference > with my WG VPN. > Using WireGuard with the same 32-bit KVM guest communicating with its 64-bit host using virtio-net, I get a 44% speedup in the host->guest direction. The other direction performs exactly the same, which is unsurprising as it doesn't involve NEON crypto in softirq context at all. BEFORE ====== ardb@vm32:~$ iperf3 -c 192.168.11.2 Connecting to host 192.168.11.2, port 5201 [ 5] local 192.168.11.1 port 40144 connected to 192.168.11.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 25.8 MBytes 216 Mbits/sec 0 397 KBytes [ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 397 KBytes [ 5] 2.00-3.00 sec 27.0 MBytes 226 Mbits/sec 0 397 KBytes [ 5] 3.00-4.00 sec 26.5 MBytes 222 Mbits/sec 0 397 KBytes [ 5] 4.00-5.00 sec 26.2 MBytes 220 Mbits/sec 0 397 KBytes [ 5] 5.00-6.00 sec 26.1 MBytes 219 Mbits/sec 0 436 KBytes [ 5] 6.00-7.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes [ 5] 7.00-8.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes [ 5] 8.00-9.00 sec 26.5 MBytes 222 Mbits/sec 0 480 KBytes [ 5] 9.00-10.00 sec 26.9 MBytes 225 Mbits/sec 0 480 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 263 MBytes 221 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec receiver ardb@sudo:~$ iperf3 -c 192.168.11.1 Connecting to host 192.168.11.1, port 5201 [ 5] local 192.168.11.2 port 46340 connected to 192.168.11.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 47.5 MBytes 398 Mbits/sec 0 1.75 MBytes [ 5] 1.00-2.00 sec 45.0 MBytes 377 Mbits/sec 18 1.35 MBytes [ 5] 2.00-3.00 sec 43.8 MBytes 367 Mbits/sec 0 1.47 MBytes [ 5] 3.00-4.00 sec 45.0 MBytes 377 Mbits/sec 0 1.56 MBytes [ 5] 4.00-5.00 sec 45.0 MBytes 377 Mbits/sec 0 1.63 MBytes [ 5] 5.00-6.00 sec 42.5 MBytes 357 Mbits/sec 0 1.68 MBytes [ 5] 6.00-7.00 sec 43.8 MBytes 367 Mbits/sec 0 1.71 MBytes [ 5] 7.00-8.00 sec 43.8 MBytes 367 Mbits/sec 0 1.73 MBytes [ 5] 8.00-9.00 sec 45.0 MBytes 377 Mbits/sec 0 1.74 MBytes [ 5] 9.00-10.00 sec 43.8 MBytes 367 Mbits/sec 0 1.75 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 445 MBytes 373 Mbits/sec 18 sender [ 5] 0.00-10.04 sec 444 MBytes 371 Mbits/sec receiver iperf Done. AFTER ===== ardb@vm32:~$ iperf3 -c 192.168.11.2 Connecting to host 192.168.11.2, port 5201 [ 5] local 192.168.11.1 port 44004 connected to 192.168.11.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 26.2 MBytes 220 Mbits/sec 0 399 KBytes [ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 399 KBytes [ 5] 2.00-3.00 sec 26.0 MBytes 218 Mbits/sec 0 444 KBytes [ 5] 3.00-4.00 sec 26.8 MBytes 225 Mbits/sec 0 485 KBytes [ 5] 4.00-5.00 sec 26.4 MBytes 222 Mbits/sec 0 542 KBytes [ 5] 5.00-6.00 sec 26.6 MBytes 223 Mbits/sec 0 568 KBytes [ 5] 6.00-7.00 sec 25.4 MBytes 213 Mbits/sec 0 568 KBytes [ 5] 7.00-8.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes [ 5] 8.00-9.00 sec 26.7 MBytes 224 Mbits/sec 0 568 KBytes [ 5] 9.00-10.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec 0 sender [ 5] 0.00-9.99 sec 261 MBytes 219 Mbits/sec receiver iperf Done. ardb@sudo:~$ iperf3 -c 192.168.11.1 Connecting to host 192.168.11.1, port 5201 [ 5] local 192.168.11.2 port 49838 connected to 192.168.11.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 61.2 MBytes 514 Mbits/sec 0 1.59 MBytes [ 5] 1.00-2.00 sec 66.2 MBytes 555 Mbits/sec 0 1.67 MBytes [ 5] 2.00-3.00 sec 65.0 MBytes 545 Mbits/sec 79 1.24 MBytes [ 5] 3.00-4.00 sec 63.8 MBytes 535 Mbits/sec 0 1.36 MBytes [ 5] 4.00-5.00 sec 63.8 MBytes 535 Mbits/sec 0 1.46 MBytes [ 5] 5.00-6.00 sec 63.8 MBytes 535 Mbits/sec 0 1.53 MBytes [ 5] 6.00-7.00 sec 62.5 MBytes 524 Mbits/sec 0 1.59 MBytes [ 5] 7.00-8.00 sec 65.0 MBytes 545 Mbits/sec 99 1.18 MBytes [ 5] 8.00-9.00 sec 65.0 MBytes 545 Mbits/sec 0 1.25 MBytes [ 5] 9.00-10.00 sec 65.0 MBytes 545 Mbits/sec 0 1.30 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 641 MBytes 538 Mbits/sec 178 sender [ 5] 0.00-10.02 sec 638 MBytes 535 Mbits/sec receiver iperf Done.