[ TL;DR for the non-ARM folks on CC: disabling softirq processing when using SIMD in kernel mode could reduce complexity and improve performance, but we need to decide whether we can do this, and how much softirq processing latency we can tolerate. If we can find a satisfactory solution for this, we might do the same for x86 and 32-bit ARM as well. However, based on preliminary off-list discussions with peterz and luto, it seems that for x86, there is a preference for using per-CPU buffers to preserve/restore the task context's kernel mode SIMD state when the task is interrupted to perform kernel mode SIMD in softirq context. On arm64, we actually had this arrangement before, and removed it because it made reasoning about preserving/restoring userland SVE state (32 SIMD registers of up to 2 kbit in size) rather complex. ]

The crypto API provides two ways to invoke symmetric encryption algorithms:
- synchronously, where the transformation is guaranteed to be done by the time the function returns;
- asynchronously, where the function may return with a -EINPROGRESS return code, and a completion will be signalled when the transformation is done.

The latter is mainly intended for h/w accelerators, where throughput would otherwise be severely limited by latency. However, it is also being used for software algorithms based on SIMD instructions, which cannot be issued from every context (the rules are not the same on each architecture, but typically, SIMD can be used in task context, or in softirq context if the softirq was not taken while the SIMD unit was already in use in kernel mode).

Many in-kernel users of the crypto API opt out of this asynchronous interface (802.11, macsec, kerberos, sw kTLS), or use a library interface which is fundamentally synchronous (wireguard).
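To make the context rule concrete, here is a minimal userspace C model of the behaviour before this series. It is a sketch under stated assumptions, not kernel code: struct cpu_model, model_may_use_simd() and model_encrypt_sync() are hypothetical names. model_may_use_simd() is only loosely analogous to the kernel's may_use_simd() and ignores details such as preemption. It shows why a synchronous caller, which cannot return -EINPROGRESS and try again later, must pick the scalar path whenever it runs in softirq context on top of a task that holds the SIMD unit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names are kernel APIs. */
enum ctx { TASK_CTX, SOFTIRQ_CTX, HARDIRQ_CTX };
enum path { PATH_SCALAR, PATH_SIMD };

struct cpu_model {
	enum ctx ctx;        /* context the crypto call is made from */
	bool task_neon_busy; /* interrupted task was inside a kernel mode SIMD section */
};

/* Loose analogue of the pre-series "may I use SIMD here?" rule. */
static bool model_may_use_simd(const struct cpu_model *cpu)
{
	switch (cpu->ctx) {
	case TASK_CTX:
		/* kernel mode NEON sections do not nest in this model */
		assert(!cpu->task_neon_busy);
		return true;
	case SOFTIRQ_CTX:
		/* only safe if we did not interrupt a task using the NEON */
		return !cpu->task_neon_busy;
	default:
		return false;
	}
}

/*
 * A synchronous caller cannot return -EINPROGRESS and retry later,
 * so it has to choose a code path right now.
 */
static enum path model_encrypt_sync(const struct cpu_model *cpu)
{
	return model_may_use_simd(cpu) ? PATH_SIMD : PATH_SCALAR;
}
```

The SOFTIRQ_CTX case with task_neon_busy set is the contended case: an asynchronous user could defer the request, but a synchronous one has no choice but to fall back to scalar code.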
As a result, these synchronous users end up in a degraded mode both in the contended case (a scalar fallback) and in the uncontended case (generic GCM/CCM/CTR chaining mode templates wrapped around the SIMD cipher, as opposed to accelerated implementations of the full chaining modes in question). Note that scalar AES runs ~20x slower than the SIMD instruction based version.

So let's address this for arm64 by reorganizing kernel mode SIMD support so that the SIMD unit can always be assumed to be available. This means we need to defer softirq processing when grabbing the NEON unit in task context, so that any use of it in softirq context is guaranteed not to interrupt any code that was already using the NEON. This obviously impacts softirq processing latency, which is why the existing conditional yield support is modified to take pending softirqs into account.

Changes since RFC/v1:
- add patch to remove obsolete cond_yield_neon macros
- rebased onto new, simplified cond_yield macro
- include patches to remove the async path from all arm64 crypto skciphers and AEADs

Previous RFC version:

[0] https://lore.kernel.org/linux-arm-kernel/20201218170106.23280-1-ardb@xxxxxxxxxx/

The first 3 patches will need to go through the arm64 tree, so once this series is reviewed, some coordination between the arm64 and crypto trees is required to get this merged without conflicts.
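To illustrate the softirq deferral scheme described for arm64, here is a minimal single-CPU model in userspace C. It is a sketch under stated assumptions, not the actual fpsimd.c change: struct cpu_state and all model_* functions are hypothetical. model_neon_begin()/model_neon_end() stand in loosely for kernel_neon_begin()/kernel_neon_end() with softirqs disabled (in the spirit of local_bh_disable()/local_bh_enable()). model_cond_yield() stands in for a conditional yield that also checks for pending softirqs. NEON register save/restore is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; names are illustrative, not kernel APIs. */
struct cpu_state {
	int  bh_disable_count; /* >0 while softirq processing is deferred */
	bool softirq_pending;  /* a softirq was raised but has not run yet */
	bool neon_in_use;      /* task context currently holds the NEON unit */
	int  softirqs_run;     /* number of softirq handler invocations */
};

/* Softirq handler that uses the NEON itself, so it must never find it busy. */
static void model_run_softirq(struct cpu_state *c)
{
	assert(!c->neon_in_use);
	c->softirq_pending = false;
	c->softirqs_run++;
}

/* Interrupt exit path: run the softirq now unless processing is deferred. */
static void model_raise_softirq(struct cpu_state *c)
{
	c->softirq_pending = true;
	if (c->bh_disable_count == 0)
		model_run_softirq(c);
}

/* Grab the NEON unit with softirqs disabled (no nesting in this model). */
static void model_neon_begin(struct cpu_state *c)
{
	assert(!c->neon_in_use);
	c->bh_disable_count++;
	c->neon_in_use = true;
}

/* Release the NEON unit, then run any softirq that was deferred meanwhile. */
static void model_neon_end(struct cpu_state *c)
{
	c->neon_in_use = false;
	if (--c->bh_disable_count == 0 && c->softirq_pending)
		model_run_softirq(c);
}

/* Conditional yield: briefly drop the NEON if a softirq is waiting. */
static bool model_cond_yield(struct cpu_state *c)
{
	if (!c->softirq_pending)
		return false;
	model_neon_end(c);   /* pending softirq runs here */
	model_neon_begin(c); /* resume the interrupted SIMD work */
	return true;
}
```

In this model, softirqs raised while the NEON unit is held only run from model_neon_end() or model_cond_yield(), so the assertion in the handler never fires. The cost is the delay between raising a softirq and running it, which is bounded by how often the long-running SIMD code yields.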
Cc: Dave Martin <dave.martin@xxxxxxx>
Cc: Mark Brown <broonie@xxxxxxxxxx>
Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>

Ard Biesheuvel (9):
  arm64: assembler: remove conditional NEON yield macros
  arm64: assembler: introduce wxN aliases for wN registers
  arm64: fpsimd: run kernel mode NEON with softirqs disabled
  crypto: aead - disallow en/decrypt for non-task or non-softirq context
  crypto: skcipher - disallow en/decrypt for non-task or non-softirq context
  crypto: arm64/gcm-aes-ce - remove non-SIMD fallback path
  crypto: arm64/aes-ccm - remove non-SIMD fallback path
  crypto: arm64/aes-ce - stop using SIMD helper for skciphers
  crypto: arm64/aes-neonbs - stop using SIMD helper for skciphers

 arch/arm64/crypto/Kconfig           |   3 -
 arch/arm64/crypto/aes-ce-ccm-glue.c | 151 +++-----------
 arch/arm64/crypto/aes-glue.c        | 102 ++--------
 arch/arm64/crypto/aes-modes.S       |   2 +-
 arch/arm64/crypto/aes-neonbs-glue.c | 122 +-----------
 arch/arm64/crypto/ghash-ce-glue.c   | 209 +++++--------------
 arch/arm64/crypto/sha1-ce-core.S    |   2 +-
 arch/arm64/crypto/sha2-ce-core.S    |   2 +-
 arch/arm64/crypto/sha3-ce-core.S    |   4 +-
 arch/arm64/crypto/sha512-ce-core.S  |   2 +-
 arch/arm64/include/asm/assembler.h  | 106 +++-------
 arch/arm64/kernel/asm-offsets.c     |   2 +
 arch/arm64/kernel/fpsimd.c          |   4 +-
 crypto/aead.c                       |  10 +
 crypto/skcipher.c                   |  10 +
 15 files changed, 162 insertions(+), 569 deletions(-)

-- 
2.30.1